
Orthographic Transcription

The most basic type of annotation that turns a collection of speech recordings into a speech corpus is some kind of orthographic transcription. It can range from a simple sequence of words per recording item (based, for instance, on the script used during the recording) to an extensive labeling of several different semantic layers^8.5. What is included in the transcript depends on the type of speech corpus and its intended use. For example, a corpus of read speech items recorded over the telephone network for training automatic speech recognition algorithms does not need any elaborate labeling of discourse events, whereas a corpus of dialogues between two or more speakers that is intended for scientific investigation requires much more effort.
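To make the contrast between a minimal and an extended transcript concrete, the following Python sketch models one recording item. The class TranscriptItem, its fields words and layers, and the helper from_prompt_script are hypothetical names chosen for illustration only; they are not a transcription format defined in this handbook. The sketch also assumes that the prompt script can be split into words on whitespace. In the minimal case, as for read speech, the item holds only a sequence of words taken from the script. Richer corpora, such as dialogue recordings, can attach further named annotation layers.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptItem:
    """Hypothetical record for one recording item (illustration only, not a handbook format)."""
    recording_id: str
    # Minimal orthographic transcript: a plain sequence of words.
    words: list[str]
    # Optional extra annotation layers, keyed by a layer name chosen by the corpus designer.
    layers: dict[str, list[str]] = field(default_factory=dict)

    def add_layer(self, name: str, labels: list[str]) -> None:
        # Extra layers (e.g. discourse events) are only needed for richer corpora.
        self.layers[name] = labels


def from_prompt_script(recording_id: str, prompt: str) -> TranscriptItem:
    # Read speech: derive the word sequence directly from the script shown to the speaker.
    # Assumption: whitespace tokenization is sufficient for the prompt text.
    return TranscriptItem(recording_id, prompt.split())


item = from_prompt_script("rec0001", "please call the main office")
print(item.words)  # ['please', 'call', 'the', 'main', 'office']
```

A word sequence derived from the script is only a starting point: what the speaker actually said (misreadings, hesitations, repetitions) may differ from the prompt and still has to be checked against the recording.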

Subsections

General Rules for Transcription
Possible Transcript Items
Transcription Example
Transcription Method
Existing Transcription Formats
Transcription Tools

BITS Projekt-Account 2004-06-01