Transcription Method
A transcriber can easily make many mistakes, especially in complex
transcription formats. To simplify the transcription process and to end up
with a formally correct transcription, the following measures should be
taken:
- Train the transcribers extensively and test their performance from
time to time on benchmark examples. Use a fixed manual for the
transcription and stick to it throughout the work on the corpus.
- Use a text editor that supports "hot keys" for marker strings, performs
a simple online parse of the input, and displays the various marker types
in different colors; for instance, use xemacs or WWWTranscribe (see next
section).
- Use a simple replay tool that lets the transcriber listen to the
sound channels quickly and easily. For longer recording files (more than 5
sec), the tool should allow marking and replaying parts of the signal, as in
a sound editor. Be sure to use a tool that cannot modify the signal, or
protect your signal files by setting their permissions to read-only.
- In complex transcriptions use a structured process:
  - On the first level, produce a simple transcript containing only the
  lexical items together with their immediate markers (numbers, names,
  spellings, neologisms, foreign words, hard-to-identify words,
  truncations). This base transcription may also serve for a first rough
  use of the recorded speech data in cases where a partner or client is
  not willing to wait for the final transcription.
  - On the second level, let a different group of labelers add the
  more complex markers (off-talk, lengthening, interruptions, comments
  on pronunciation, repetition/correction, false starts, breathing, filled
  and empty pauses, noises, crosstalk, superpositions, prosody).
  - Finally, pass the transcriptions through a correction level where
  all data are reviewed by a small number of experts (preferably by one
  person only).
- Use technical verification techniques after the final processing level,
or after each level:
  - Extract the lexical items and compare them to a "valid word list" to
  detect typos or inconsistent spellings.
  - Run the data through a formal parser to detect syntax errors.
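
The two verification steps above can be sketched as follows. This is only a minimal illustration, not a tool prescribed by this manual. It assumes a hypothetical marker syntax in which every marker is enclosed in angle brackets (for example `<noise>`) and one utterance per line; the function names and the word pattern are likewise illustrative and must be adapted to the transcription format actually in use.

```python
import re

# ASSUMPTION: markers are written in angle brackets, e.g. <noise>.
# This syntax is hypothetical; adapt the pattern to your own format.
MARKER = re.compile(r"<[^<>]*>")

# A word is a run of letters, optionally joined by apostrophes or hyphens.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def lexical_items(line):
    """Return the lower-cased words of a line after removing all markers."""
    text = MARKER.sub(" ", line)
    return [word.lower() for word in WORD.findall(text)]


def unknown_words(lines, valid_words):
    """Map each word missing from valid_words to the lines where it occurs."""
    unknown = {}
    for number, line in enumerate(lines, start=1):
        for word in lexical_items(line):
            if word not in valid_words:
                unknown.setdefault(word, []).append(number)
    return unknown


def bracket_errors(lines):
    """Return the numbers of lines with unbalanced or nested angle brackets.

    This is a very small stand-in for a formal parser: once all well-formed
    markers are removed, any leftover bracket indicates a syntax error.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        leftover = MARKER.sub("", line)
        if "<" in leftover or ">" in leftover:
            bad.append(number)
    return bad
```

Calling `unknown_words(lines, valid_words)` with the transcript lines and a set of accepted spellings yields the candidate typos together with their line numbers, and `bracket_errors(lines)` lists the lines to inspect for malformed markers. A full formal parser for a real transcription format would check far more than bracket balance.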
BITS Projekt-Account
2004-06-01