The following list of annotations is taken from the documentation of the
BAS Partitur Format. It gives an idea of which types of annotation might be
used and what has already been done so far. Pure transcriptions or taggings are
marked with a (T), while segmentations and labellings are marked with an (S):
- Orthographic transcript (T)
- Canonical pronunciation (citation form) (T)
- Broad phonemic/phonetic segmentation and labeling (S)
- Word segmentation (S)
- Dialog act labeling (T)
- Syntactic-prosodic labeling (T)
- Prosodic labeling and segmentation in ToBI (S)
- Phonetic segmentation and labeling in IPA (S)
- Noises: articulatory and technical (S)
- Segmentation or tagging of cross talk (T/S)
- Parts-of-Speech (T)
- Syntax trees (T)
- Translations (T)
- Turn segmentation (S)
- Prosodic labeling of accents and boundary types (S)
- User state segmentation and labeling (S)
- Meta-linguistic events: breathing, laughing, coughing, hesitations (S)
- Discourse events: false starts, stutter, repeats etc. (T)
- Glottal pulses (S)
Note that a transcript contains no information about the timing of its
contents, aside from the fact that a chunk of speech is usually
associated with a chunk of transcript. For example, if the corpus is
structured in paragraphs of read text, then each signal file stores the
speech of one paragraph and the associated transcription file stores the
transcript of what was said in that signal file. There is, however, no
fine-grained time information about when each individual word starts and
ends within the signal file.
A segmentation requires either
- a point in time or
- a starting time and ending time or
- a starting time and duration
of the labelled category. For example, in a phonemic segmentation and labelling,
each segment consists of the phoneme category (coded, for instance, in
SAM-PA), the start of the phonemic segment, and its duration. In the line
below, the tier name comes first, followed by start, duration, and label:
IPA: 1.2758934 0.097867 e:
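The three ways of anchoring a segment in time can be reduced to a single
representation. The following Python sketch is only an illustration: the
names Segment and parse_segment_line are hypothetical, the field order
(tier, start, duration, label) is inferred from the single example line
above, and the time unit is assumed to be seconds. It is not a parser for
the actual BAS Partitur Format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One labelled unit on an annotation tier (time unit assumed: seconds)."""
    tier: str
    label: str
    start: float
    duration: Optional[float] = None  # None marks a point-in-time event

    @property
    def end(self) -> float:
        # A point event ends where it starts.
        if self.duration is None:
            return self.start
        return self.start + self.duration

    @classmethod
    def from_start_end(cls, tier: str, label: str,
                       start: float, end: float) -> "Segment":
        # Convert the start/end form into the start/duration form.
        if end < start:
            raise ValueError("end must not precede start")
        return cls(tier, label, start, end - start)


def parse_segment_line(line: str) -> Segment:
    """Parse 'TIER: <start> <duration> <label>' (layout assumed, see above)."""
    tier, rest = line.split(":", 1)           # split off the tier name only
    start, duration, label = rest.split(None, 2)
    return Segment(tier.strip(), label.strip(), float(start), float(duration))
```

For the example line, parse_segment_line("IPA: 1.2758934 0.097867 e:")
yields a segment on tier "IPA" with label "e:". Storing start and duration
and deriving the end keeps all three forms interchangeable; a transcript
entry, by contrast, would carry no such time fields at all.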
BITS Projekt-Account
2004-06-01