Next: Character code checks
Up: Automatic Validation of Data
Previous: Annotation, meta data and
Contents
Annotation and lexicon contents
If this has not already been done in the previous steps,
write a simple script that extracts the labels from the annotation files
and checks them for inconsistencies:
- Cross-check the labels found against the documentation of the
labeling. Are all labels found documented? Are there any documented
labels that never occur in the annotations?
- Report any digits or numerals that are not written in their full
orthographic form.
- Report any punctuation used in the annotations. There should be
none, except for marks that are separated from other items by
white space and carry a special meaning (for instance prosodic).
- Report any words that are written with an initial capital because
they are at the beginning of a sentence.
- Cross-check all words extracted from the transcripts with the
spelling in the orthographic part of the lexicon.
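The checks listed above could be sketched in Python roughly as follows. This is only a minimal illustration, not a prescribed tool: it assumes that each annotation has already been split into white-space separated tokens, that the documented labels and the orthographic lexicon entries are available as plain sets of strings, and that every ASCII punctuation character is suspicious unless the whole token is listed in `allowed_marks`. The function names and the report keys are hypothetical. The test for sentence-initial capitals is a heuristic: a token is reported if it is missing from the lexicon while its lower-case form is present.

```python
import string


def cross_check_labels(found, documented):
    """Return (undocumented, unused): labels found but not documented,
    and labels documented but never found."""
    found, documented = set(found), set(documented)
    return found - documented, documented - found


def check_tokens(tokens, lexicon, allowed_marks=frozenset()):
    """Sort suspicious tokens of one transcript into report categories.

    tokens        -- white-space separated items of one annotation
    lexicon       -- set of orthographic forms (assumed input format)
    allowed_marks -- stand-alone marks with a special meaning
    """
    report = {
        "digits": [],
        "punctuation": [],
        "capitalized": [],
        "not_in_lexicon": [],
    }
    for tok in tokens:
        if tok in allowed_marks:
            continue  # documented stand-alone mark, e.g. a prosodic label
        if any(ch.isdigit() for ch in tok):
            report["digits"].append(tok)
        elif any(ch in string.punctuation for ch in tok):
            report["punctuation"].append(tok)
        elif tok not in lexicon:
            # Heuristic: capitalized only because of sentence position
            if tok[0].isupper() and tok.lower() in lexicon:
                report["capitalized"].append(tok)
            else:
                report["not_in_lexicon"].append(tok)
    return report
```

Each token lands in at most one category, so a digit string is not reported a second time as a missing lexicon entry. If apostrophes or hyphens are legitimate inside words in your orthography, the punctuation test would have to be relaxed accordingly.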
You might also check the timing information in the label files
for overlapping segments or for gaps between segments, if your
reference does not permit them.
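A timing check of this kind might look like the sketch below. It assumes that the segments of one label file have already been read into `(start, end, label)` tuples with times in seconds; the function name, the tuple layout and the `tolerance` parameter are hypothetical and would have to be adapted to the actual label file format.

```python
def find_timing_problems(segments, tolerance=1e-6):
    """Report overlaps and gaps between consecutive segments.

    segments  -- iterable of (start, end, label) tuples, times in seconds
    tolerance -- differences up to this size are treated as rounding noise
    Returns a list of (kind, previous_label, next_label, duration).
    """
    problems = []
    ordered = sorted(segments, key=lambda seg: seg[0])  # sort by start time
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur[0] - prev[1]  # next start minus previous end
        if delta < -tolerance:
            problems.append(("overlap", prev[2], cur[2], -delta))
        elif delta > tolerance:
            problems.append(("gap", prev[2], cur[2], delta))
    return problems
```

Only neighbouring segments are compared after sorting by start time, so a long segment that spans several later ones is reported only against its direct successor.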
Angela Baumann
2004-06-03