

Manual Validation Contents

Refer to your validation check list (see chapter 3) for the items of the corpus that have to be validated manually. Sometimes the contract does not specify exactly which contents are to be validated. The following contents are typically subject to manual validation:
Transcript: spelling according to a standard reference, use of capital letters, mismatches with the spelling used in the prompt text, lexicon or annotation files, mismatches with the recording, wrong usage of markers, etc.
Labeling/Tagging: wrong usage of labels, extra or missing labels.
Segmentation: deviations of segment boundaries or points in time by more than a defined threshold.
Lexicon: spelling according to a standard reference, use of capital letters, wrong canonical pronunciation according to a standard reference.
Meta data: wrong speaker sex, wrong dialect class (difficult to verify).
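The segmentation criterion reduces to a simple comparison once the validator has corrected the boundaries of a checked item. The following is a minimal Python sketch of that comparison. It assumes that both segmentations contain the same segments in the same order and that times are given in seconds. The `Segment` class, the `boundary_deviations` function and the 20 ms default threshold are hypothetical illustrations. The actual threshold has to come from the validation check list or the contract.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str
    start: float  # seconds (assumed unit)
    end: float    # seconds (assumed unit)


def boundary_deviations(corpus, corrected, threshold=0.02):
    """Compare the corpus segmentation with the validator's corrected one.

    Returns (index, label, boundary, deviation) for every start or end
    boundary that deviates by more than `threshold` seconds. Segments are
    paired by position; the threshold default is a hypothetical value.
    """
    if len(corpus) != len(corrected):
        raise ValueError("segmentations differ in length; align them first")
    flagged = []
    for i, (orig, corr) in enumerate(zip(corpus, corrected)):
        for boundary in ("start", "end"):
            deviation = abs(getattr(orig, boundary) - getattr(corr, boundary))
            if deviation > threshold:
                flagged.append((i, orig.label, boundary, round(deviation, 4)))
    return flagged
```

Each flagged tuple can be counted as one segmentation error. Comparing labels is left out on purpose, because wrong or missing labels belong to the separate labeling category.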
Before starting the manual validation, set up a list of the possible errors to be checked for and document this list in the validation protocol.
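One way to keep the validators bound to the agreed error list is to record it as data before any checking starts and to accept findings only for listed error types. This Python sketch assumes that structure. The category and error-type names in `ERROR_CATALOGUE` and the `ValidationProtocol` class are hypothetical. They paraphrase the typical contents above and are not a prescribed format.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical error list, fixed before the manual validation starts.
ERROR_CATALOGUE = {
    "transcript": {"spelling", "capitalisation", "mismatch_prompt",
                   "mismatch_recording", "marker_usage"},
    "labeling": {"wrong_label", "extra_label", "missing_label"},
    "segmentation": {"boundary_deviation"},
    "lexicon": {"spelling", "capitalisation", "wrong_pronunciation"},
    "metadata": {"speaker_sex", "dialect_class"},
}


@dataclass
class ValidationProtocol:
    catalogue: dict[str, set[str]]
    findings: list = field(default_factory=list)

    def log(self, item_id, category, error_type, note=""):
        """Record one finding; reject error types not in the agreed list."""
        if error_type not in self.catalogue.get(category, set()):
            raise ValueError(f"{category}/{error_type} is not in the error list")
        self.findings.append((item_id, category, error_type, note))

    def summary(self):
        """Count findings per (category, error type)."""
        return Counter((cat, err) for _, cat, err, _ in self.findings)

    def error_list_text(self):
        """Render the agreed error list for inclusion in the protocol."""
        return "\n".join(
            f"{cat}: {', '.join(sorted(types))}"
            for cat, types in sorted(self.catalogue.items())
        )
```

Rejecting unlisted error types makes gaps in the error list visible during validation. When such a gap appears, the list should be extended and the protocol updated, rather than letting ad-hoc error classes creep in.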



Angela Baumann 2004-06-03