Transcript: spelling errors with respect to a standard reference, wrong use of capital letters, mismatches with the spelling used in the prompt text, lexicon, or annotation files, mismatches with the recording, wrong usage of markers, etc.
Labeling/Tagging: wrong usage of labels, extra or missing labels.
Segmentation: deviation of segment boundaries or points in time by more than a defined threshold.
Lexicon: spelling errors with respect to a standard reference, wrong use of capital letters, canonical pronunciation that differs from the one given in a standard reference.
Metadata: wrong sex of speaker, wrong dialect class (difficult to verify).

Before you start the manual validation, set up a list of the possible errors to be checked for and document them in the validation protocol.
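The segmentation check listed above can be partly automated once a reference segmentation is available. The following is a minimal sketch, not a prescribed procedure: the function name `boundary_deviations`, the use of milliseconds, and the 20 ms threshold in the usage example are assumptions made for illustration. It also assumes that both segmentations contain the same number of boundaries in the same order.

```python
from typing import List, Tuple


def boundary_deviations(
    reference: List[float],
    candidate: List[float],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (index, deviation) for every boundary that deviates
    from the reference by more than the threshold.

    All values must use the same time unit (milliseconds here, by assumption).
    """
    if len(reference) != len(candidate):
        # Extra or missing boundaries need a separate alignment step.
        raise ValueError("boundary lists must have the same length")

    flagged = []
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        deviation = abs(cand - ref)
        if deviation > threshold:
            flagged.append((index, deviation))
    return flagged


# Hypothetical boundaries in milliseconds and a hypothetical 20 ms threshold.
reference_ms = [0, 250, 610]
candidate_ms = [5, 290, 612]
print(boundary_deviations(reference_ms, candidate_ms, 20))  # prints [(1, 40)]
```

Whatever threshold is chosen belongs in the validation protocol together with the list of errors, so that the check can be repeated later with the same settings.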