Next: Media Up: The Validation of Speech Previous: Documentation   Contents

Automatic Validation of Data

This working step includes all checks on the corpus data that can be carried out automatically or that require some technical background knowledge. Typically, one person with good programming skills does this work, in parallel with the task described in the next chapter.

The following checklist probably contains more checks than your particular corpus needs. If you are sure that a check does not apply to your corpus, simply skip it. Conversely, try to think of checks that your corpus needs but that are not included in the checklist.

In some cases we have included sample scripts written in CSH and run under Linux. CSH is fairly readable and can be treated like pseudo-code, so you can easily translate the code snippets into your preferred scripting language. We recommend Perl as a scripting language, but if you love to hack Java, do whatever is fun for you.

Report every check you performed and its findings in the validation report. Describe the testing method exactly and give the formula behind each resulting number, so that the client/producer can reproduce the results if necessary. You may even include the programs or scripts you used in the appendix of your report.
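
As an illustration of such reporting, here is a minimal sketch in Python of how a check could be recorded together with its method and the formula behind its number. Everything in it is an assumption made for the example and not part of the validation procedure itself: the names CheckResult, run_check and format_report, the file names, and the choice of "failed items / checked items" as the reported figure.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """One automatic check plus what is needed to reproduce its number."""
    name: str
    method: str      # plain-text description of how the check was done
    checked: int     # number of items examined
    failed: list = field(default_factory=list)  # items that did not pass

    @property
    def failure_rate(self) -> float:
        # Formula stated explicitly: failed items / checked items.
        return len(self.failed) / self.checked if self.checked else 0.0


def run_check(name, method, items, predicate):
    """Apply predicate to every item and collect the items that fail it."""
    items = list(items)
    failed = [item for item in items if not predicate(item)]
    return CheckResult(name, method, len(items), failed)


def format_report(results):
    """Render the results as plain text for the validation report."""
    lines = []
    for r in results:
        lines.append(f"Check:   {r.name}")
        lines.append(f"Method:  {r.method}")
        lines.append(
            "Formula: failed / checked = "
            f"{len(r.failed)} / {r.checked} = {r.failure_rate:.2%}"
        )
        for item in r.failed:
            lines.append(f"  failed: {item}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the sketch.
    files = ["a001.wav", "a002.wav", "a003.tmp"]
    result = run_check(
        "file extension",
        "every file name must end in .wav",
        files,
        lambda name: name.endswith(".wav"),
    )
    print(format_report([result]))
```

Keeping the method description and the formula next to each number means the report can be regenerated from the same script, which is one way to let the client/producer reproduce the results.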



Angela Baumann 2004-06-03