
Pre-Validation

In a large speech corpus collection it is highly recommended that you perform a pre-validation once a small amount of data has been collected, preferably conducted by an external validation center. Do not confuse the terms pre-test (see the section on the pre-test) and pre-validation: the pre-test is concerned only with testing the technical setup and procedures; the pre-validation deals with real data that will be part of the resulting speech corpus.

The optimal model for a pre-validation is that, after a pre-defined number of speakers have been recorded, the speech signals, the annotations, the metadata and the documentation files are transferred to an external validation center, which performs a formal validation of the data. The collection waits for the results of this validation, responds to any errors found and to any other recommendations, and only then continues.

In practice this ideal is often only partly achievable: in most cases the annotation files will not be ready after such a short collection time, and the same is probably true of the documentation and metadata files. Nevertheless, at least the speech signals should be validated against their specifications.
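As an illustration of what validating speech signals against their specifications might look like, here is a minimal Python sketch. It assumes the recordings are stored as PCM WAV files, which the text above does not state, and the names SignalSpec and check_wav as well as the example values (16 kHz, mono, 16 bit, at least 0.5 s) are hypothetical. A real validation center will apply its own, more extensive criteria.

```python
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalSpec:
    """Hypothetical technical specification for one recorded signal."""
    sample_rate_hz: int
    channels: int
    sample_width_bytes: int
    min_duration_s: float


def check_wav(path: str, spec: SignalSpec) -> list[str]:
    """Return a list of deviations from the spec (empty if none were found)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != spec.sample_rate_hz:
            problems.append(f"sample rate {rate} Hz, expected {spec.sample_rate_hz} Hz")
        if wav.getnchannels() != spec.channels:
            problems.append(f"{wav.getnchannels()} channel(s), expected {spec.channels}")
        if wav.getsampwidth() != spec.sample_width_bytes:
            problems.append(
                f"sample width {wav.getsampwidth()} byte(s), "
                f"expected {spec.sample_width_bytes}"
            )
        # Duration is derived from the frame count; guard against a zero rate.
        duration = wav.getnframes() / rate if rate > 0 else 0.0
        if duration < spec.min_duration_s:
            problems.append(
                f"duration {duration:.2f} s, expected at least {spec.min_duration_s} s"
            )
    return problems


# Assumed example values, not taken from any actual corpus specification.
EXAMPLE_SPEC = SignalSpec(
    sample_rate_hz=16000, channels=1, sample_width_bytes=2, min_duration_s=0.5
)
```

Running check_wav over every file of the first recorded speakers and collecting the non-empty results gives a simple deviation report that can be reviewed before the collection continues. Such a script covers only the file format parameters; it does not replace the formal validation described above.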


BITS Projekt-Account 2004-06-01