Summary
This document is the result of a study
conducted within the German BITS project in 2002.
BITS is an acronym for "BAS Infrastructures for Technical Speech
Processing". It is a 100% publicly funded project devoted to improving
the infrastructure for Spoken Language Processing (SLP)
of the German language.
One of the sub-projects of BITS aims to produce a
cookbook-like document on the topic of Speech Corpora Validation.
Within the scope of this document, a Speech Corpus is a collection
of digital recordings of speech created with the aim of exploring
how speech communication works, often with respect to particular
technical applications such as Automatic Speech Recognition (ASR),
Speech Synthesis, or Speaker Verification.
The term Validation refers to a process that analyses a speech corpus
with regard to its specifications and documents the findings. The corpus
may be either completed or still in the process of being produced.
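
As a rough illustration of checking a corpus against its specifications, the following minimal sketch compares the signal format of WAV recordings with a specified format and collects the deviations in a report. It is only an assumption of what one small validation step might look like, not the procedure used in BITS or at BAS: the names AudioSpec, validate_recording and validate_corpus, the choice of checked properties, and the *.wav file layout are all hypothetical.

```python
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class AudioSpec:
    """Hypothetical signal specification of a corpus (illustrative fields only)."""
    sample_rate_hz: int
    channels: int
    sample_width_bytes: int


def validate_recording(path: Path, spec: AudioSpec) -> List[str]:
    """Return human-readable deviations of one WAV file from the specification."""
    with wave.open(str(path), "rb") as wav:
        found = {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }
    problems = []
    for field, actual in found.items():
        expected = getattr(spec, field)
        if actual != expected:
            problems.append(
                f"{path.name}: {field} is {actual}, specified {expected}"
            )
    return problems


def validate_corpus(root: Path, spec: AudioSpec) -> Dict[str, List[str]]:
    """Check every *.wav file below root; map each file name to its deviations."""
    report = {}
    for path in sorted(root.rglob("*.wav")):
        report[path.name] = validate_recording(path, spec)
    return report
```

A call such as validate_corpus(Path("corpus"), AudioSpec(16000, 1, 2)) would return an empty list for every conforming file and a list of deviations for every other file. The directory name and the values 16000, 1 and 2 are example values only. A real validation would cover far more than the signal format, for instance documentation, annotation and completeness.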
Speech Corpus Validation has several important applications in the
field of Spoken Language Processing (SLP):
- Quality control: Validation is carried out during, or in the last
  phase of, the production of a new speech corpus to ensure certain
  levels of quality, either
  - by the producer (in-house validation) or
  - by an independent validation organization (external validation).
- Controlling: Validation is carried out by the buyer of a speech
  corpus to ensure that the speech corpus meets the buyer's needs.
- Improvement: By validating existing speech corpora, these
  corpora may be improved for future re-use.
- Comparability: Validation carried out under standardized
  guidelines might lead to a quality grade that simplifies
  the choice between existing speech corpora of similar
  specifications for a given task.
This document is a cookbook for speech corpus validation. It is the result of
the validation experience gained at the Bavarian Archive for Speech Signals
(BAS) in numerous corpus collections.
Angela Baumann
2004-06-03