The cookbook consists of three parts. Part I, General, covers topics
of general interest for speech corpora that are better discussed
outside the practical cookbook itself. Readers in a hurry may skip
this part and go directly to Part II, Speech Corpus Production, which
lists the major steps of a typical corpus production in chronological
order.
The main phases described there are:
- Specification,
- Preparation of Collection,
- Collection, in most cases overlapped by
- Post-processing,
- Annotation,
- Documentation, and optionally
- Validation.
Throughout Part II you will find check lists at the end of the
major chapters. They are intended to be used during
actual speech corpus production.
Check points marked with a single star (*) denote compulsory steps
(minimal requirements). Check points marked with more than one star
denote additional, recommended working steps that increase the
(re-)usability and value of the resulting speech corpus, but also
require more time and money. The working steps themselves are
abbreviated to a few key words. If you are not familiar with a working
step listed on a check list, refer to the page number(s) in brackets
after the key words to find the passage(s) that describe the topic in
detail.
For example:
Specification
...
* Define number of sessions (p. 35)
* Define number of prompts / recording time (p. 35)
** Define distribution of sex (p. 35)
*** Define distribution of age (p. 36)
*** Define distribution of dialects / place of living / place of
education (p. 36)
* Define sampling rate (p. 35)
* Define bits per sample (p. 35)
* Define microphone(s) (p. 35)
* Define acoustical environment (p. 39)
...
In this example, all check points except the third to fifth are
compulsory for a corpus specification. Not all corpora require a
defined distribution of gender; the same is true for the age and
origin of the speakers. Such a defined distribution increases the
usability of the corpus, but it also makes the recruiting process
more costly.
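The star convention is simple enough to process mechanically. The
following is a minimal sketch, not part of the cookbook itself: it
assumes each check point sits on one line in the form "stars, key
words, (p. N)" and picks out the compulsory single-star steps. The
names CHECKLIST, LINE_RE, parse_checklist and compulsory are
hypothetical and chosen only for illustration.

```python
import re

# Sample lines copied from the example check list (wrapped item joined).
CHECKLIST = """\
* Define number of sessions (p. 35)
* Define number of prompts / recording time (p. 35)
** Define distribution of sex (p. 35)
*** Define distribution of age (p. 36)
*** Define distribution of dialects / place of living / place of education (p. 36)
* Define sampling rate (p. 35)
* Define bits per sample (p. 35)
* Define microphone(s) (p. 35)
* Define acoustical environment (p. 39)
"""

# Assumed line layout: one or more stars, the key words,
# then the page number(s) in brackets.
LINE_RE = re.compile(r"^(\*+)\s+(.*?)\s+\(pp?\.\s*([^)]*)\)\s*$")


def parse_checklist(text):
    """Return a list of (stars, keyword, pages) tuples, one per check point."""
    items = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stars, keyword, pages = match.groups()
            items.append((len(stars), keyword, pages))
    return items


def compulsory(items):
    """Keep only single-star check points (minimal requirements)."""
    return [item for item in items if item[0] == 1]


items = parse_checklist(CHECKLIST)
for stars, keyword, pages in compulsory(items):
    print(f"{keyword} (p. {pages})")
```

Filtering on a star count greater than one instead would list the
recommended steps that raise the effort in time and money.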
All check lists are also collected in the appendix
in a format suitable for copying.
Finally, Part III, Examples, presents three prototypical speech
corpora (WebCommand, SpeechDat-II German, and SmartKom) together with
their key specifications and a list of references.
BITS Projekt-Account
2004-06-01