
Overview

The cookbook consists of three parts. The first part, General, contains topics of general interest in the context of speech corpora that are better discussed outside the practical cookbook. Readers in a hurry may skip this part and go directly to the second part, Speech Corpus Production, which lists the major steps of a typical corpus production in chronological order. The main phases described there are:

Throughout Part II you will find check lists at the end of the major chapters. They are intended to be used during actual speech corpus production. Check points marked with a single star (*) denote compulsory steps (minimal requirements). Check points marked with more than one star denote additional, recommended working steps that increase the (re-)usability and value of the resulting speech corpus but also require more time and money.

The working steps themselves are abbreviated to mere keywords. If you are not familiar with the meaning of a working step listed on a check list, refer to the page number(s) in brackets after the keyword to find the passage(s) with a detailed description of the topic.

For example:

Specification
...
○ * Define number of sessions (p. 35)
○ * Define number of prompts / recording time (p. 35)
○ ** Define distribution of sex (p. 35)
○ *** Define distribution of age (p. 36)
○ *** Define distribution of dialects / place of living / place of education (p. 36)
○ * Define sampling rate (p. 35)
○ * Define bits per sample (p. 35)
○ * Define microphone(s) (p. 35)
○ * Define acoustical environment (p. 39)
...
In this example, all check points except the third to fifth are compulsory for a corpus specification. Not all corpora require a defined distribution of gender; the same is true for the age and origin of the speakers. Such a defined distribution will increase the usability of the corpus but will also make the recruiting process more costly. All check lists are collected in the appendix in a format suitable for copying.
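
For readers who keep their check lists in electronic form, the star convention can be evaluated mechanically. The following is only a minimal sketch and not part of the cookbook itself: it assumes a hypothetical plain-text rendering with one check point per line, written as stars, keyword, and a single page number in brackets. The names `CheckPoint`, `parse_check_list`, and `compulsory` are made up for illustration.

```python
import re
from typing import List, NamedTuple


class CheckPoint(NamedTuple):
    stars: int  # 1 = compulsory, 2 or more = recommended
    step: str   # keyword describing the working step
    page: int   # page with the detailed description


# Assumed line format (hypothetical): "** Define distribution of sex (p. 35)"
LINE_RE = re.compile(r"^(\*+)\s+(.*?)\s+\(p\.\s*(\d+)\)$")


def parse_check_list(lines: List[str]) -> List[CheckPoint]:
    """Turn check list lines into CheckPoint records, skipping other lines."""
    points = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines such as "..." do not match and are ignored
            stars, step, page = match.groups()
            points.append(CheckPoint(len(stars), step, int(page)))
    return points


def compulsory(points: List[CheckPoint]) -> List[CheckPoint]:
    """Return only the single-star (minimal requirement) check points."""
    return [p for p in points if p.stars == 1]


# The Specification example from above, in the assumed text format.
EXAMPLE = """\
...
* Define number of sessions (p. 35)
* Define number of prompts / recording time (p. 35)
** Define distribution of sex (p. 35)
*** Define distribution of age (p. 36)
*** Define distribution of dialects / place of living / place of education (p. 36)
* Define sampling rate (p. 35)
* Define bits per sample (p. 35)
* Define microphone(s) (p. 35)
* Define acoustical environment (p. 39)
...
"""

points = parse_check_list(EXAMPLE.splitlines())
for point in compulsory(points):
    print(f"{point.step} (see p. {point.page})")
```

Run on the example, the sketch lists the six single-star steps and leaves out the three recommended ones on the distribution of sex, age, and dialects. Check points that refer to several pages would need a slightly more permissive pattern.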

Finally, Part III, Examples, contains three prototypical speech corpora (WebCommand, SpeechDat-II German, and SmartKom) together with their key specifications and a list of references.


BITS Projekt-Account 2004-06-01