Examples

The third part of this cookbook describes the specifications of three prototypical speech corpora: WebCommand, SpeechDat and SmartKom.

WebCommand is an example of a low-cost, small-scale corpus production. SpeechDat describes the specification of an international, commercial speech corpus production in the field of telephony. SmartKom is a good example of a complex scientific collection of multimodal data that includes speech.

	WebCommand	SpeechDat	SmartKom
Content	Commands	Diverse	Dialogue
Language	English/French	13 European	German
Speakers	40	5000	400
Type	Read	Read	Spontaneous
Signal	Online	Telephone	Online
Channels	2	1	9
Environment	Office	Field	Studio
Size	9 GB	30 GB	25 GB
Annotation	SpeechDat	SpeechDat	SK Transliteration
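
The comparison above can also be held as structured records, which makes it easier to compare the corpora side by side. The following is only a minimal sketch: the class name `CorpusSpec`, its field names, and the helper `largest_by` are illustrative assumptions and are not part of any of the three corpus specifications. The values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusSpec:
    """One column of the comparison table (field names are hypothetical)."""
    name: str
    content: str
    languages: str
    speakers: int
    speech_type: str
    signal: str
    channels: int
    environment: str
    size_gb: int
    annotation: str


# Values taken directly from the table above.
CORPORA = [
    CorpusSpec("WebCommand", "Commands", "English/French", 40, "Read",
               "Online", 2, "Office", 9, "SpeechDat"),
    CorpusSpec("SpeechDat", "Diverse", "13 European", 5000, "Read",
               "Telephone", 1, "Field", 30, "SpeechDat"),
    CorpusSpec("SmartKom", "Dialogue", "German", 400, "Spontaneous",
               "Online", 9, "Studio", 25, "SK Transliteration"),
]


def largest_by(field: str) -> str:
    """Return the name of the corpus with the largest value in a numeric field."""
    return max(CORPORA, key=lambda c: getattr(c, field)).name


if __name__ == "__main__":
    # Print a compact comparison of the numeric properties.
    for c in CORPORA:
        print(f"{c.name:<12}{c.speakers:>6} speakers"
              f"{c.channels:>3} channels{c.size_gb:>4} GB")
```

A call such as `largest_by("speakers")` then picks out the corpus with the most speakers, and the same list can be filtered by any other row of the table, for example by speech type.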

The examples are real, not fictitious, and are by no means meant as models of an ideal corpus specification. The descriptions were taken from the actual corpora, and missing or badly designed parts are commented on accordingly.

To make it easier to relate these examples to the rest of this cookbook, and to simplify comparisons between the different corpus styles, the main description of each corpus is structured as a table that roughly follows the chapters of this cookbook.


BITS Projekt-Account 2004-06-01