Corpus Specification

SpeechDat-II is an EU-funded project to create telephone speech databases for the development of speech recognizers and speaker verification systems for voice-driven applications and tele-services. The main motivation for SpeechDat-II was to share the effort of creating a database: competitors on the market collaborate on the data collection and then individually exploit these databases to develop competing applications, devices and services.

SpeechDat-II is a successor to the pilot project SpeechDat-M, and it has been succeeded by a number of further projects, e.g. SpeechDat-E for the East European languages, SpeechDat-Car for data collection in mobile environments, OrienTel for the Mediterranean languages, and numerous similar projects in all parts of the world.

The German SpeechDat-II data collections were performed by the BAS at the University of Munich under subcontracts to Siemens for the fixed telephone network and to Vocalis for the mobile telephone network.

In the following, the corpus specification of the fixed-network German SpeechDat-II data collection is presented as a checklist. The elements of this checklist have already been discussed in this order in chapter [*]. Elements that are not applicable to SpeechDat are marked with 'n.a.'.

Speaker Profiles: Primarily native speakers of German; gender distribution 50:50 with a tolerance of +/- 5%; three age classes (16-30, 31-45, 46 and older), each with a minimum of 20% of the speakers; for the dialectal distribution, Germany is divided into 11 regions corresponding to the larger federal states, with a number of speakers proportional to their population; education level not specified
Number of Speakers: 5000
- Vocabulary: Digits, numbers, date and time expressions, simple application words and phrases, spellings, person, company and geographic names, phonetically rich words and sentences
- Domain: not specified
- Task: not specified
- Phonologic Distribution: applied only to phonetically rich words and sentences
Speaking Style:
- Read Speech: +
- Answering Speech: +
- Command/Control Speech: -
- Non Prompted Speech: +
- Spontaneous Speech: -
- Neutral/Emotional: -
Recording Setup: Telephone Recording
- Acoustical environment: three environments specified (office, home, telephone booth); telephone booth recordings make up a minimum of 2% of all recordings
- Script: Prompt sheet and guided dialog by telephone server
- Background noise: natural, dependent on environment
- Microphones: not specified, but classification between rotary and DTMF phones required
Technical Specifications:
- Sampling Rate: 8000 Hz
- Sample Type and Width: A-law, 8 bit
- Number of Channels: 1
- Signal File Format: raw header-less data
- Annotation File Format: SAM
- Meta Data File Format: Tab-delimited ISO-8859 text files
- Lexicon Format: Tab-delimited ISO-8859 text file, pronunciation coded in SAM-PA
Corpus Structure:
- Structure: Hierarchical file structure according to recording sessions
- Terminology: signal file names encode recording session and prompt item
- Distribution Media: CD-R
Release Plan: SpeechDat-II is to be made available through ELRA after a 12-month blocking period following the end of the project
Validation: External pre-validation after 10 recordings; external final validation of the entire database
Documentation: Specifications publicly available; recording logs and final validation report included in the distribution
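
The speaker profile constraints in the checklist lend themselves to an automatic check during data collection. The following is a minimal sketch in Python, not part of the SpeechDat tooling. The function names, the input format (a list of gender/age pairs) and the reading of the tolerance as "each gender between 45% and 55% of the speakers" are assumptions made for illustration.

```python
from collections import Counter

# Age classes as listed in the checklist; the labels are illustrative.
AGE_CLASSES = ("16-30", "31-45", "46+")


def age_class(age):
    """Map an age in years to one of the three specified age classes."""
    if 16 <= age <= 30:
        return "16-30"
    if 31 <= age <= 45:
        return "31-45"
    if age >= 46:
        return "46+"
    return None  # younger than 16: outside the specified classes


def check_speaker_profile(speakers):
    """Check gender and age quotas for a list of (gender, age) pairs.

    gender is 'F' or 'M' (hypothetical coding). Integer arithmetic is
    used so that boundary cases such as exactly 55% are not lost to
    floating-point rounding.
    """
    n = len(speakers)
    female = sum(1 for gender, _ in speakers if gender == "F")
    ages = Counter(age_class(age) for _, age in speakers)
    return {
        # assumed reading: female share within 50% +/- 5 percentage points
        "gender_ok": 45 * n <= 100 * female <= 55 * n,
        # each age class must cover at least 20% of the speakers
        "age_ok": all(100 * ages[c] >= 20 * n for c in AGE_CLASSES),
    }


if __name__ == "__main__":
    sample = [("F", 20)] * 5 + [("M", 35)] * 3 + [("M", 50)] * 2
    print(check_speaker_profile(sample))
```

The regional quota (11 regions, proportional to population) could be checked in the same way, given population figures per region, which are not listed here.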

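Because the signal files are header-less, a reader has to supply the technical parameters from the checklist itself: 8000 Hz, one channel, 8-bit A-law. The sketch below shows one way to expand such data to 16-bit linear samples using the standard G.711 A-law expansion. The function names are hypothetical, and the code is an illustration rather than the tool used in the project.

```python
SAMPLE_RATE_HZ = 8000  # from the technical specification above


def alaw_to_linear(byte):
    """Expand one 8-bit A-law code to a signed 16-bit linear value (G.711)."""
    byte ^= 0x55                    # undo the even-bit inversion
    value = (byte & 0x0F) << 4      # 4-bit mantissa
    segment = (byte & 0x70) >> 4    # 3-bit segment (exponent)
    if segment == 0:
        value += 8
    else:
        value += 0x108
        value <<= segment - 1
    # in A-law, a set sign bit marks a positive sample
    return value if byte & 0x80 else -value


# Precompute all 256 codes once; decoding is then a table lookup.
_ALAW_TABLE = [alaw_to_linear(b) for b in range(256)]


def decode_alaw(raw):
    """Decode raw header-less A-law bytes to a list of linear samples."""
    return [_ALAW_TABLE[b] for b in raw]


def duration_seconds(raw):
    """Duration of a mono 8-bit recording: one byte per sample."""
    return len(raw) / SAMPLE_RATE_HZ


if __name__ == "__main__":
    data = bytes([0xD5, 0x55, 0xAA, 0x2A])
    print(decode_alaw(data), duration_seconds(data))
```

With real data, the bytes would be read from a signal file opened in binary mode; the file naming scheme is described in the corpus documentation and is not reproduced here.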
