SpeechDat-II is the successor to the pilot project SpeechDat-M, and it has in turn been succeeded by a number of further projects, e.g., SpeechDat-E for Eastern European languages, SpeechDat-Car for data collection in mobile environments, OrienTel for the languages of the Mediterranean region, and numerous similar projects in all parts of the world.
The German SpeechDat-II data collections were carried out by the BAS at Munich University, under subcontracts to Siemens for the fixed telephone network and to Vocalis for the mobile telephone network.
In the following, the corpus specification of the German fixed-network SpeechDat-II data collection is presented in the form of a checklist. The elements of this checklist have already been discussed, in the same order, in chapter .
Elements that are not applicable to SpeechDat are marked 'n.a.'.
| Element | Specification |
|---|---|
| Speaker Profiles | Primarily native speakers of German; gender distribution 50:50 with a tolerance of ±5%; three age classes (16-30, 31-45, 46 and older), each at least 20% of speakers; for the dialectal distribution, Germany is divided into 11 regions corresponding to the larger federal states, with the number of speakers per region proportional to its population; education level not specified |
| Number of Speakers | 5000 |
| Contents: | |
| - Vocabulary | Digits, numbers, date and time expressions, simple application words and phrases, spellings, person, company and geographic names, phonetically rich words and sentences |
| - Domain | not specified |
| - Task | not specified |
| - Phonologic Distribution | applied only to phonetically rich words and sentences |
| Speaking Style: | |
| - Read Speech | + |
| - Answering Speech | + |
| - Command/Control Speech | - |
| - Non-Prompted Speech | + |
| - Spontaneous Speech | - |
| - Neutral/Emotional | - |
| Recording Setup: | Telephone recording |
| - Acoustical Environment | 3 environments specified (office, home, telephone booth); at least 2% of recordings from telephone booths |
| - Script | Prompt sheet and guided dialog by telephone server |
| - Background Noise | natural, dependent on environment |
| - Microphones | not specified, but telephones must be classified as rotary or DTMF |
| Technical Specifications: | |
| - Sampling Rate | 8000 Hz |
| - Sample Type and Width | A-law, 8 bit |
| - Number of Channels | 1 |
| - Signal File Format | Raw, header-less data |
| - Annotation File Format | SAM |
| - Meta Data File Format | Tab-delimited ISO-8859 text files |
| - Lexicon Format | Tab-delimited ISO-8859 text file, pronunciation coded in SAM-PA |
| Corpus Structure: | |
| - Structure | Hierarchical file structure according to recording sessions |
| - Terminology | Signal file names encode recording session and prompt item |
| - Distribution Media | CD-R |
| Release Plan | SpeechDat-II is to be made available through ELRA after a 12-month blocking period following the end of the project |
| Validation | External pre-validation after 10 recordings; external final validation of the entire database |
| Documentation | Specifications publicly available; recording logs and final validation report included in the distribution |
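The technical specifications above are sufficient to read a signal file even though it carries no header: each byte is one 8-bit A-law sample of a single channel recorded at 8000 Hz, so a file of N bytes lasts N/8000 seconds. The following Python sketch expands such a file into signed 16-bit linear samples. It assumes standard ITU-T G.711 A-law coding (including the XOR 0x55 even-bit inversion); the function names `alaw_to_linear`, `decode_alaw`, `duration_seconds` and `read_speechdat_signal` are hypothetical and not part of the SpeechDat distribution.

```python
from array import array

# From the checklist: 8000 Hz, 1 channel, 8-bit A-law, no file header.
SAMPLE_RATE_HZ = 8000


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample.

    Assumption: the bytes follow standard ITU-T G.711 A-law coding.
    """
    code ^= 0x55                      # undo the even-bit inversion of G.711
    magnitude = (code & 0x0F) << 4    # 4-bit quantization step within the segment
    segment = (code & 0x70) >> 4      # 3-bit segment number (the exponent)
    if segment == 0:
        magnitude += 8                # add half a quantization step
    elif segment == 1:
        magnitude += 0x108            # implicit leading bit plus half step
    else:
        # implicit leading bit plus half step, then scale by the segment
        magnitude = (magnitude + 0x108) << (segment - 1)
    # After the XOR, a set sign bit means a positive sample.
    return magnitude if code & 0x80 else -magnitude


# A-law has only 256 codes, so decode every possible byte once up front.
ALAW_TABLE = [alaw_to_linear(c) for c in range(256)]


def decode_alaw(raw: bytes) -> array:
    """Decode header-less A-law bytes into an array of signed 16-bit samples."""
    return array("h", (ALAW_TABLE[b] for b in raw))


def duration_seconds(raw: bytes) -> float:
    """One byte per sample and one channel, so length / rate gives seconds."""
    return len(raw) / SAMPLE_RATE_HZ


def read_speechdat_signal(path: str) -> array:
    """Read a raw signal file (hypothetical helper) and decode it to linear PCM."""
    with open(path, "rb") as f:
        return decode_alaw(f.read())
```

The lookup table trades 256 precomputed entries for a single indexing operation per sample. Python's own `audioop.alaw2lin` once offered the same conversion, but that module was removed in Python 3.13, so a self-contained decoder avoids the dependency. The decoded array can be passed on to the standard `wave` module (16-bit samples, 1 channel, 8000 Hz) if a playable file is needed.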