
Corpus Specification

SpeechDat-II is an EU-funded project to create telephone speech databases for the development of speech recognition and speaker verification systems for voice-driven applications and tele-services. The main goals of SpeechDat-II were to

- collect comparable databases in all major European languages for both the fixed and the mobile telephone networks,
- establish a standard for telephone speech data collections by publishing all database specifications,
- exchange the databases within the project, and
- make the databases available to the general public after a given blocking period.

In SpeechDat-II, competitors on the market collaborate to share the effort of creating the databases, and then each exploits them individually to develop competing applications, devices and services.

SpeechDat-II is a successor to the pilot project SpeechDat-M, and it has been succeeded by a number of further projects, e.g. SpeechDat-E for the East European languages, SpeechDat-Car for data collection in mobile environments, OrienTel for the Mediterranean languages, and numerous similar projects in all parts of the world.

The German SpeechDat-II data collections were performed by the Bavarian Archive for Speech Signals (BAS) at the University of Munich, under subcontract to Siemens for the fixed telephone network and to Vocalis for the mobile telephone network.

In the following, the corpus specification of the German fixed-network SpeechDat-II data collection is presented as a checklist. The elements of this checklist have already been discussed in this order in chapter . Elements that are not applicable to SpeechDat are marked 'n.a.'.

Speaker Profiles: Primarily native speakers of German; gender distribution 50:50 with a tolerance of +/- 5%; three age classes (16-30, 31-45, 46 and older), each with a minimum of 20% of the speakers; for the dialectal distribution, Germany is divided into 11 regions corresponding to the larger federal states, with a number of speakers proportional to their population; education level not specified

Number of Speakers: 5000

Contents:
- Vocabulary: Digits, numbers, date and time expressions, simple application words and phrases, spellings, person, company and geographic names, phonetically rich words and sentences
- Domain: not specified
- Task: not specified
- Phonologic Distribution: applied only to phonetically rich words and sentences

Speaking Style:
- Read Speech: +
- Answering Speech: +
- Command/Control Speech: -
- Non Prompted Speech: +
- Spontaneous Speech: -
- Neutral/Emotional: -

Recording Setup: Telephone Recording
- Acoustical environment: 3 environments specified (office, home, telephone booth); telephone booth recordings must make up at least 2% of all recordings
- Script: Prompt sheet and guided dialog by telephone server
- Background noise: natural, dependent on environment
- Microphones: not specified, but classification into rotary and DTMF phones required

Technical Specifications:
- Sampling Rate: 8000 Hz
- Sample Type and Width: A-law, 8 bit
- Number of Channels: 1
- Signal File Format: raw, headerless data
- Annotation File Format: SAM
- Meta Data File Format: Tab-delimited ISO-8859 text files
- Lexicon Format: Tab-delimited ISO-8859 text file, pronunciation coded in SAM-PA

Corpus Structure:
- Structure: Hierarchical file structure according to recording sessions
- Terminology: signal file names encode recording session and prompt item
- Distribution Media: CD-R

Release Plan: SpeechDat-II is to be available through ELRA after a 12-month blocking period following the end of the project

Validation: External pre-validation after 10 recordings; external final validation of the entire database

Documentation: Specifications publicly available; recording logs and final validation report included in the distribution
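
The signal format given under Technical Specifications (8000 Hz, A-law, 8 bit, 1 channel, raw data without a header) means that a signal file carries no information about its own encoding; a reader has to supply these parameters itself. The following Python sketch illustrates how such a file could be decoded to 16-bit linear samples. It assumes that the files use the standard ITU-T G.711 A-law byte coding, which the specification above does not state explicitly; the function names are illustrative and not part of any SpeechDat tool.

```python
import array

SAMPLE_RATE_HZ = 8000  # from the specification: 8000 Hz, 1 channel


def alaw_byte_to_pcm16(code: int) -> int:
    """Expand one 8-bit A-law code to a 16-bit linear sample.

    Assumption: standard G.711 A-law coding (even bits inverted).
    """
    code ^= 0x55                      # undo the even-bit inversion
    sign = code & 0x80                # set bit means positive sample
    segment = (code >> 4) & 0x07      # 3-bit segment (exponent)
    value = (code & 0x0F) << 4        # 4-bit mantissa
    if segment == 0:
        value += 8
    else:
        value = (value + 0x108) << (segment - 1)
    return value if sign else -value


# Precompute all 256 possible codes once; decoding is then a table lookup.
_ALAW_TABLE = [alaw_byte_to_pcm16(c) for c in range(256)]


def decode_alaw(raw: bytes) -> array.array:
    """Decode headerless A-law bytes to signed 16-bit samples."""
    return array.array("h", (_ALAW_TABLE[b] for b in raw))


def duration_seconds(raw: bytes) -> float:
    """One byte per sample and one channel, so duration = bytes / rate."""
    return len(raw) / SAMPLE_RATE_HZ
```

A file would be read in binary mode, for example `decode_alaw(open(path, "rb").read())`, where `path` stands for any signal file of the corpus. Because every sample occupies exactly one byte, the recording length follows directly from the file size.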


BITS Projekt-Account 2004-06-01
