Corpus Specification

The SmartKom recordings are carried out in three different technical setups (Public, Home, Mobil) and in an open-ended number of task domains, which do not overlap between technical setups. Most of the following specifications were defined at a dedicated workshop organized by the group that produced the final corpus. All attendees of this workshop were partners in the SmartKom consortium.

The following presents the corpus specification of the entire SmartKom speech data collection (all technical setups, all task domains) in the form of a checklist. The elements of this checklist have already been discussed in the same order in chapter [*]. Elements that do not apply to SmartKom are marked 'n.a.'. The checklist covers only the recorded Wizard-of-Oz (WOZ) speech data; the biometric speech corpus is not considered.

Speaker Profiles: primarily native speakers of German; gender distribution 50:50; ages ranging from 15 to 60 years; dialectal distribution not specified; education level not specified
Number of Speakers: open, depending on effort and funding; if feasible, equal proportions of speakers recorded in each technical setup and each task domain
- Vocabulary: free speech; no vocabulary restrictions
- Domain: depends on the task domains implemented in the SmartKom prototype; when the specifications were written, only a few task domains had been defined: cinema guide, electronic program guide (EPG), VCR control, tourist information, navigation (on foot and by car), restaurant guide, office tasks
- Task: depends on the selected domain; each recording consists of one primary and one secondary task, e.g. primary task: find a cinema in Heidelberg for tonight; secondary task: find a restaurant for dinner after the cinema
- Phonologic Distribution: not specified
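The speaker-profile targets above (50:50 gender, ages 15 to 60, primarily native speakers, equal proportions across setups) are easiest to monitor as simple counts over the recorded speakers. The following is a minimal Python sketch, assuming a hypothetical per-speaker record (`Speaker`; the field names are illustrative, not the SmartKom metadata schema). The same per-setup counting could be applied to task domains.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical speaker record: these field names are illustrative and are not
# taken from the SmartKom metadata DTDs.
@dataclass
class Speaker:
    speaker_id: str
    gender: str          # "f" or "m"
    age: int
    native_german: bool
    setup: str           # "Public", "Home" or "Mobil"


def profile_report(speakers: list[Speaker], min_age: int = 15, max_age: int = 60) -> dict:
    """Summarize a speaker set against the checklist targets (50:50 gender, ages 15-60)."""
    n = len(speakers)
    female = sum(1 for s in speakers if s.gender == "f")
    return {
        "n": n,
        "female_share": female / n if n else 0.0,
        # Speakers outside the specified age range.
        "out_of_age_range": [s.speaker_id for s in speakers if not min_age <= s.age <= max_age],
        # Reported, not rejected: the spec asks only for *primarily* native speakers.
        "non_native": [s.speaker_id for s in speakers if not s.native_german],
        # Speakers per technical setup, to check the "equal proportions" goal.
        "per_setup": dict(Counter(s.setup for s in speakers)),
    }
```

A `female_share` near 0.5 and roughly equal `per_setup` counts indicate that the sample is on target; the lists name individual speakers worth a second look.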

Speaking Style (+ = included, - = not included):
- Read Speech: -
- Answering Speech: +
- Command/Control Speech: +
- Non-Prompted Speech: +
- Spontaneous Speech: +
- Neutral/Emotional: +

Recording Setup: Wizard-of-Oz recording
- Acoustic Environment: normal office; reverberation time reduced by curtains, furniture and acoustic absorbers on the walls and ceiling
- Script: subjects are told that they are assessing the performance of a new prototype as part of a market study; beyond a description of the task to be solved, no explanation of the system's functionality is given; the experimenter leaves the room after introducing the task; each subject is recorded in two sessions on the same day, with a short break between sessions
- Background Noise: noise recorded in different environments, depending on the technical setup, is played back on two channels (back and front)
- Microphones: 1 directional microphone (Sennheiser ME66/K6) on top of the front camera (approx. 60 cm from the mouth); an array of 4 Sennheiser ME104 microphones at the upper edge of the display area; 1 headset microphone (Sennheiser ME104) or a stereo clip-on pair (Sennheiser ME104)
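The microphone list above, together with the system's voice output and the two background-noise playback channels, accounts for the 9 or 10 channels given under Number of Channels in the Technical Specifications. A small Python sketch of that bookkeeping follows; the channel labels are hypothetical, since the checklist neither names nor orders the channels.

```python
# Hypothetical channel labels: the specification lists the signal sources but
# not their names or their order in the recording.
def channel_inventory(clip_on_stereo: bool) -> list[str]:
    """List the recorded channels for the headset (False) or stereo clip-on (True) variant."""
    mics = ["directional ME66/K6"]                       # on top of the front camera
    mics += [f"array ME104 #{i}" for i in range(1, 5)]   # 4-microphone array above the display
    if clip_on_stereo:
        mics += ["clip-on ME104 left", "clip-on ME104 right"]
    else:
        mics.append("headset ME104")
    # Non-microphone channels: system voice output and the two noise playback channels.
    others = ["voice output", "background noise back", "background noise front"]
    return mics + others


for variant in (False, True):
    channels = channel_inventory(variant)
    n_mics = len(channels) - 3
    print(f"clip-on={variant}: {n_mics} microphones, {len(channels)} channels")
# prints:
# clip-on=False: 6 microphones, 9 channels
# clip-on=True: 7 microphones, 10 channels
```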

Technical Specifications:
- Sampling Rate: 48000 Hz
- Sample Type and Width: PCM, 16 bit
- Number of Channels: 9/10 (6/7 microphones, voice output, background noise back/front)
- Signal File Format: Microsoft WAVE
- Annotation File Format: SmartKom Transliteration, BAS Partitur Format (BPF)
- Metadata File Format: XML (DTDs provided)
- Lexicon Format: tab-delimited 7-bit ASCII text file; pronunciations coded in extended German SAM-PA
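The signal and lexicon formats above can be checked mechanically. The following is a minimal Python sketch using only the standard library: `check_signal_file` reads the WAVE header with the `wave` module and compares it with the listed sampling rate, sample width and channel count, and `split_lexicon_line` enforces the tab-delimited 7-bit ASCII lexicon format. Both function names are our own; because the checklist gives no column layout for the lexicon, the sketch does not assign meanings to the fields.

```python
import wave

# Targets from the Technical Specifications above.
EXPECTED_RATE_HZ = 48000
EXPECTED_SAMPLE_BYTES = 2        # 16-bit PCM
EXPECTED_CHANNELS = (9, 10)      # headset vs. stereo clip-on variant


def check_signal_file(source) -> list[str]:
    """Return deviations of a WAVE file (path or binary file object) from the spec; empty if none."""
    problems = []
    with wave.open(source, "rb") as w:
        if w.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sampling rate {w.getframerate()} Hz")
        if w.getsampwidth() != EXPECTED_SAMPLE_BYTES:
            problems.append(f"sample width {8 * w.getsampwidth()} bit")
        if w.getnchannels() not in EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channels")
    return problems


def split_lexicon_line(line: str) -> list[str]:
    """Split one lexicon entry into tab-delimited fields, enforcing 7-bit ASCII.

    The checklist does not state the column layout, so fields are returned unnamed.
    """
    if not line.isascii():
        raise ValueError(f"non-ASCII character in lexicon line: {line!r}")
    return line.rstrip("\r\n").split("\t")
```

Python's `wave` module reads only integer PCM data. Multichannel files that carry a WAVE_FORMAT_EXTENSIBLE header are accepted only from Python 3.12 onward, so older interpreters raise `wave.Error` on such files.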

Corpus Structure:
- Structure: hierarchical file structure organized by recording
- Terminology: signal file names encode corpus type, recording session, technical setup, primary task and channel
- Distribution Media: DVD-R (5 GB); each recording session is stored on one DVD
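As a rough plausibility check on storing one session per DVD, the signal parameters in the Technical Specifications fix the uncompressed audio data rate. The following Python sketch assumes that "5 GB" means 5 × 10^9 bytes and counts audio only; any other files on the disc (e.g. annotation or metadata files) would lower the bound.

```python
# Figures from the checklist: 48000 Hz, 16-bit (2-byte) PCM, 9 or 10 channels.
SAMPLE_RATE_HZ = 48000
BYTES_PER_SAMPLE = 2

# Assumption: "5 GB" is read as 5 * 10**9 bytes.
DVD_CAPACITY_BYTES = 5 * 10**9


def audio_bytes_per_second(channels: int) -> int:
    """Uncompressed PCM data rate for interleaved multichannel audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels


def max_audio_minutes_per_dvd(channels: int) -> float:
    """Upper bound on audio per disc, ignoring WAVE headers and any other files on the DVD."""
    return DVD_CAPACITY_BYTES / audio_bytes_per_second(channels) / 60


for ch in (9, 10):
    print(f"{ch} channels: {audio_bytes_per_second(ch)} B/s, "
          f"at most {max_audio_minutes_per_dvd(ch):.1f} min per DVD")
# prints:
# 9 channels: 864000 B/s, at most 96.5 min per DVD
# 10 channels: 960000 B/s, at most 86.8 min per DVD
```

With 10 channels, the audio alone amounts to about 3.46 × 10^9 bytes per hour (960,000 B/s × 3600 s). Under these assumptions, a session must therefore stay below roughly 87 minutes of audio to fit on a single disc.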

Release Plan: data are released to the partners as they become ready; a final integrated release through BAS is planned for the end of the project
Validation: ongoing validation of current releases by the partners; final external validation of the entire database by BAS
Documentation: not specified

