The SmartKom recordings are carried out in three different technical setups (Public, Home, Mobil) and in an open-ended number of task domains that do not overlap between technical setups. Most of the following specifications were defined at a dedicated workshop organized by the group that produced the final corpus. All attendees of this workshop were partners of the SmartKom consortium.
In the following, the corpus specification of the complete SmartKom speech data collection (all technical setups, all task domains) is presented as a checklist. The elements of this checklist were already discussed, in the same order, in an earlier chapter. Elements that do not apply to SmartKom are marked 'n.a.'.
The following checklist for SmartKom covers only the speech data recorded in the Wizard-of-Oz (WOZ) sessions, not the biometric speech corpus.
Speaker Profiles | Primarily native speakers of German; gender distribution 50:50; age ranging from 15 to 60 years; dialectal distribution not specified; education level not specified |
Number of Speakers | open, depending on effort and funding; if feasible: equal proportions of speakers recorded in different technical setups and different task domains |
Contents: | |
- Vocabulary | free speech, no restrictions on vocabulary |
- Domain | depends on the task domains implemented in the SmartKom prototype; at the time the specifications were written, only a few task domains had been defined: cinema guide, electronic program guide (EPG), VCR control, tourist information, navigation (on foot and by car), restaurant guide, office tasks |
- Task | depends on the selected domain; each recording consists of one primary and one secondary task, e.g. primary task: find a cinema in Heidelberg for tonight, secondary task: find a restaurant for dinner after the cinema |
- Phonologic Distribution | not specified |
Speaking Style: | |
- Read Speech | - |
- Answering Speech | + |
- Command/Control Speech | + |
- Non-prompted Speech | + |
- Spontaneous Speech | + |
- Neutral/Emotional | + |
Recording Setup: | Wizard-of-Oz Recording |
- Acoustical environment | normal office; reverberation time reduced by curtains, furniture and acoustic absorbers on walls and ceiling |
- Script | subjects are told that they are assessing the performance of a new prototype for a market study; no further explanation of the system's functionality is given; the task to be solved is described; the experimenter leaves the room after introducing the task; each subject is recorded in two sessions on the same day with a short break between sessions |
- Background noise | noise recorded in different environments, depending on the technical setup, is played back on two channels (back and front) |
- Microphones | 1 directional microphone Sennheiser ME66/K6 on top of the front camera (approx. 60 cm from the mouth), a microphone array of 4 Sennheiser ME104 at the upper edge of the display area, and 1 Sennheiser ME104 headset or a stereo clip-on Sennheiser ME104 |
Technical Specifications: | |
- Sampling Rate | 48000 Hz |
- Sample Type and Width | PCM, 16 bit |
- Number of Channels | 9/10 (6/7 microphones, voice output, background noise back/front) |
- Signal File Format | Microsoft WAVE |
- Annotation File Format | SmartKom Transliteration, BAS Partitur Format (BPF) |
- Meta Data File Format | XML (DTDs provided) |
- Lexicon Format | Tab-delimited 7-bit ASCII text file; pronunciation coded in extended German SAM-PA |
Corpus Structure: | |
- Structure | Hierarchical file structure according to recording |
- Terminology | Signal file names encode corpus type, recording session, technical setup, primary task and channel |
- Distribution Media | DVD-R (5 GB); each recording session is stored on one DVD |
Release Plan | Data are released to partners as they become ready; a final integrated release is planned at the end of the project through BAS |
Validation | Ongoing validation of current releases by partners; external final validation of the entire database by BAS |
Documentation | Not specified |
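The Technical Specifications and Distribution Media entries together bound how much audio fits on one disc. The following Python sketch works through that arithmetic. It takes the channel counts from the Microphones and Number of Channels entries and assumes that the "5GB" disc size means 5 × 10^9 bytes (a single-layer DVD-R nominally holds about 4.7 × 10^9 bytes). Any non-audio data stored with a session is ignored, so the result is only an upper bound.

```python
# Sketch: uncompressed data volume implied by the SmartKom technical specifications.
SAMPLE_RATE_HZ = 48_000   # Sampling Rate: 48000 Hz
BYTES_PER_SAMPLE = 2      # Sample Type and Width: PCM, 16 bit
DVD_BYTES = 5e9           # assumption: "DVD-R (5GB)" read as 5 * 10**9 bytes


def channel_count(stereo_clip_on: bool) -> int:
    """Total channels: microphones + voice output + 2 background-noise channels."""
    # ME66/K6 + 4-microphone array + headset (1 channel) or stereo clip-on (2 channels)
    microphones = 1 + 4 + (2 if stereo_clip_on else 1)
    return microphones + 1 + 2


def bytes_per_second(channels: int) -> int:
    """Raw PCM data rate for the given number of channels."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels


def audio_minutes_per_dvd(channels: int) -> float:
    """Upper bound on minutes of audio per disc, ignoring all non-audio data."""
    return DVD_BYTES / bytes_per_second(channels) / 60


for stereo in (False, True):
    n = channel_count(stereo)
    print(f"{n} channels: {bytes_per_second(n)} B/s, "
          f"at most {audio_minutes_per_dvd(n):.1f} min of audio per DVD")
```

With the headset this gives 9 channels at 864,000 bytes/s (about 96 minutes of audio per disc). With the stereo clip-on it gives 10 channels at 960,000 bytes/s (about 87 minutes).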
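To confirm that a delivered signal file matches the Sampling Rate and Sample Type entries, its WAVE header can be read with Python's standard `wave` module. The sketch below assumes one channel per file because the Terminology entry states that file names encode the channel. The function `check_signal_file` and its `expected_channels` parameter are hypothetical illustrations, not part of any SmartKom or BAS tool.

```python
import wave

EXPECTED_RATE_HZ = 48_000  # Sampling Rate
EXPECTED_SAMPWIDTH = 2     # 16-bit PCM = 2 bytes per sample


def check_signal_file(path: str, expected_channels: int = 1) -> list[str]:
    """Return deviations from the specification (empty list if the header matches).

    Hypothetical helper; assumes one channel per WAVE file by default.
    """
    problems = []
    # The wave module supports only uncompressed PCM files and raises wave.Error otherwise.
    with wave.open(path, "rb") as w:
        if w.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sampling rate {w.getframerate()} Hz, expected {EXPECTED_RATE_HZ} Hz")
        if w.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"sample width {8 * w.getsampwidth()} bit, expected 16 bit")
        if w.getnchannels() != expected_channels:
            problems.append(f"{w.getnchannels()} channels, expected {expected_channels}")
    return problems
```

Running this over every signal file in a release and collecting the non-empty results gives a simple format check. It does not validate the annotation or metadata files.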
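The Lexicon Format entry fixes the encoding (7-bit ASCII) and the field separator (tab), but not the column layout. The following sketch reads such a file into a mapping from orthographic form to pronunciations. The column positions `ORTH_COLUMN` and `PRON_COLUMN` are hypothetical placeholders that must be adjusted to the actual lexicon, and the sample entries are invented for illustration.

```python
from collections import defaultdict

ORTH_COLUMN = 0  # hypothetical: column holding the orthographic form
PRON_COLUMN = 1  # hypothetical: column holding the SAM-PA pronunciation


def read_lexicon(raw: bytes) -> dict[str, list[str]]:
    """Parse a tab-delimited 7-bit ASCII lexicon into {orthography: [pronunciations]}."""
    # Decoding as ASCII rejects any byte above 0x7F, enforcing the 7-bit requirement.
    text = raw.decode("ascii")
    needed = max(ORTH_COLUMN, PRON_COLUMN) + 1
    lexicon = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < needed:
            raise ValueError(f"line {lineno}: expected at least {needed} tab-separated fields")
        lexicon[fields[ORTH_COLUMN]].append(fields[PRON_COLUMN])
    return dict(lexicon)


sample = b"Kino\tki:no\nFilm\tfIlm\n"  # invented entries for illustration
print(read_lexicon(sample))  # {'Kino': ['ki:no'], 'Film': ['fIlm']}
```

Collecting pronunciations in a list keeps pronunciation variants of the same word instead of silently overwriting them.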
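The annotation files use the BAS Partitur Format (BPF). The sketch below is a minimal reader, not an official BAS tool. It assumes the usual BPF layout: every line starts with a three-character tier key followed by a colon (for example `ORT:` for orthography or `KAN:` for canonical pronunciation), and the header is terminated by an `LBD:` line. It groups the raw line contents by key and leaves the interpretation of individual tiers to the caller. The function name `read_bpf` and the sample data are illustrative.

```python
import re
from collections import defaultdict

# A BPF line: three-character key, colon, optional whitespace, rest of line.
LINE_RE = re.compile(r"^([A-Z0-9]{3}):\s*(.*)$")


def read_bpf(text: str) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Split a BPF document into header and body, each mapping key -> line contents."""
    header: dict[str, list[str]] = defaultdict(list)
    body: dict[str, list[str]] = defaultdict(list)
    target = header
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not a BPF key line: {line!r}")
        key, value = match.groups()
        if key == "LBD":
            target = body  # all lines after LBD: belong to the body tiers
            continue
        target[key].append(value.rstrip())
    return dict(header), dict(body)


sample = "LHD: Partitur 1.3\nSAM: 48000\nLBD:\nORT: 0 Kino\nKAN: 0 ki:no\n"  # invented
header, body = read_bpf(sample)
print(header["SAM"], body["ORT"])  # ['48000'] ['0 Kino']
```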