BAS
Bavarian Archive for Speech Signals
Corpora


This page was last updated 2021-05-01

Please note that selected corpora of this catalogue and other corpora not listed here may be downloaded for free by academic users from the CLARIN Repository (partly marked with a (*) in the following).

Speech Corpora

Entire Catalogue
Corpora for commercial usage
Corpora of read speech
Corpora of spontaneous speech
Corpora of accentuated/dialectal/alcoholized speech
Corpora of telephone speech
Corpora of high quality speech

Multi-modal Corpora

SmartKom
SmartWeb Video Corpus (SVC)
German Sign Language Corpus (SIGNUM)
Bielefeld Speech and Gesture Alignment Corpus (SaGA)

Processing and Evaluation
File Formats and Software
Terms of Usage (EULA)
Hints for purely scientific usage
Example audio files
Applications

Speech Corpora

(If not stated otherwise, the language of the corpora is German.)

Entire Catalog

Presently the following corpora are available on CD-R, DVD-R, hard disk or online. Note that a subset of these corpora is also accessible online for members of academic institutions and for licensees of BAS resources in the BAS CLARIN Repository (tagged with (*) in the following list).

The following speech corpora are accessible exclusively via the BAS CLARIN Repository; commercial usage is possible in some cases (inquiries via bas@bas.uni-muenchen.de):
CH-Jugendsprache, MOCHA, NM-MoCap-Corpus, NSC, Sprecherinnen, VERIF1DE, VMEmo, WaSeP

Siemens 1000 - SI1000 (SC7)
10 speakers - 10000 utterances - dictation - orthography
Siemens 100 - SI100
100 speakers - 10000 utterances - dictation - orthography
PhonDat 1 - PD1 (*) (2nd edition)
201 speakers - 21681 utterances - read speech - orthography, canonical transcription, automatic segmentation
PhonDat 2 - PD2 (*) (2nd edition)
16 speakers - 3200 utterances - read speech - orthography, canonical transcription, automatic segmentation, prosodic labeling
Verbmobil I (*) (Recording 1993 - 1996)
Verbmobil II (*) (Recording 1997 - 2000)
Strange Corpus 1 - SC1 ('Accents') (*)
88 speakers - 1 story - read speech - orthography, canonical transcription
Strange Corpus 2 - SC2 ('Noises') (*)
8 speakers - 8 repetitions of 100 utterances - field recordings with real background noise - noise annotated - orthography, canonical pronunciation, noises
Strange Corpus 10 - SC10 ('Accents II')
70 speakers (67 non-native, 3 native German speakers) - 100 phonetically balanced sentences, numbers from 1 to 100, 1 story, 1 dialogue, 1 re-telling of a German story - transliteration, orthography, canonical transcription
Erlanger Bahnansage - ERBA
106 speakers - 11100 utterances - read speech - orthography
SPINA ('Robot Commands') (*) (new edition)
22 speakers - robot control - 10810 utterances - read speech - phoneme and word segmentation
Regional Variants of German 1 - RVG1 (*)
Recordings of all German-speaking regions - 498 speakers - 32 CD-ROMs
Siemens Synthesis Corpus - SI1000P
2 professional speakers - laryngographic signal - prosodic labelling - 4 CD-ROMs
Taxi Corpus - TAXI (*)
94 dialogues of a German speaking cab dispatcher and an English speaking client - recorded via real phone connections (fixed and GSM) - orthography, canonical pronunciation, translation
Hempel's Sofa - HEMPEL (*)
3909 recordings of spontaneous speech (monologues) via public phone lines - SpeechDat transcription
FORMTASK (*)
17293 recordings of 4366 speakers answering 4 questions via public phone (land)lines - SpeechDat transcription
Regional Variants of German J - RVG-J (*)
recordings of read and spontaneous speech by adolescents age 13-20 - SpeechDat annotation
Siemens Webcommand - WEBCOMMAND
15600 recordings of commands addressed to a web pad - British English and French - 49 speakers - office environment - SpeechDat annotation
Ziptel (SpeechDat(M)) - ZIPTEL (*)
7746 recordings of street names, ZIP codes, city names and phone numbers - 1957 speakers - all environments - SpeechDat annotation
BITS Logatome Synthesis Corpus - BITS-LG
11036 logatom recordings covering all German diphones - 4 professional speakers - studio, 2 mics, laryngo - manual segmentation, BAS Partitur Format
BITS Unit Selection Synthesis Corpus - BITS-US
6732 sentence recordings covering all German diphones in different prosodic contexts - 4 professional speakers - studio, 2 mics, laryngo - manual phonetic segmentation and prosodic annotation, BAS Partitur Format
SmartKom Audio - SKAUDIO 1.0 (*)
Special edition containing all audio channel recordings of the SmartKom corpora - 224 speakers - 448 sessions - Scenarios: Public, Home, Mobil
SmartWeb Handheld Corpus - SHC (*)
10966 human-machine queries using a smart phone - 156 speakers - natural environment, 2 mics (collar, Bluetooth), UMTS + high quality channel - transliteration (Verbmobil compatible), BAS Partitur Format
SmartWeb Motorbike Corpus - SMC
2315 human-machine queries on running motorcycle - 36 speakers - natural environment, 2 mics (Bluetooth helmet, neck micro), UMTS + high quality channel - transliteration (Verbmobil compatible), BAS Partitur Format
SmartWeb Video Corpus - SVC (*)
2218 human-machine queries in a human-human-machine situation using a smart phone, video face-capture of asking person - 99 speakers - natural environment, 2 mics (collar, Bluetooth), UMTS + high quality channel - transliteration (Verbmobil compatible), manual turn segmentation, BAS Partitur Format
Ph@ttSessionz - PHATTSESSIONZ (*)
1019 sessions of up to 138 items each (read, spontaneous) of adolescent speakers of age 12-20 - 1019 speakers - natural environment (school), 2 microphones (headset, desktop), demographic distribution within Germany - transliteration according to the SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
Alcohol Language Corpus - ALC (*)
Recordings of intoxicated and sober speakers of age 22-75 - 150 speakers (estimate for the final corpus) - automotive environment, 2 microphones (headset, mouse micro) - transliteration according to the extended SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
VeriDat Speaker Verification Corpus - VERIF1DE (*)
Corpus for speaker verification via telephone - 150 speakers - 20 recording sessions per speaker - transliteration according to the SpeechDat standard, SpeechDat Database Format
Age and Gender Speech Corpus - aGender (*)
Paralinguistic speaker classification via phone lines - 945 speakers - 1-7 recording sessions (calls) per speaker - transliteration according to SpeechDat standard, SpeechDat database format
Corpora e Lessici dell'Italiano Parlato e Scritto, map task recordings - CLIPS_MT_MANUAL (*)
Italian Maptask Recordings from CLIPS - 30 Speakers - 2 Recording Sessions per Speaker - Transliteration, Segmentation according to CLIPS Standard - BPF, TextGrid, Emu
BAS Siemens Hoergeraete Corpus - HOESI (*)
Lombard Dialogue speech - 24 Speakers - 12 Recording Sessions per Speaker Pair - speech/non-speech segmentation - BPF
Atlante sintattico della Calabria - AsiCa
Recordings of Calabrese (Italy) - 68 Speakers - 331 Recording Sessions - Orth.-Phon. Transcription - TextGrid
Audioatlas Siebenbuergisch-Saechsischer Dialekte - ASD
Historic recordings of Saxonian German spoken in Romania in 1970 - 1805 Speakers - 2264 Recording Sessions - Orth. and phon. transcriptions - TextGrid
Speech of brother pairs (Dissertation Feiser) - BROTHERS (*)
Speech data of dissertation Feiser (2016) - German - 20 speakers (10 brother pairs) - 7240 recording sessions - orth. and autom. phon. transcription - TextGrid, emuDB

Some audio files of the available corpora.

The TED corpus is currently distributed by ELDA. BAS will therefore disseminate further copies of the corpus only once this first edition has run out.

For further questions or orders please contact

Corpora for commercial usage

Most speech corpora of BAS are available for commercial usage. Under commercial usage we subsume any development of speech technology on the basis of the BAS data and the commercial exploitation of products that were developed on the basis of the BAS data. Commercial usage does not include the direct exploitation of the data; that is, no BAS data may be distributed to third parties under any circumstances. Some BAS corpora require a special license fee for commercial usage; see the corpus pages for details.

Corpora of read speech

The following corpora contain read speech, some of them recorded as a dictation task:

Siemens 100 - SI100
100 speakers - 10000 utterances - dictation - orthography
PhonDat 1 - PD1 (2nd edition)
201 speakers - 21681 utterances - read speech - orthography, canonical transcription, automatic segmentation
PhonDat 2 - PD2 (2nd edition)
16 speakers - 3200 utterances - read speech - orthography, canonical transcription, automatic segmentation, prosodic labeling
Strange Corpus 1 - SC1 ('Accents')
88 speakers (72 non-native, 16 native German speakers) - 1 story - read speech - orthography, canonical transcription
Strange Corpus 2 - SC2 ('Noises')
8 speakers - 8 repetitions of 100 utterances - field recordings with real background noise - noise annotated - orthography, canonical pronunciation, noises
Strange Corpus 10 - SC10 ('Accents II')
70 speakers (67 non-native, 3 native German speakers) - 100 phonetically balanced sentences, numbers from 1 to 100, 1 story - read speech - orthography, canonical transcription
Erlanger Bahnansage - ERBA
106 speakers - 11100 utterances - read speech - orthography
SPINA ('Robot Commands') (new edition)
22 speakers - robot control - 10810 utterances - read speech - phoneme and word segmentation
Regional Variants of German 1 - RVG1
Recordings of all German-speaking regions - 498 speakers - 32 CD-ROMs
Siemens Synthesis Corpus - SI1000P
2 professional speakers - laryngographic signal - prosodic labelling - 4 CD-ROMs
Regional Variants of German J - RVG-J
recordings of read and spontaneous speech by adolescents age 13-20 - SpeechDat annotation
Siemens Webcommand - WEBCOMMAND
15600 recordings of commands addressed to a web pad - British English and French - 49 speakers - office environment - SpeechDat annotation
Ziptel (SpeechDat(M)) - ZIPTEL
7746 recordings of street names, ZIP codes, city names and phone numbers - 1957 speakers - all environments - SpeechDat annotation
BITS Logatome Synthesis Corpus - BITS-LG
11036 logatom recordings covering all German diphones - 4 professional speakers - studio, 2 mics, laryngo - manual segmentation, BAS Partitur Format
BITS Unit Selection Synthesis Corpus - BITS-US
6732 sentence recordings covering all German diphones in different prosodic contexts - 4 professional speakers - studio, 2 mics, laryngo - manual phonetic segmentation and prosodic annotation, BAS Partitur Format
Ph@ttSessionz - PHATTSESSIONZ
1019 recordings of up to 138 items each (read, spontaneous) of adolescent speakers of age 12-20 - 1019 speakers - natural environment (school), 2 microphones (headset, desktop), demographic distribution within Germany - transliteration according to the SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
Alcohol Language Corpus - ALC
Recordings of intoxicated and sober speakers of age 22-75 - 150 speakers (estimate for the final corpus) - automotive environment, 2 microphones (headset, mouse micro) - transliteration according to the extended SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
VeriDat Speaker Verification Corpus - VERIF1DE
Corpus for speaker verification via telephone - 150 speakers - 20 recording sessions per speaker - transliteration according to the SpeechDat standard, SpeechDat Database Format
Age and Gender Speech Corpus - aGender
Corpus for speaker classification via telephone - 945 speakers - 1-7 recording sessions per speaker - transliteration according to the SpeechDat standard, SpeechDat Database Format
Speech of brother pairs (Dissertation Feiser) - BROTHERS
Speech data of dissertation Feiser (2016) - German - 20 speakers (10 brother pairs) - 7240 recording sessions - orth. and autom. phon. transcription - TextGrid, emuDB

Corpora of spontaneous speech

The following corpora contain recordings of spontaneous speech:

Verbmobil - VM I
Verbmobil - VM II
Regional Variants of German 1 - RVG1
Recordings of all German-speaking regions - 498 speakers - 32 CD-ROMs
Contains 1 minute spontaneous monologue per speaker
Strange Corpus 10 - SC10 ('Accents II')
70 speakers (67 non-native, 3 native German speakers) - 1 dialogue, 1 re-telling of a German story - transliteration, orthography, canonical transcription
Taxi Corpus - TAXI
94 dialogues of a German speaking cab dispatcher and an English speaking client - recorded via real phone connections (fixed and GSM) - orthography, canonical pronunciation, translation
Hempel's Sofa - HEMPEL
3909 recordings of spontaneous speech (monologues) via public phone lines - SpeechDat transcription
FORMTASK (*)
17293 recordings of 4366 speakers answering 4 questions via public phone (land)lines - SpeechDat transcription
Regional Variants of German J - RVG-J
recordings of read and spontaneous speech by adolescents age 13-20 - SpeechDat annotation
SmartKom Audio - SKAUDIO 1.0
Special edition containing all audio channel recordings of the SmartKom corpora - 224 speakers - 448 sessions - Scenarios: Public, Home, Mobil
SmartWeb Handheld Corpus - SHC
10966 human-machine queries using a smart phone - 156 speakers - natural environment, 2 mics (collar, Bluetooth), UMTS + high quality channel - transliteration (Verbmobil compatible), BAS Partitur Format
SmartWeb Motorbike Corpus - SMC
2315 human-machine queries on running motorcycle - 36 speakers - natural environment, 2 mics (Bluetooth helmet, neck micro), UMTS + high quality channel - transliteration (Verbmobil compatible), BAS Partitur Format
SmartWeb Video Corpus - SVC
2218 human-machine queries in a human-human-machine situation using a smart phone, video face-capture of asking person - 99 speakers - natural environment, 2 mics (collar, Bluetooth), UMTS + high quality channel - transliteration (Verbmobil compatible), manual turn segmentation, BAS Partitur Format
Ph@ttSessionz - PHATTSESSIONZ
1019 recordings of up to 138 items each (read, spontaneous) of adolescent speakers of age 12-20 - 1019 speakers - natural environment (school), 2 microphones (headset, desktop), demographic distribution within Germany - transliteration according to the SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
Alcohol Language Corpus - ALC
Recordings of intoxicated and sober speakers of age 22-75 - 150 speakers (estimate for the final corpus) - automotive environment, 2 microphones (headset, mouse micro) - transliteration according to the extended SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
Corpora e Lessici dell'Italiano Parlato e Scritto, map task recordings - CLIPS_MT_MANUAL
Italian Maptask Recordings from CLIPS - 30 Speakers - 2 Recording Sessions per Speaker - Transliteration, Segmentation according to CLIPS Standard - BPF, TextGrid, Emu
BAS Siemens Hoergeraete Corpus - HOESI (*)
Lombard Dialogue speech - 24 Speakers - 12 Recording Sessions per Speaker Pair - speech/non-speech segmentation - BPF
Atlante sintattico della Calabria - AsiCa
Recordings of Calabrese (Italy) - 68 Speakers - 331 Recording Sessions - Orth.-Phon. Transcription - TextGrid
Audioatlas Siebenbuergisch-Saechsischer Dialekte - ASD
Historic recordings of Saxonian German spoken in Romania in 1970 - 1805 Speakers - 2264 Recording Sessions - Orth. and phon. transcriptions - TextGrid
Speech of brother pairs (Dissertation Feiser) - BROTHERS
Speech data of dissertation Feiser (2016) - German - 20 speakers (10 brother pairs) - 7240 recording sessions - orth. and autom. phon. transcription - TextGrid, emuDB

Corpora of accentuated/dialectal/alcoholized speech

The following corpora contain speech with classified (foreign) accent / dialect:

Strange Corpus 1 - SC1 ('Accents')
88 speakers - 1 story - read speech - orthography, canonical transcription
Strange Corpus 10 - SC10 ('Accents II')
70 speakers (67 non-native, 3 native German speakers) - 100 phonetically balanced sentences, numbers from 1 to 100, 1 story, 1 dialogue, 1 re-telling of a German story - transliteration, orthography, canonical transcription
Regional Variants of German 1 - RVG1
Recordings of all German-speaking regions - 498 speakers - 32 CD-ROMs
Demographically balanced recording in Germany, Austria and Switzerland
Regional Variants of German J - RVG-J
recordings of read and spontaneous speech by adolescents age 13-20 - SpeechDat annotation
Demographically distributed recording in Germany
Ph@ttSessionz - PHATTSESSIONZ
1019 recordings of up to 138 items each (read, spontaneous) of adolescent speakers of age 12-20 - 1019 speakers - natural environment (school), 2 microphones (headset, desktop), demographic distribution within Germany - transliteration according to the SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
Alcohol Language Corpus - ALC
Recordings of intoxicated and sober speakers of age 22-75 - 150 speakers (estimate for the final corpus) - automotive environment, 2 microphones (headset, mouse micro) - transliteration according to the extended SpeechDat standard, manual segmentation of utterance start/end, BAS Partitur Format files, MAUS segmentation
Corpora e Lessici dell'Italiano Parlato e Scritto, map task recordings - CLIPS_MT_MANUAL
Italian Maptask Recordings from CLIPS - 30 Speakers - 2 Recording Sessions per Speaker - Transliteration, Segmentation according to CLIPS Standard - BPF, TextGrid, Emu
Atlante sintattico della Calabria - AsiCa
Recordings of Calabrese (Italy) - 68 Speakers - 331 Recording Sessions - Orth.-Phon. Transcription - TextGrid
Audioatlas Siebenbuergisch-Saechsischer Dialekte - ASD
Historic recordings of Saxonian German spoken in Romania in 1970 - 1805 Speakers - 2264 Recording Sessions - Orth. and phon. transcriptions - TextGrid

Corpora with telephone speech

The following BAS corpora contain speech recorded via public telephone lines (fixed-line and cellular, GSM):

Verbmobil - VM II
Hempel's Sofa - HEMPEL
FORMTASK (*)
Addresses and Numbers - ZIPTEL
Taxi Corpus - TAXI
SmartWeb Handheld Corpus - SHC
10966 human-machine queries using a smart phone - 156 speakers - natural environment, 2 mics (collar, Bluetooth), UMTS + high quality channel - transliteration (Verbmobil compatible), BAS Partitur Format
SmartWeb Motorbike Corpus - SMC
2315 human-machine queries on running motorcycle - 36 speakers - natural environment, 2 mics (Bluetooth helmet, neck micro), UMTS + high quality channel - transliteration (Verbmobil compatible), BAS Partitur Format
SmartWeb Video Corpus - SVC
2218 human-machine queries in a human-human-machine situation using a smart phone, video face-capture of asking person - 99 speakers - natural environment, 2 mics (collar, Bluetooth), UMTS + high quality channel - transliteration (Verbmobil compatible), manual turn segmentation, BAS Partitur Format
VeriDat Speaker Verification Corpus - VERIF1DE
Corpus for speaker verification via telephone - 150 speakers - 20 recording sessions per speaker - transliteration according to the SpeechDat standard, SpeechDat Database Format
Age and Gender Speech Corpus - aGender
Corpus for speaker classification via telephone - 945 speakers - 1-7 recording sessions per speaker - transliteration according to the SpeechDat standard, SpeechDat Database Format
Speech of brother pairs (Dissertation Feiser) - BROTHERS
Speech data of dissertation Feiser (2016) - German - 20 speakers (10 brother pairs) - 7240 recording sessions - orth. and autom. phon. transcription - TextGrid, emuDB

Planned:

Corpora with high quality speech

'High quality speech' denotes recordings made with a sampling frequency of at least 16 kHz in a controlled environment (studio). The following BAS corpora contain high quality speech:

Planned:

Processing and Evaluation

Before distribution the BAS corpora are evaluated for certain formal properties (BAS Revalidation). These properties include:

Signal formats are readable.
Header syntax is ok. If there are divergences, they should be regular and well documented.
Documentation is correct.
Software is tested and correct.
Any notations (like orthography, canonical word forms, segmentations, etc.) are evaluated in samplings.
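
The first and last items of this checklist can be illustrated with a short Python sketch. It is not the BAS revalidation software; it only assumes that the signal files are RIFF WAVE files that Python's standard wave module can open, and that annotation files are given as a list of names. The function names are hypothetical.

```python
import random
import wave


def unreadable_wav_files(paths):
    """Return the paths that the wave module cannot open as RIFF WAVE."""
    failed = []
    for path in paths:
        try:
            with wave.open(str(path), "rb") as wav:
                wav.getnframes()  # header has been parsed at this point
        except (wave.Error, EOFError, OSError):
            failed.append(path)
    return failed


def draw_review_sample(items, k, seed=0):
    """Draw a reproducible random sample of at most k items for manual review."""
    ordered = sorted(items)
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn
    return rng.sample(ordered, min(k, len(ordered)))
```

A fixed seed is used so that the same sample of annotation files can be drawn again when a check has to be repeated.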

After passing this formal evaluation, the corpora are stored as 'master volumes' in our archive. They are linked to a central documentation and software server. When an order is placed, the volumes are copied to CD-ROM and distributed (press on demand), or online access is granted via the BAS CLARIN Repository.

In a second step the signals are analysed in more detail. An automatic segmentation into phonemes and words is carried out (MAUS), deviations from the canonical word form are detected, and other features are extracted. All results of this further analysis are stored in the BAS Partitur Format (BPF).
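
As a rough illustration of how such BPF results might be read, the following Python sketch groups the entries of a BPF-style text by tier. It rests on assumptions that should be checked against the BPF format description: each entry is one line of the form 'KEY: value', and the header ends with a line starting with 'LBD:'. The function name and the sample text are made up for this example.

```python
from collections import defaultdict


def read_bpf_tiers(text):
    """Group the body lines of a BPF-style text by their tier key.

    Assumption: entries look like 'KEY: value' and the header is
    terminated by a line whose key is 'LBD'.
    """
    tiers = defaultdict(list)
    in_body = False
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # skip lines without a 'KEY:' prefix
        if not in_body:
            in_body = key.strip() == "LBD"
            continue
        tiers[key.strip()].append(value.strip().split())
    return dict(tiers)


# Hand-written sample, not taken from any BAS corpus.
example = """LHD: Partitur 1.2
SAM: 16000
LBD:
ORT: 0 guten
ORT: 1 Tag
KAN: 0 gu:t@n
KAN: 1 ta:k
"""
tiers = read_bpf_tiers(example)
print(tiers["ORT"])
```

Splitting only at the first colon keeps phonetic symbols that themselves contain a colon, such as a length mark, inside the value.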

In a sub-project of the German BITS project (TP8) all available BAS corpora have been re-validated against public guidelines. The results of this re-validation will be published on the BITS webserver.
Within the CLARIN initiative these validation guidelines must be followed before publication in the BAS CLARIN Repository.

File Formats and Software

Most of the disseminated speech corpora of BAS contain signal files in RIFF WAVE and NIST SPHERE formats. Some corpora contain SAM annotation formats.
A description of the formats used in BAS corpora can be found here.
Of course all formats are described in detail in the accompanying corpus documentation (you can access most of these on-line by looking up the WWW page of the corpus).
Last but not least, each BAS corpus comes with a small collection of software and ANSI C functions for accessing the signal files.
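
For readers who prefer a scripting language to the supplied ANSI C functions, the following Python sketch shows one way the text header of a NIST SPHERE file could be parsed. It is not part of the BAS software and the function name is hypothetical. It assumes the usual SPHERE layout: a first line 'NIST_1A', a second line giving the header size in bytes, then 'name -type value' lines up to 'end_head'.

```python
def parse_sphere_header(data):
    """Parse the text header at the start of a NIST SPHERE file.

    data: the leading bytes of the file (at least the whole header).
    Returns a dict mapping field names to int, float or str values.
    """
    lines = data.decode("ascii", errors="replace").split("\n")
    if len(lines) < 2 or lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {"header_size": int(lines[1].strip())}  # header length in bytes
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore malformed or empty lines
        name, type_flag, value = parts
        if type_flag == "-i":
            fields[name] = int(value)
        elif type_flag == "-r":
            fields[name] = float(value)
        else:  # "-sN": string value of N characters
            fields[name] = value
    return fields


# Hand-written header for illustration only.
sample = (b"NIST_1A\n   1024\n"
          b"channel_count -i 1\n"
          b"sample_rate -i 16000\n"
          b"sample_coding -s3 pcm\n"
          b"end_head\n")
info = parse_sphere_header(sample)
print(info["sample_rate"] >= 16000)
```

The final comparison checks the sampling rate against the 16 kHz threshold that this page uses in its definition of high quality speech.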

Applications

The following section gives some of the most common uses of BAS speech corpora.

Automatic speech recognition

To initialize statistically based applications for speech recognition, phonetically labelled and segmented corpora are needed.
The following corpora may be used for this purpose:

read single words:
read continuous speech:
- PD1
- PD2
- SC10
noisy speech / field recordings
- SC2
- SHC
- SMC
- SVC
- VERIF1DE
spontaneous speech:
- VM
  Volume VM2 contains (partly) manually segmented speech
- SC10

For embedded training without segmentation (after bootstrapping):

read speech:
spontaneous/non-prompted speech
- VM I + II
- RVG1
- SC10
- RVG-J
- TAXI
- SKAUDIO
- SHC
- SMC
- SVC
- PHATTSESSIONZ
- HOESI
telephone speech

Human - machine interaction (HMI)

Speech synthesis

For PSOLA synthesis all corpora with segmental information may be used (corpora with automatically segmented speech are given in brackets):

BITS-LG
BITS-US
SI1000P
PD1 (partly)
PD2 (partly)
VM 2 (partly)
(VM)
(PD1 rest)
(PD2 rest)

Speaker recognition, verification, adaptation, paralinguistic classification

PD1 and SI100 have a variety of speakers of both sexes and different ages.

Empirical phonetic investigations

Segmentals

All BAS corpora with manual segmentations can be used. Since manually segmented data are naturally scarce, it may be wise to use automatically segmented data as well (in brackets):

PD1 (partly)
PD2 (partly)
VM 2 (partly)
SC10
CLIPS_MT_MANUAL
(VM rest)
(PD1 rest)
(PD2 rest)
(SI100)
(PHATTSESSIONZ)
(ALC)

Prosodic investigations

VM 1, 2, 3, 4, 5, 15, 20, 21, 22, 24, 28, 30, 31, 32
PD 2
SI1000P
ALC
PHATTSESSIONZ
HOESI

Foreign accents / speaker characteristics

Dialectal variation

Copyright © 1995-2016 Bayerisches Archiv für Sprachsignale, Universität München
This page and all other pages with the initial 'BAS' or 'Bas' in the filename may be copied, printed and distributed to other parties, under the condition that the pages are distributed as shown here. Parts of pages or extended pages may not be distributed further without permission of the BAS.

Florian Schiel
