Bavarian Archive for Speech Signals


Last update of this page: 2012-04-18

BAS Projects

Munich AUtomatic Segmentation System (MAUS) - Available

The final aim of the MAUS project is the fully automatic annotation of arbitrary speech utterances. MAUS in its final design will produce the following output: using only the following input: A short description of MAUS can be found here or here (German only). Please also refer to our BAS publications.

MAUS can be used as a freeware package but also as a web service.

There is a close connection between the MAUS project and the development of the Partitur Format, which allows an easy and well-structured representation of categorical information about speech signals.

The development of MAUS was partly funded by the German Verbmobil Project.

BAS PHONOLEX Lexicon - Available

Almost every kind of speech processing needs some 'canonical' or empirical definition of the pronunciation of single words. A computer-readable pronunciation lexicon that contains all inflected forms of a sufficient number of German words is currently not available.
Furthermore, there is no resource for spontaneous speech, which contains many 'non-regular' words and non-lexical speech events that speech researchers nevertheless have to cope with.

On the other hand, very large, well-structured linguistic lexica exist, but they do not contain any information about pronunciation. A common workaround is to use automatic grapheme-to-phoneme converters, which of course do not work perfectly and usually are not freely available.

The first aim of the BAS PHONOLEX Project is the development of a list of canonical pronunciations that covers approx. 95 % of written German (including all possible inflected forms). This task is carried out in close cooperation with the University of Saarbrücken (Prof. Uzkoreith), the University of Bonn (Dr. Stock) and the University of Leipzig (Dr. Quasthoff).
Currently (Version 2.6) the lexical list covers more than 1,600,000 entries; further details about contents, format and availability can be found here.

Strange Corpora - SC

The Strange Corpora are a series of smaller corpora. Each of them will document a certain known problem in the field of speech science and engineering. By the use of these corpora scientists and engineers may test their solutions and/or applications and compare their performance.

Presently the following SC corpora are available or planned:

Some of these SC corpora will be subdivided by BAS into training and test corpora respectively. That way users may refer to the corpora in publications.

If you, as a member of the speech community, have other interesting proposals for the SC corpus, don't hesitate to contact us at the following email address:

BAS Edition of Verbmobil Corpora - VM - Completed

During the first year after their release, the Verbmobil Corpora (spontaneous dialog recordings) are reserved for the exclusive use of the official VM partners. After that period the corpora are distributed by BAS as well as by the European Language Resources Association (ELRA).

BAS is providing an extended edition of these corpora. This edition contains the cut signal files as before and additionally the orthographic transliteration, a so-called 'proposed transcription' (the former term 'canonical form' can no longer be used for spontaneous speech) and, if feasible, a first automatic phonological segmentation. Results from other Verbmobil partners, such as prosodic and syntactic information, may be included as well.

Articulatory data - EMA

A large amount of EMA (electromagnetic articulography) data has been recorded from speakers reading the SI1000 Corpus; it will be released as a separate corpus. The corpus will contain the speech signal as well as the geometric parameters of the vocal tract. Estimated availability: End of 2000.

Spicos Training Data - SPICOS

The training corpus used in the SPICOS Project is still one of the major German corpora that can be used for bootstrapping speech recognition algorithms. It contains the speech of 12 speakers, each speaking 100 - 400 phoneme-balanced German sentences. The data were fully transcribed in IPA.
BAS is planning to release this corpus again after a careful validation and filtering of the original data. The new edition will contain the full IPA annotation for phonetic science as well as a SAM-PA annotation for technical usage.

BAS Edition of SmartKom Corpora - SK - Available

The SmartKom Corpora (multimodal WOZ, i.e. Wizard-of-Oz, dialogue recordings) will be distributed to the scientific community after one year of exclusive usage by the SmartKom consortium (starting 09/2003).

BAS will provide an updated and extended edition of the SmartKom corpora.

Munich Automatic Speaker Verification - MASV - Available

MASV stands for Munich Automatic Speaker Verification. It is an experimental environment for setting up and testing speaker verification systems based on HMMs or GMMs.
It depends on the HTK tools (version 3.1 or greater), Matlab (version 5 or greater) and Perl (version 5 or greater). The Perl scripts control training and testing of speaker models; the Matlab part provides various score normalization schemes and a GUI for exploring the performance of a speaker verification system. MASV is published under the GNU General Public License in the hope of helping others get started with speaker verification based on HMMs. The key features are: A more detailed description can be found in the manual, which can be downloaded from the MASV website.
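
Which score normalization schemes MASV provides is not listed on this page. As a general illustration only, the following sketch shows zero normalization (Z-norm), one widely used scheme in speaker verification: a raw verification score is shifted and scaled by the mean and standard deviation of scores that impostor utterances obtain against the same speaker model. The sketch is in Python rather than Matlab, and the function name `znorm` and all score values are hypothetical and not taken from MASV.

```python
from statistics import mean, stdev

def znorm(raw_score, impostor_scores):
    """Z-norm: normalize a raw score by impostor score statistics.

    impostor_scores are scores of impostor utterances against the same
    claimed speaker model (hypothetical values in the example below).
    """
    mu = mean(impostor_scores)
    sigma = stdev(impostor_scores)  # sample standard deviation
    return (raw_score - mu) / sigma

# Hypothetical log-likelihood-ratio scores for one speaker model.
impostors = [0.0, 1.0, 2.0]
normalized = znorm(2.0, impostors)
print(normalized)
```

After normalization, a single speaker-independent decision threshold can be applied to the scores of all speaker models.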

ASR Benchmark for spontaneous German (Verbmobil) - Available

Based on the Verbmobil corpora we define training, development and test sets, lexica and language models. We report the baseline word accuracy of a monophone HTK recognizer as well as of a cross-word triphone recognizer.
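
For readers unfamiliar with the measure: HTK reports word accuracy as (N - S - D - I) / N x 100, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions found by aligning the recognizer output with the reference. The sketch below illustrates the calculation in Python. It is not the benchmark's scoring code: it assumes unit edit costs for the alignment (HTK's own alignment uses different penalties), and the function names and example sentences are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-cost
    alignment between reference and hypothesis word lists (unit costs)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total cost, S, D, I) for ref[:i] versus hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)            # match
            else:
                diag = (c + 1, s + 1, d, ins)    # substitution
            c, s, d, ins = dp[i - 1][j]
            dele = (c + 1, s, d + 1, ins)        # deletion
            c, s, d, ins = dp[i][j - 1]
            inse = (c + 1, s, d, ins + 1)        # insertion
            dp[i][j] = min(diag, dele, inse)
    _, s, d, ins = dp[n][m]
    return s, d, ins

def word_accuracy(ref, hyp):
    """Word accuracy in percent: (N - S - D - I) / N * 100."""
    s, d, ins = align_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - ins) / len(ref)

# Hypothetical reference and recognizer output.
reference = "ich will einen termin machen".split()
hypothesis = "ich will termin bitte machen".split()
print(word_accuracy(reference, hypothesis))
```

Because insertions are counted as errors, word accuracy can become negative when the recognizer outputs many more words than the reference contains.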

ASR Benchmark for telephone speech, German (SpeechDat) - Available

Based on the German SpeechDat corpora SpeechDat II and SpeechDat Mobil we define training, development and test sets, lexica and language models. We report the baseline word accuracy of a monophone HTK recognizer for the fixed network and the results of applying the baseline training to GSM speech. We also report on adaptation techniques to overcome the observed drop in performance.

ASR Address Recognizer, German (GEO1)

The aim of this project is to build an experimental ASR system that recognizes German addresses over the telephone network. The recognizer will be HTK-based and will use the German SpeechDat corpus as acoustic training material and the GEO1 database as the pronunciation base.

Alcohol Language Corpus (ALC) - Available

The aim of this project is to create a multi-style speech corpus of intoxicated speakers for investigating the effects of alcohol intoxication on speech. The corpus contains speech of the same speakers under sober and intoxicated conditions. It comprises a variety of speaking styles, ranging from simple digit strings, read speech, tongue twisters and application commands (elicited by situational prompting) to monologues and real conversational speech. ALC aims at a total of 150 speakers (75 female, 75 male). The degree of intoxication is monitored by breath and blood samples. This project is being conducted in close cooperation with the Institute of Legal Medicine, University of Munich, and the Association against Alcohol and Drugs in Traffic (B.A.D.S.), Germany.

Major BAS Cooperations

SpeechDat - SD - Completed

Presently BAS is engaged in the production of the German SpeechDat corpus (telephone speech) as a subcontractor of Siemens Company Munich. Whether this corpus will be disseminated, as a whole or in part, by BAS or ELRA is uncertain at the moment.
The first (1000 speakers) and second (4000 speakers) project phases have been completed successfully. The project SpeechDat Car (another 600 recordings in a moving car) has also been completed.

Extension of Regional Variants of German - RVG-J, Ph@tt Sessionz - Available

In cooperation with AT&T Lucent the first corpus of German dialects was produced in the 1990s (RVG1). The recordings were made with four different microphones in parallel (low-cost to studio quality) in a normal office environment. The recorded items cover diphone-balanced sentences, single digits, connected digit strings, telephone numbers, computer commands and one minute of spontaneous speech. The 500 speakers were recorded in different locations using mobile recording equipment.

In several projects this corpus will be extended with adolescent speakers. The aim is to reach more than 1000 speakers and to use web-based recording tools such as SpeechRecorder.

Speaker Verification over the Telephone - VERIDAT

In cooperation with the German Telecom we developed a corpus for speaker verification over the telephone network. Since this corpus will not be distributed publicly via either BAS or ELRA, please contact BAS if you are interested in a bilateral user agreement.

Moving vehicle data - AUTO

Currently several speech data collections in moving automobiles are under way in close cooperation with several industrial partners. The data are recorded from several speakers, in different dialectal regions of Germany and in different car models.
No distribution via BAS is planned.


The acronym TUMMIC stands for "Thoroughly User-Oriented Man-Machine Interface in Cars". Several institutes of the Technical University of Munich (Institute of Ergonomics, Faculty of Augmented Reality, Institute for Human-Machine Communication and the Chair of Software and Systems Engineering), the IPSK of the LMU and the Institute for Psychology from Regensburg collaborate closely in this project. Together, they develop a concept for the operation of assistance and information systems in cars.

Florian Schiel