BAS
Bavarian Archive for Speech Signals
Projects


Last update of this page: 2018-10-11

BAS Projects (internal funding)

Munich AUtomatic Segmentation System (MAUS)
BAS Lexicon PHONOLEX
'Strange Corpora' - SC
BAS Edition of Verbmobil Corpora - VM
Articulatory Data - EMA
Spicos Training Corpus - SPICOS
BITS: BAS Infrastructures for Technical Speech processing
BAS Edition of SmartKom Corpora - SK
Munich Automatic Speaker Verification - MASV
ASR Benchmark for Spontaneous German (Verbmobil)
ASR Benchmark for Telephone Speech, German (SpeechDat)
ASR Address Recognizer, German (GEO1)
Alcohol Language Corpus (ALC)
Extension of Regional Variants of German
BAS CLARIN Webservices
BAS CLARIN Repository

Major BAS Cooperations (industrial funding)

SpeechDat - SD
Moving vehicle data - AUTO
Speaker Verification over the Telephone - VERIDAT
BMW - TUMMIC

BAS Projects

Munich AUtomatic Segmentation System (MAUS) - Available

The ultimate aim of the MAUS project is the fully automatic annotation of arbitrary speech utterances. In its final design MAUS will produce the following output:

phonological transcript (SAM-PA)
segmentation
automatic detection and classification of pronunciation variants
automatic annotation and segmentation of non-speech events
automatic corpus-based generation of statistical pronunciation models of words

using only the following input:

speech signal
orthography of the utterance

A short description of MAUS can be found here or here (only German). Please also refer to our BAS publications.

MAUS is available both as a freeware package and as a web service.

The MAUS project is closely connected to the development of the Partitur Format, which provides an easy and well-structured way to represent categorical information about speech signals.

The development of MAUS was partly funded by the German Verbmobil Project.

BAS PHONOLEX Lexicon - Available

Almost every kind of speech processing needs some 'canonical' or empirical definition of the pronunciation of single words. A computer-readable pronunciation lexicon that contains all inflected forms of a sufficient number of German words is currently not available.
Furthermore, there is no resource for spontaneous speech, which contains many 'non-regular' words and non-lexical speech events that speech researchers nevertheless have to cope with.

On the other hand, very large, well-structured linguistic lexica exist, but they do not contain any information about pronunciation. A common workaround is to use automatic grapheme-to-phoneme converters, which do not work perfectly and are usually not freely available.

The first aim of the BAS PHONOLEX Project is the development of a list of canonical pronunciations that covers approx. 95 % of written German (including all possible inflected forms). This task is carried out in close cooperation with the University of Saarbrücken (Prof. Uzkoreith), the University of Bonn (Dr. Stock) and the University of Leipzig (Dr. Quasthoff).
Currently (Version 2.6) the lexical list covers more than 1,600,000 entries; further details about contents, format and availability can be found here.
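The page does not say how the 95 % coverage figure is measured. One common reading is token coverage: the share of running words in a text corpus whose word form has an entry in the lexicon. The following is a minimal sketch under that assumption. The function name, the toy lexicon and the toy sentence are hypothetical and are not taken from PHONOLEX; lower-casing the tokens is a simplification that ignores German capitalization.

```python
from collections import Counter


def token_coverage(lexicon, tokens):
    """Return the fraction of running tokens whose word form is in the lexicon.

    Assumption: coverage is counted per token (not per distinct word form),
    and tokens are lower-cased before lookup.
    """
    counts = Counter(token.lower() for token in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for word, n in counts.items() if word in lexicon)
    return covered / total


# Hypothetical toy data, for illustration only.
lexicon = {"das", "haus", "ist", "groß"}
tokens = "Das Haus ist groß und das Haus ist alt".split()
coverage = token_coverage(lexicon, tokens)  # 7 of the 9 tokens are covered
```

Counting per token rather than per distinct word form means that frequent words dominate the figure, which is why a lexicon can reach high coverage of running text while still missing many rare forms.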

Strange Corpora - SC

The Strange Corpora are a series of smaller corpora. Each of them documents a known problem in the field of speech science and engineering. Using these corpora, scientists and engineers can test their solutions and applications and compare their performance.

Presently the following SC corpora are available or planned:

SC1 'Accents I' - The same text (story) read by 16 native Germans and 76 non-native speakers.
Status: available
SC2 'Noises' - recordings with classified background noise (permanent and transients).
Status: available
SC3 'Phone Noises' - recordings done by a 'real life' telephone service system with classified noise events and background noise.
Status: work in progress, estimated availability unknown.
SC4 'Hesitations' - spontaneous speech containing classified forms of hesitations.
Status: planned
SC5 'Breaks' - spontaneous speech containing classified breaks such as word breaks, sentence breaks or breaks caused by technical interruptions.
Status: planned
SC6 'Repetitions, repairs' - spontaneous speech containing classified forms of repair or repetitions.
Status: planned
SC7 'Speaker adaptation' - 1000 utterances spoken by 10 speakers (5 male + 5 female) to test algorithms for automatic speaker adaptation.
Status: available (SI1000)
SC8 'Pitch' - utterances of different speakers with extreme deviations in pitch and/or intonation.
Status: planned
SC9 'Pathology' - recordings with classified impaired speech.
Status: planned
SC10 'Accents II' - recordings of non-native speakers speaking a variety of German styles with a strong accent. 16 languages + German as a reference; 70 speakers in total; varying recording situations (read, monologue,...); partly with phonetic labeling and segmentation.
Status: available

Some of these SC corpora will be subdivided by BAS into training and test corpora, so that users can refer to the same partitions in publications.

If you, as a member of the speech community, have other interesting proposals for the SC corpus, don't hesitate to contact us at the following email address:

BAS Edition of Verbmobil Corpora - VM - Completed

During the first year after publication, the Verbmobil Corpora (spontaneous dialog recordings) are reserved for the exclusive use of the official VM partners. After that period the corpus is distributed by BAS as well as by the European Language Resources Agency (ELRA) of the European Union.

BAS provides an extended edition of these corpora. This edition contains the cut signal files as before and, in addition, the orthographic transliteration, a so-called 'proposed transcription' (the former term 'canonical form' is no longer appropriate for spontaneous speech) and, if feasible, a first automatic phonological segmentation. Results from other Verbmobil partners, such as prosodic and syntactic information, may be included as well.

Articulatory data - EMA

A large amount of EMA (electromagnetic articulography) data has been recorded from speakers reading the SI1000 Corpus; it will be published as a separate corpus. The corpus will contain the speech signal as well as the geometric parameters of the vocal tract. Estimated availability: End of 2000.

Spicos Training Data - SPICOS

The training corpus used in the SPICOS Project is still one of the major German corpora that can be used for bootstrapping speech recognition algorithms. It contains the speech of 12 speakers, each speaking 100 to 400 phoneme-balanced German sentences. The data were fully transliterated into IPA.
BAS plans to publish a new edition of this corpus after careful validation and filtering of the original data. The edition will contain the full IPA annotation for phonetic science as well as a SAM-PA annotation for technical usage.

BAS Edition of SmartKom Corpora - SK - Available

The SmartKom Corpora (multimodal Wizard-of-Oz (WOZ) dialogue recordings) will be distributed to the scientific community after one year of exclusive use by the SmartKom consortium (starting 09/2003).

BAS will provide an updated and extended edition of the SmartKom corpora.

Munich Automatic Speaker Verification - MASV - Available

MASV stands for Munich Automatic Speaker Verification. It is an experimental environment for setting up and testing speaker verification systems based on HMMs or GMMs.
It depends on the HTK tools (version 3.1 or greater), Matlab (version 5 or greater) and Perl (version 5 or greater). The Perl scripts control training and testing of speaker models; the Matlab part provides various score normalization schemes and a GUI for exploring the performance of a speaker verification system. MASV is published under the GNU General Public License in the hope of helping others get started with speaker verification based on HMMs. The key features are:

all HMM types provided by HTK (including GMMs) can be used.
easily adaptable to different speech databases.
easy setup of different speaker sets (customers, impostors, world speakers, development set,...).
various possibilities of seeding models before training.
parallel processing of training / testing supported.
several score normalizations possible: world model, cohort speakers, handset normalization (Hnorm).
easy evaluation with Matlab GUI (including comparison between matched / mismatched conditions).
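The three score normalizations named in the list can be illustrated with a small sketch. This is not MASV code (MASV implements its normalizations in Matlab); it is a hedged Python illustration of the textbook definitions, and all function names and example numbers are hypothetical. It assumes that scores are log-likelihoods, that cohort normalization subtracts the mean cohort score (some systems use the maximum instead), and that Hnorm standardizes a score with the mean and standard deviation of impostor scores collected for the detected handset type.

```python
from statistics import mean, stdev


def world_norm(client_ll, world_ll):
    """World-model normalization: log-likelihood ratio of client vs. world model."""
    return client_ll - world_ll


def cohort_norm(client_ll, cohort_lls):
    """Cohort normalization (assumed variant): subtract the mean cohort log-likelihood."""
    return client_ll - mean(cohort_lls)


def hnorm(score, impostor_scores_by_handset, handset):
    """Handset normalization: standardize the score with impostor statistics
    estimated separately for the handset type of the test utterance."""
    scores = impostor_scores_by_handset[handset]
    return (score - mean(scores)) / stdev(scores)


# Hypothetical example values, for illustration only.
llr = world_norm(-50.0, -55.0)
cohort_score = cohort_norm(-50.0, [-54.0, -56.0])
hnorm_score = hnorm(2.0, {"carbon": [0.0, 1.0, 2.0]}, "carbon")
```

In all three cases the goal is the same: to make scores comparable across speakers and recording conditions so that a single decision threshold can be applied.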

A more detailed description can be found in the manual which can be downloaded from the MASV website.

ASR Benchmark for spontaneous German (Verbmobil) - Available

Based on the Verbmobil corpora we define training, development and test sets, lexica and language models. We report the baseline word accuracy of a monophone HTK recognizer as well as of a cross-word triphone recognizer.
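Word accuracy is conventionally defined as (N - S - D - I) / N, where N is the number of words in the reference transcript and S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the recognizer output against the reference. The sketch below computes this with unit edit costs. It is an illustration, not the benchmark's scoring procedure: HTK's own scoring tool may weight the error types differently when aligning, and the function names and example sentences are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for one minimum-edit
    alignment of hyp against ref, using unit costs for all error types."""
    # Each cell holds (total_errors, subs, dels, ins) for a prefix pair.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        cur = [(i, 0, i, 0)]
        for j, hyp_word in enumerate(hyp, start=1):
            diag = prev[j - 1]
            if ref_word != hyp_word:
                diag = (diag[0] + 1, diag[1] + 1, diag[2], diag[3])
            up = prev[j]
            deletion = (up[0] + 1, up[1], up[2] + 1, up[3])
            left = cur[j - 1]
            insertion = (left[0] + 1, left[1], left[2], left[3] + 1)
            cur.append(min(diag, deletion, insertion))
        prev = cur
    _, subs, dels, ins = prev[-1]
    return subs, dels, ins


def word_accuracy(ref, hyp):
    """Word accuracy (N - S - D - I) / N; can be negative with many insertions."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    subs, dels, ins = align_counts(ref, hyp)
    return (len(ref) - subs - dels - ins) / len(ref)


# Hypothetical example: one word deleted and one inserted (two errors, N = 5).
reference = "ich möchte einen termin vereinbaren".split()
hypothesis = "ich möchte termin heute vereinbaren".split()
accuracy = word_accuracy(reference, hypothesis)
```

Unlike the percentage of correct words, word accuracy penalizes insertions, so a recognizer cannot improve its score simply by emitting extra words.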

ASR Benchmark for telephone speech, German (SpeechDat) - Available

Based on the German SpeechDat corpora SpeechDat II and SpeechDat Mobil we define training, development and test sets, lexica and language models. We report the baseline word accuracy of a monophone HTK recognizer for the fixed network, as well as the results of applying the baseline models to GSM speech. We also report on adaptation techniques to overcome the observed drop in performance.

ASR Address Recognizer, German (GEO1)

The aim of this project is to build an experimental ASR system that recognizes German addresses over the telephone network. The recognizer will be HTK-based and will use the German SpeechDat corpus for acoustic training and the GEO1 database as its pronunciation base.

Unfinished, with partly usable results; please contact bas@bas.uni-muenchen.de if you are interested in this sort of data.

Alcohol Language Corpus (ALC) - Available

The aim of this project is to create a multi-style speech corpus of intoxicated speakers for investigating the effects of alcoholic intoxication on speech. The corpus contains speech of the same speakers under sober and intoxicated conditions. It comprises a variety of speaking styles, ranging from simple digit strings through read speech, tongue twisters, application commands (elicited by situational prompting) and monologues to real conversational speech. ALC aims at a total of 150 speakers (75/75 female/male). The degree of intoxication is monitored by breath and blood samples. This project is being conducted in close cooperation with the Institute of Legal Medicine, University of Munich, and the Association against Alcohol and Drugs in Traffic (B.A.D.S.), Germany.

CLARIN-D Webservices and Webinterface - Available

Within the CLARIN-D projects funded by the German BMB+F, BAS developed a series of REST-based speech tools (G2P, MAUS, CHUNKER, etc.) and a user-friendly web interface for processing speech data interactively with these web services.

CLARIN-D Repository for Speech Resources - Available

Within the CLARIN-D projects funded by the German BMB+F, the BAS archive of speech resources has been transformed into a CLARIN center of type B, including this repository.

Major industrial BAS Cooperations

SpeechDat - SD - Completed

Presently BAS is engaged in the production of the German SpeechDat corpus (telephone speech) as a subcontractor of Siemens Company Munich. Whether this corpus will be disseminated, as a whole or in part, by BAS or ELRA is uncertain at the moment.
The first (1000 speakers) and second (4000 speakers) project phases have been completed successfully. The project SpeechDat Car (another 600 recordings in a moving car) has also been completed.

Extension of Regional Variants of German - RVG-J, Ph@tt Sessionz - Available

In cooperation with AT&T Lucent the first corpus of German dialects (RVG1) was produced in the 1990s. The recordings were made with four different microphones in parallel (from low-cost to studio quality) in a normal office environment. The recorded items cover diphone-balanced sentences, single digits, connected digit strings, telephone numbers, computer commands and one minute of spontaneous speech. The 500 speakers were recorded at different locations using mobile recording equipment.

In several projects this corpus will be extended with adolescent speakers. The aim is to reach more than 1000 speakers and to use web-based recording techniques such as SpeechRecorder.
