The Munich Automatic Segmentation System MAUS
- Contact: Florian Schiel
What is meant by SEGMENTATION?
Phonetic science aims to analyze the correlation between linguistic
categories (e.g. word, syllable, phone) and the corresponding
signals (e.g. acoustic signal, spectrum, articulatory
signal, neuronal signals). Usually, categories are mapped
concretely to the corresponding sections of the signal
according to the aim of the analysis in question. This
results in a partition of the signal into segments, known as
a segmentation.
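To make the notion of a partition concrete, here is a minimal sketch in Python. The `Segment` type, the `is_partition` helper, the labels and the time values are all hypothetical and chosen for illustration only; they are not taken from MAUS or from any MAUS file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One labelled stretch of signal (hypothetical structure, times in seconds)."""
    start: float
    end: float
    label: str


def is_partition(segments, signal_start, signal_end, tol=1e-9):
    """Return True if the segments cover the signal without gaps or overlaps."""
    if not segments:
        return False
    ordered = sorted(segments, key=lambda s: s.start)
    # The first segment must begin where the signal begins,
    # and the last one must end where the signal ends.
    if abs(ordered[0].start - signal_start) > tol:
        return False
    if abs(ordered[-1].end - signal_end) > tol:
        return False
    # Every segment needs a positive duration.
    if any(seg.end <= seg.start for seg in ordered):
        return False
    # Each segment must start exactly where the previous one ends.
    return all(abs(prev.end - nxt.start) <= tol
               for prev, nxt in zip(ordered, ordered[1:]))


# Illustrative values only: a pause followed by three phone-sized segments.
segmentation = [
    Segment(0.00, 0.10, "<pause>"),
    Segment(0.10, 0.18, "h"),
    Segment(0.18, 0.30, "a"),
    Segment(0.30, 0.45, "l"),
]
```

Calling `is_partition(segmentation, 0.0, 0.45)` checks that every point of the signal belongs to exactly one labelled segment, which is the property the text describes.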
Because such an analysis is inherently subjective (it depends on
the observer and on the hypothesis under investigation, and
therefore calls for a different description in each case),
segmentations are produced manually according to the aspects
relevant to the analysis. These data are produced carefully and
are of the highest possible reliability, which is an absolute
prerequisite for experimental phonetic work.
Good training and considerable experience are needed to produce careful and
reliable manual segmentations, which are nevertheless
time-consuming. Therefore, high-quality segmentations can only be
produced for a small amount of data.
In digital speech processing, especially in automatic speech recognition
(ASR), a large amount of segmented data is needed. Producing manual
segmentations for this is uneconomical. Therefore, automatic procedures
are developed to segment a large amount of data in a relatively
short time. This is only possible at the cost of reduced
segmentation quality. On the one hand, the reduction can be traced
back to the imprecision of the analysis of the acoustic
signal (the computer cannot find categories and depends on
signal-inherent criteria or on insufficiently analysed empirical
correlations); on the other hand, it can be traced back to the
absence of a hypothesis grounded in an experimental background.
In return, a large amount of segmented material can be offered for
research and development in the area of technical speech
processing that takes phonetic information into account.
Conversely, success in the field of speech processing leads to
decisive improvements in automatic segmentation.
Short Description of MAUS
Input: speech signal and related orthographic representation
Output: fully automatically produced segmentation on the phonemic level
Technical implementation:
- hybrid approach consisting of statistical
(HMM) and rule-based components (rules statistically derived from a corpus)
- possible pronunciation variants of German are taken into account
- applicable to read speech as well as to spontaneous speech
- languages: German, English, Icelandic, Italian, Estonian, Hungarian
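The rule-based side of the approach listed above can be pictured with a small sketch: starting from a canonical pronunciation, rewrite rules generate candidate variants. This is an illustration under stated assumptions, not the MAUS implementation. The function names, the two toy rules and the example word are hypothetical, and the sketch leaves out both the rule probabilities that MAUS derives from a corpus and the HMM step that would choose among the variants against the signal.

```python
def apply_rule(phones, pattern, replacement):
    """Yield each sequence obtained by applying one rule at one matching position."""
    n = len(pattern)
    for i in range(len(phones) - n + 1):
        if phones[i:i + n] == pattern:
            yield phones[:i] + replacement + phones[i + n:]


def variants(canonical, rules):
    """Return the canonical form plus every variant reachable by the rules.

    Assumes rules never lengthen the sequence, so the search terminates.
    """
    seen = {canonical}
    queue = [canonical]
    while queue:
        current = queue.pop()
        for pattern, replacement in rules:
            for candidate in apply_rule(current, pattern, replacement):
                if candidate not in seen:
                    seen.add(candidate)
                    queue.append(candidate)
    return seen


# Hypothetical toy rules in a SAMPA-like notation (not the MAUS rule set):
toy_rules = [
    (("@", "n"), ("n",)),      # schwa deletion before /n/
    (("b", "n"), ("b", "m")),  # place assimilation of /n/ after /b/
]

# Assumed canonical pronunciation of German "haben".
canonical = ("h", "a:", "b", "@", "n")
haben_variants = variants(canonical, toy_rules)
```

With these two toy rules, `haben_variants` contains the canonical form, the schwa-less form and the assimilated form. A segmentation system could then align each candidate with the signal and keep the best match.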
steps of processing (for German):
Download of MAUS
The MAUS download package comprises a number of scripts and binaries to be
run under Linux. It can also be run under Windows using Cygwin (not
contained in the package).
The main scripts of the package are:
- maus : segmentation and labelling (S&L) of a single signal file
- maus.corpus : S&L of a corpus of signal files
- maus.iter : iterative MAUS
The package also includes other tools to convert and display S&Ls.
Furthermore, parameter sets and acoustic models for German (and other
languages) are included in the package.
The algorithm that automatically learns the statistical rule set from data
is not included.
Download
Publications on MAUS
Verbmobil Memos:
Conference papers:
- M.-B. Wesenick, F. Schiel (1994): Applying
Speech Verification to a Large Data Base of German to Obtain a
Statistical Survey About Rules of Pronunciation, Proceedings
of ICSLP 1994, pp. 279 - 282, Yokohama.
- A. Kipp, M.-B. Wesenick, F. Schiel (1996): Automatic
Detection and Segmentation of Pronunciation Variants in German
Speech Corpora; in: Proceedings of the ICSLP 1996.
Philadelphia, pp. 106-109, Oct 1996.
- M.-B. Wesenick, A. Kipp (1996): Estimating
the Quality of Phonetic Transcriptions and Segmentations of Speech
Signals; in: Proceedings of the ICSLP 1996. Philadelphia,
pp. 129-132, Oct 1996.
- M.-B. Wesenick (1996): Automatic
Generation of German Pronunciation Variants; in: Proceedings
of the ICSLP 1996. Philadelphia, pp. 125-128, Oct 1996.
- Kipp, A., Wesenick, B. & Schiel, F. (1997): Pronunciation
Modeling Applied to Automatic Segmentation of Spontaneous Speech;
in: Proceedings of the EUROSPEECH 1997, Rhodos, Greece, pp.
1023-1026.
- F. Schiel (1997): Probabilistic analysis of pronunciation
with MAUS; in: The ELRA Newsletter, December 1997, pp. 6-9.
- Beringer, N., Schiel, F., Brietzmann, P. (1998): German
Regional Variants - A Problem for Automatic Speech Recognition?;
in: Proceedings of the ICSLP 1998. Sydney, Vol. 2, pp. 85ff, Dec.
1998.
- Beringer, N., Schiel, F. (1999): Independent
Automatic Segmentation of Speech by Pronunciation Modeling;
in: Proc. of the ICPhS 1999. San Francisco, August 1999, pp. 1653-1656.
- Schiel, F. (1999): Automatic Phonetic Transcription of
Non-Prompted Speech; in: Proc. of the ICPhS 1999. San Francisco, August
1999, pp. 607-610.
- Beringer N, Schiel F (2000): The Quality of Multilingual Automatic Segmentation Using German MAUS. Proc. of the International Conference on Spoken Language Processing, Beijing, China.
- N. Beringer (2003): Regeladaptive kategoriale Analyse von
Spontansprache - eine sprachenübergreifende Untersuchung.
DAGA03 - 29. Jahrestagung für Akustik, Aachen.
- Schiel, F. (2004): MAUS Goes Iterative. Proc. of the IV.
International Conference on Language Resources and Evaluation,
Lisbon, Portugal, pp. 1015-1018.
- Schiel F, Draxler Chr, Harrington J (2011): Phonemic Segmentation and Labelling using the MAUS Technique. Workshop 'New Tools and Methods for Very-Large-Scale Phonetics Research', University of Pennsylvania, January 28-31, 2011.
Dissertations:
Kipp, A. : Automatische Segmentierung und Etikettierung von
Spontansprache; Shaker Verlag Aachen 1999.
Beringer, N. : Regeladaptive kategoriale Analyse von Spontansprache; Shaker Aachen 2002