Next: Annotation Methods
Up: Segmentation and Labeling
Previous: Manual Segmentation
Contents
"Automatic segmentation refers to the process whereby segment boundaries are
assigned automatically by a program. This will probably be an HMM-based speech
recognizer that has been given the correct symbol string as input. The output
boundaries may not be entirely accurate, especially if the training data was
sparse. Semi-automatic segmentation refers to the process whereby this automatic
segmentation is followed by manual checking and editing of the segment
boundaries.
This form of segmenting is motivated by the need to segment very large databases
for the purpose of training ever more comprehensive recognizers. Manual
segmentation is extremely costly in time and effort, and automatic segmentation,
if sufficiently accurate, could provide a shortcut. However, it is still
necessary for the researcher to derive the correct symbol string to input to the
autosegmenter. This may be derived automatically from an orthographic
transcription, in which case it will not always correspond to the particular
utterance unless manually checked and edited. The amount of inaccuracy that is
acceptable will depend on the uses to which the database is to be put, and its
overall size."
(From [2], p. 153.)
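An HMM-based recognizer that is given the correct symbol string performs what is commonly called forced alignment: since the symbols are known, it only has to decide where each one starts and ends. The Python sketch below shows the core Viterbi step under simplifying assumptions. Each phoneme is modelled as a single left-to-right state that must cover at least one frame, and the per-frame log-likelihoods `log_lik[t][k]` are assumed to have already been computed by trained acoustic models. The function name `forced_align` and its input layout are hypothetical, not the interface of any particular toolkit.

```python
import math


def forced_align(log_lik, phonemes):
    """Viterbi alignment of a known phoneme chain to T frames (sketch).

    log_lik[t][k]: log-likelihood of frame t under the model of phonemes[k]
    (hypothetical, precomputed input). Each phoneme is one left-to-right
    state occupying at least one frame.
    Returns a list of (phoneme, start_frame, end_frame_exclusive).
    """
    T, K = len(log_lik), len(phonemes)
    if T < K:
        raise ValueError("fewer frames than phonemes")
    NEG = -math.inf
    # score[t][k]: best log score for frames 0..t with frame t in phoneme k
    score = [[NEG] * K for _ in range(T)]
    # back[t][k]: 1 if phoneme k was entered at frame t, 0 if it continued
    back = [[0] * K for _ in range(T)]
    score[0][0] = log_lik[0][0]  # the path must start in the first phoneme
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            enter = score[t - 1][k - 1] if k > 0 else NEG
            if enter > stay:
                score[t][k], back[t][k] = enter + log_lik[t][k], 1
            else:
                score[t][k], back[t][k] = stay + log_lik[t][k], 0
    # Backtrace from the last phoneme at the last frame.
    segments, k, end = [], K - 1, T
    for t in range(T - 1, 0, -1):
        if back[t][k] == 1:
            segments.append((phonemes[k], t, end))
            end, k = t, k - 1
    segments.append((phonemes[0], 0, end))
    return segments[::-1]


if __name__ == "__main__":
    # Toy log-likelihoods: frames 0-1 favour "a", 2-4 favour "b", 5 favours "c".
    ll = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
          [-5, 0, -5], [-5, 0, -5], [-5, -5, 0]]
    print(forced_align(ll, ["a", "b", "c"]))
    # -> [('a', 0, 2), ('b', 2, 5), ('c', 5, 6)]
```

Frame indices convert to seconds by multiplying by the frame shift (for example 10 ms, an assumed value). Practical aligners typically use several states per phoneme and richer acoustic models, but the dynamic-programming structure is the same.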
At the time of writing (footnote 8.12), only a few fully automatic
methods are known to yield usable results. These are:
- Segmentation into words, if the word chain is known and the speech
is not very spontaneous (footnote 8.13).
- Markup of prosodic events according to a reduced ToBI set (footnote 8.14).
- Time alignment of a chain of phonemes using Hidden Markov
Modeling (footnote 8.15).
- Segmentation and labeling into phonemic units by
MAUS (footnote 8.16), which requires
the word chain and a statistical rule set about pronunciation.
- The 'elitist approach' developed by Steve Greenberg, which yields a stream of
articulatory features that may be combined into phoneme
categories (footnote 8.17).
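When the word chain is known, word boundaries can be read off a phoneme-level alignment of each word's canonical pronunciation. The sketch below assumes exactly that: every word is realized with its canonical phonemes, in order, with nothing inserted or deleted. That assumption tends to break down for spontaneous speech, where pronunciations deviate from the canonical form; modelling such variation is what a pronunciation rule set, such as the one MAUS requires, is for. The function name `words_from_phones` and the example words are hypothetical.

```python
def words_from_phones(phone_segments, words):
    """Group aligned phoneme segments into word segments (sketch).

    phone_segments: (phoneme, start, end) tuples in time order, e.g. from a
    forced alignment of the canonical pronunciation (assumed input).
    words: (word, canonical_phonemes) pairs forming the known word chain.
    Returns a list of (word, start, end).
    """
    result, i = [], 0
    for word, phones in words:
        n = len(phones)
        if n == 0:
            raise ValueError(f"word {word!r} has no phonemes")
        chunk = phone_segments[i:i + n]
        # Assumption: the aligned phonemes match the canonical form exactly.
        if [p for p, _, _ in chunk] != list(phones):
            raise ValueError(f"phoneme chain does not match word {word!r}")
        # The word spans from its first phoneme's start to its last one's end.
        result.append((word, chunk[0][1], chunk[-1][2]))
        i += n
    if i != len(phone_segments):
        raise ValueError("extra phoneme segments after the last word")
    return result


segs = [("d", 0, 4), ("a:", 4, 12), ("I", 12, 15), ("s", 15, 22), ("t", 22, 25)]
print(words_from_phones(segs, [("da", ["d", "a:"]), ("ist", ["I", "s", "t"])]))
# -> [('da', 0, 12), ('ist', 12, 25)]
```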
None of these automatic methods achieves the performance of a
human segmenter and labeler. However, for some applications and
investigations they may be sufficient. Recently, automatic
segmentation into phonemic units and automatic prosodic
tagging have become quite important in
speech synthesis by unit selection, because this method requires large
quantities of reliably segmented and labeled speech units from a single
speaker.
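One common way to quantify how far an automatic segmentation falls short of a human one is to count the proportion of boundaries that lie within a fixed tolerance of the corresponding manual boundary. The sketch below assumes that both segmentations share the same symbol chain (as with forced alignment), so the i-th boundaries correspond. The 20 ms default tolerance is a frequently used choice but is an assumption here, and the function name `boundary_agreement` is hypothetical.

```python
def boundary_agreement(auto, manual, tolerance=0.020):
    """Return the fraction of manual boundaries matched within +/- tolerance.

    auto, manual: boundary times in seconds for the same symbol chain,
    so auto[i] and manual[i] refer to the same boundary (assumption).
    tolerance: maximum deviation in seconds; 20 ms is an assumed default.
    """
    if len(auto) != len(manual):
        raise ValueError("segmentations must have the same number of boundaries")
    if not manual:
        raise ValueError("no boundaries to compare")
    # Count boundary pairs whose absolute deviation is within the tolerance.
    hits = sum(1 for a, m in zip(auto, manual) if abs(a - m) <= tolerance)
    return hits / len(manual)
```

For example, automatic boundaries at 0.105, 0.250 and 0.400 s compared with manual ones at 0.100, 0.200 and 0.410 s give an agreement of 2/3, because only the middle boundary deviates by more than 20 ms.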
BITS Projekt-Account
2004-06-01