Automatic and Semi-automatic Segmentation


``Automatic segmentation refers to the process whereby segment boundaries are assigned automatically by a program. This will probably be an HMM-based speech recognizer that has been given the correct symbol string as input. The output boundaries may not be entirely accurate, especially if the training data was sparse. Semi-automatic segmentation refers to the process whereby this automatic segmentation is followed by manual checking and editing of the segment boundaries.
This form of segmenting is motivated by the need to segment very large databases for the purpose of training ever more comprehensive recognizers. Manual segmentation is extremely costly in time and effort, and automatic segmentation, if sufficiently accurate, could provide a shortcut. However, it is still necessary for the researcher to derive the correct symbol string to input to the autosegmenter. This may be derived automatically from an orthographic transcription, in which case it will not always correspond to the particular utterance unless manually checked and edited. The amount of inaccuracy that is acceptable will depend on the uses to which the database is to be put, and its overall size.'' (From [2], p. 153.)

At the time of writing^8.12, only a few fully automatic methods are known to yield usable results. These are:

Segmentation into words, if the word chain is known and the speech is not very spontaneous^8.13.
Markup of prosodic events according to a reduced ToBI set^8.14.
Time-alignment of a chain of phonemes using Hidden Markov Modeling^8.15.
Segmentation and labeling into phonemic units by MAUS^8.16, which requires the word chain and a statistical rule set describing pronunciation.
The `elitist approach' developed by Steve Greenberg, which yields a stream of articulatory features that may be combined into phoneme categories^8.17.
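
To make the idea of time-aligning a known phoneme chain more concrete, the following is a minimal sketch of forced alignment by Viterbi search. It is not the implementation of any of the systems listed above. It assumes, for illustration only, one state per phoneme, no transition probabilities, and frame-wise log-likelihoods that have already been computed by some acoustic model. The function name forced_align and the input layout are hypothetical.

```python
import math

def forced_align(log_likes, phones):
    """Align a known phoneme chain to a sequence of frames.

    log_likes: one dict per frame, mapping phoneme label -> log-likelihood
               (assumed to come from an acoustic model; hypothetical layout).
    phones:    the known phoneme chain, in order.
    Returns a list of (phoneme, start_frame, end_frame_exclusive).
    """
    T, N = len(log_likes), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phoneme")
    NEG = -math.inf
    # score[t][j]: best log-score of any path that is in phoneme j at frame t
    score = [[NEG] * N for _ in range(T)]
    # back[t][j]: 1 if the best path entered phoneme j at frame t, else 0
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_likes[0][phones[0]]
    for t in range(1, T):
        for j in range(min(t + 1, N)):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG
            if move > stay:
                best, back[t][j] = move, 1
            else:
                best, back[t][j] = stay, 0
            if best > NEG:
                score[t][j] = best + log_likes[t][phones[j]]
    # Trace back from the last frame of the last phoneme.
    segments = []
    j, end = N - 1, T
    for t in range(T - 1, 0, -1):
        if back[t][j] == 1:
            segments.append((phones[j], t, end))
            end = t
            j -= 1
    segments.append((phones[j], 0, end))
    segments.reverse()
    return segments

# Toy example: three frames that favour "a", then two that favour "b".
frames = [{"a": -0.1, "b": -3.0}] * 3 + [{"a": -3.0, "b": -0.1}] * 2
print(forced_align(frames, ["a", "b"]))
# prints [('a', 0, 3), ('b', 3, 5)]
```

The boundaries are returned as frame indices. Converting them to time requires the frame shift of the acoustic front end, which is not specified here. A realistic aligner would use multi-state phoneme models with trained transition probabilities, so this sketch only illustrates the search principle.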

None of these automatic methods matches the performance of a human segmenter and labeler. However, for some applications and investigations they may be sufficient. Automatic segmentation into phonemic units and automatic prosodic tagging have recently become important in speech synthesis by unit selection, because this method requires large quantities of reliably segmented and labeled speech units from a single speaker.


BITS Projekt-Account 2004-06-01