This part of the cookbook describes the entire process of speech corpus
production in roughly chronological order. Figure 3.1 shows the major
steps of the process and their relation on a time axis running from top
to bottom.
Figure 3.1: Typical schedule of a speech corpus production
As you can see, some steps have a strict order because
they rely on results or data produced in the previous step, while
others may be carried out in parallel. For example, it does not make
sense to start creating the pronunciation dictionary before the
annotation is finished, because the dictionary is built from a basic
transcription. On the other hand, in many corpus productions,
collection, post-processing and annotation run in parallel to save time.
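The ordering constraints described above can be sketched as a small dependency graph. The following Python example is only an illustration and is not part of the cookbook's method: the step names and the dependency map are assumptions. In particular, it assumes that collection, post-processing and annotation each wait only for the specification, which is a simplification of how these steps overlap in practice. It uses the standard library module graphlib (Python 3.9 or later) to group the steps into stages that may run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration):
# each step maps to the set of steps it has to wait for.
DEPENDS_ON = {
    "collection": {"specification"},
    "post_processing": {"specification"},
    "annotation": {"specification"},
    # Stated in the text: the dictionary needs the finished annotation.
    "pronunciation_dictionary": {"annotation"},
}


def parallel_stages(depends_on):
    """Group steps into stages; steps within one stage may run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(DEPENDS_ON)
for number, steps in enumerate(stages, start=1):
    print(f"Stage {number}: {', '.join(steps)}")
```

With this map, the pronunciation dictionary ends up in a later stage than the annotation, while collection, post-processing and annotation share one stage. A real schedule would add further steps and dependencies, such as the validations.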
Figure 3.1 also shows the ideal concept of external validation at two
or more points in time by an independent validation institution.
In most cases insufficient funding prevents such a design; even then,
you should at least perform an in-house validation.
All the tasks shown are discussed in detail in the following chapters.
At the end of each chapter you will find a check list to help with your
own speech corpus production.
BITS Projekt-Account
2004-06-01