Post-processing comprises all processing steps that lead from the recorded raw
signal data to the final distributed corpus. Not all of the following steps
will be necessary for your corpus collection, but those marked with a * always
are: file transfer from recording device to computer, file name
assignment*, filtering, cutting, synchronization, re-sampling, format
conversion*, special conversion for annotation, and automatic error
detection*. Please note that some of these processing steps may be
applied after or between the annotation steps described in the next chapter,
depending on the structure of your data pipeline.
We consider this chapter particularly relevant for the prospective producer of a new speech corpus, because the costs and manpower needed for post-processing are often neglected or at least grossly underestimated. Please review this chapter carefully before you calculate the overall costs of your corpus production, and take into account every post-processing step that your individual corpus requires.
Although the order of the processing steps is in principle arbitrary, the following description presents them in the most effective order.
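For cost planning, it can help to write the step list from the beginning of this chapter down explicitly and to select from it per corpus. The following Python fragment is only a minimal sketch under stated assumptions: the names `Step`, `STEPS` and `plan` are hypothetical, the order is simply the order of the list above (not necessarily the order used in the detailed description), and the asterisk at the end of the list is read as applying to automatic error detection only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    required: bool  # True for the steps marked with * in the chapter's list


# Assumption: the order mirrors the list at the start of this chapter.
STEPS = [
    Step("file transfer", False),
    Step("file name assignment", True),
    Step("filtering", False),
    Step("cutting", False),
    Step("synchronization", False),
    Step("re-sampling", False),
    Step("format conversion", True),
    Step("special conversion for annotation", False),
    Step("automatic error detection", True),
]


def plan(optional_wanted):
    """Return the names of the steps to run, in list order.

    Required steps are always included; optional steps are included
    only if they are named in `optional_wanted`.
    """
    wanted = set(optional_wanted)
    unknown = wanted - {s.name for s in STEPS}
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return [s.name for s in STEPS if s.required or s.name in wanted]
```

Calling `plan({"cutting", "re-sampling"})`, for instance, returns the three required steps plus the two chosen optional ones, in list order. Such a list can then serve as a checklist when estimating the effort for each step.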