As with any project that demands a large investment of work and money, it is essential to start a speech corpus production with a detailed specification of all desired features, the procedures, the monitoring of the process, and the final validation. If you are acting as a contractor, this is probably the phase of the project in which you will have the most contact with your client. It is very important to fix all specifications in writing (usually as a technical annex to your contract) and to have your client sign this annex and all later amendments.
The high costs of speech corpus production are best exploited by specifying as many diverse features as possible within one speech collection. For example, in a telephone-based corpus whose primary aim is the recognition of digits and numbers, adding some non-prompted or even spontaneous recordings to the same recording sessions will not increase the overall costs dramatically. The re-usability of the corpus, however, will be much higher than that of a corpus containing only read digits and numbers.
The following sections give an overview of the basic requirements of any speech corpus specification. Depending on the special nature of your corpus, the specification may need to cover additional items.
In this chapter, the following terms will be used frequently: