Before specifying the technical features of the recordings, it is important to define an adequate recording setup for the speech corpus production. The recording setup determines the acoustic characteristics of the resulting corpus and therefore also the usability of the data for certain applications or investigations.
One can distinguish between open and secret recordings. People who know that they are being recorded change their speech behavior. Secret recordings, on the other hand, pose an ethical problem. There is also the risk of spending much time and effort for nothing if the speakers later refuse permission to use the recordings. You should therefore use secret recordings only when there is no alternative. A good method for eliciting very natural and spontaneous speech in an open recording is to occupy the speakers with a task that requires some cognitive activity. People very soon forget that they are being recorded, and you have the advantage that you can choose your equipment for maximal quality rather than for costly concealment.
Furthermore, the recording setup has an impact on the recruitment of speakers: it is much less expensive to recruit speakers for a telephone recording than for a studio recording (travel costs etc.).
In the recording setup the following general features are specified:
The script may also define the order of the recording prompts and therefore has an impact on the speech characteristics themselves. Consider for instance a recording script that presents short utterances in groups of six. The speaker will read these groups from paper or from a screen, and the grouping will most likely influence the speaker's prosody significantly, for instance by lowering the pitch in the last utterance of each group. To avoid this effect you may overlap the groups, so that the last item of each group also appears at the beginning of or within another group, or you may use filler phrases.
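The overlapping of groups described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_overlapping_groups and the choice to carry the last item of each group over to the first position of the next group (wrapping around from the last group to the first) are hypothetical and not prescribed by the text.

```python
from typing import List


def build_overlapping_groups(prompts: List[str], group_size: int = 6) -> List[List[str]]:
    """Arrange prompts in groups so that every group-final item is also
    recorded in a non-final position (here: first) of another group.

    Hypothetical helper for illustration; the carry-over strategy is an
    assumption, not a fixed rule.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")

    # Each group gets (group_size - 1) new prompts plus one carried-over prompt.
    step = group_size - 1
    chunks = [prompts[i:i + step] for i in range(0, len(prompts), step)]

    # With fewer than two chunks there is no other group to overlap with.
    if len(chunks) < 2:
        return chunks

    groups = []
    for i, chunk in enumerate(chunks):
        # chunks[i - 1] wraps around for i == 0, so the first group starts
        # with the final item of the last group.
        carried = chunks[i - 1][-1]
        groups.append([carried] + chunk)
    return groups


if __name__ == "__main__":
    script = [f"utterance {n}" for n in range(1, 11)]
    for number, group in enumerate(build_overlapping_groups(script), start=1):
        print(number, group)
```

Each group-final utterance is thus recorded twice, once with possible group-final prosody and once at the start of a group. The price is one additional reading per group.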
Finally, it is recommended that the script contain a training phase before the recordings start and possibly some breaks during the recording. The speaker gets accustomed to the recording situation in the training phase, so that any adaptation effects are not represented in the corpus. Frequent breaks in the script allow the speaker to relax and perhaps drink some water to prevent a hoarse voice.
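As a rough sketch of how a training phase and regular breaks might be built into a script, the following Python function produces a simple session plan. The function name plan_session, the labels "training", "record" and "break", and the default interval of 20 prompts are all assumptions chosen for illustration. The text does not specify how often breaks should occur.

```python
from typing import List, Tuple


def plan_session(training: List[str], prompts: List[str],
                 break_every: int = 20) -> List[Tuple[str, str]]:
    """Return a list of (kind, text) steps for one recording session.

    Hypothetical helper: "training" items are presented first and are not
    meant to enter the corpus; a "break" step is inserted after every
    break_every recorded prompts (the interval is an assumed value).
    """
    if break_every < 1:
        raise ValueError("break_every must be at least 1")

    # Training items come first so that adaptation happens before recording.
    plan = [("training", text) for text in training]

    for index, text in enumerate(prompts):
        # Insert a break before the prompt that starts a new block.
        if index > 0 and index % break_every == 0:
            plan.append(("break", ""))
        plan.append(("record", text))
    return plan


if __name__ == "__main__":
    for step in plan_session(["warm-up 1", "warm-up 2"],
                             [f"utterance {n}" for n in range(1, 6)],
                             break_every=2):
        print(step)
```

Keeping the training items in the same plan, but under a separate label, makes it easy to present them exactly like real prompts while excluding them from the corpus afterwards.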
In the following sections four basic recording setups are discussed; of course mixtures of these are possible and are frequently used to further widen the re-usability of the corpus.