To cut out the relevant signal segments and split them into correctly named individual files, you may use automatic procedures, edit manually with a sound editor, or combine both.
Fully automatic editing procedures require either a good silence detector or, if background noise is present in your signal, a good speech/non-speech detector. In addition, you have to know when and how many pauses are to be expected in the speech recording. Such a technique may work reliably if your recording contains single utterances with no silence within the speech. If you are recording sentences, turns of a dialogue or even free conversation, it will almost certainly fail. In any case we recommend that you check the resulting cut signal files for editing errors during your annotation step (see next chapter).
Alternatively, you might use a semi-automatic procedure that detects codes in your raw signal file to obtain the editing information. For instance, in a telephone recording of a dialogue between two parties, we asked the speakers to press a certain button on their DTMF phone before starting to speak (footnote 7.4). The whole session was recorded into a single-channel raw file (footnote 7.5), and in post-processing this file was automatically cut into the turns of the dialogue. The DTMF codes may also be generated automatically by a computer that controls the text prompting, as was done successfully in the SpeechDat Car project.
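As an illustration of the fully automatic approach, the following is a minimal sketch of an energy-based silence detector. It is not the procedure used in any of the projects mentioned. The function name find_speech_segments, the frame length, the RMS threshold and the minimum pause duration are all assumptions chosen for the example and would have to be tuned to your recording conditions. With background noise, a fixed threshold like this is exactly the kind of detector that tends to fail.

```python
import math
from typing import List, Sequence, Tuple


def find_speech_segments(
    samples: Sequence[int],
    sample_rate: int,
    frame_ms: int = 10,           # analysis frame length (assumed value)
    threshold: float = 500.0,     # RMS level separating speech from silence (assumed)
    min_silence_ms: int = 200,    # pauses shorter than this do not split a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of regions above the RMS threshold.

    End indices are exclusive. A trailing partial frame is ignored.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silence_frames = max(1, min_silence_ms // frame_ms)
    n_frames = len(samples) // frame_len

    segments: List[Tuple[int, int]] = []
    start = None        # index of the first frame of the current segment
    silent_run = 0      # consecutive silent frames seen inside a segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment right after its last non-silent frame.
                end_frame = i - silent_run + 1
                segments.append((start * frame_len, end_frame * frame_len))
                start = None
                silent_run = 0

    if start is not None:
        # The signal ended while a segment was still open.
        end_frame = n_frames - silent_run
        segments.append((start * frame_len, end_frame * frame_len))
    return segments
```

Comparing the number of segments found with the number of utterances you expect is a cheap first check before the cut files are verified during annotation.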
In most cases, however, you will need to cut out the relevant signals manually. Which segments are a good choice depends on your individual corpus design. In our practice we have encountered whole dialogues, turns, sentences, dialogue acts, phrases, words and even single phonemes as units. Also keep in mind that you can avoid physically editing your raw signal files by providing only the segmental information, as was done for instance in the Verbmobil II corpus collection. In Verbmobil II the whole dialogue between two partners was recorded in several synchronized channels, and only the beginning and end of each turn were marked in an annotation file, so that partners could automatically cut out the relevant speech segments stemming from one speaker.
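Cutting segments from begin and end marks can be sketched as follows. This is only an illustration under stated assumptions: the raw recording is taken to be a WAV file, the marks are taken to be (label, begin, end) tuples with times in seconds, and the function name cut_turns and the output file name pattern are hypothetical. The actual Verbmobil II annotation format is not reproduced here.

```python
import wave
from typing import Iterable, List, Tuple


def cut_turns(
    raw_path: str,
    turns: Iterable[Tuple[str, float, float]],  # (label, begin_sec, end_sec), assumed layout
    out_pattern: str,                           # e.g. "cut/{label}.wav" (hypothetical naming)
) -> List[str]:
    """Write one WAV file per marked turn and return the paths written."""
    written: List[str] = []
    with wave.open(raw_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for label, begin, end in turns:
            # Convert times to frame positions and clip them to the file length.
            first = max(0, int(round(begin * rate)))
            last = min(total, int(round(end * rate)))
            if last <= first:
                continue  # mark lies outside the recording or is empty
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = out_pattern.format(label=label)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)    # frame count in the header is fixed on close
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Because the raw file is never modified, the marks can be corrected later and the cut files regenerated at any time.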
To physically edit signals by hand you can use any sound editor; probably the best choice is the Praat program.