

Comments

In this context, `comments' are all extra information that does not fit into the categories of recording protocol or speaker profile. Comments therefore often contain information about events, features or observations that were not anticipated by the designer of the corpus. This makes them very valuable in most cases, so there should be a place or procedure to capture the comments of speakers, experimenters, labelers etc. in an ordered and safe fashion. Unlike other meta data, comments are not machine-readable, so it is debatable whether they belong to meta data at all. For practical reasons we list them in this chapter, because it is very easy to insert a free text field into a recording protocol file or speaker profile. Likewise you may add such comment fields to labeler and transcription files.

Comments should be kept in their original version with the original wording. Summaries are also possible, but it should be recognizable whether the comment at hand is a summary or the original version. Beyond that, it should be apparent whether the comments have been collected systematically (e.g. in the form of a questionnaire) or coincidentally (e.g. a subject said something about the recording without being asked explicitly). System errors have often been detected only through speakers' comments.
Comments should be kept with the distributed speech corpus so that they are accessible to prospective users. It is a good idea to keep them in a form (e.g. plain text files) that can be searched for keywords.
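
The recommendations above can be illustrated with a minimal sketch in Python. It is not a prescribed format: the record type `Comment`, its field names, the function `search` and the two sample comments are all hypothetical and chosen only for illustration. The sketch shows one possible way to store each comment with its wording, a flag for summary versus original, and a flag for systematic versus coincidental collection, and to search the comments for keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """One comment record (hypothetical layout, not a defined standard)."""
    source: str        # who made the comment, e.g. "speaker", "experimenter", "labeler"
    text: str          # the comment itself, in original wording unless is_summary is True
    is_summary: bool   # True if text is a summary, False if it is the original wording
    systematic: bool   # True if collected by questionnaire, False if given unprompted


def search(comments, keyword):
    """Return all comments whose text contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [c for c in comments if needle in c.text.lower()]


# Invented sample comments, for illustration only.
comments = [
    Comment("speaker", "The display froze after my second request.",
            is_summary=False, systematic=False),
    Comment("labeler", "Subject appears annoyed during the second half.",
            is_summary=True, systematic=True),
]

hits = search(comments, "froze")
for c in hits:
    kind = "summary" if c.is_summary else "original"
    print(f"{c.source} ({kind}): {c.text}")
```

Keeping the two flags next to the text means that a later user of the corpus can tell at a glance how much weight to give a comment, while the text itself stays untouched and searchable.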

Most common are comments about the speaker's behavior:

How does the speaker approach the `virtual machine'?
Has the subject shown emotions?
What exactly was the gesture?
...
Other comments might stem from the experimenter, the labeler, the post-processing or even an external validation group.

Finally, all comments collected during corpus production may be a good source for the documentation of the speech corpus (see chapter [*]).


BITS Projekt-Account 2004-06-01