The documentation of the Recording and the Post-processing largely repeats the corresponding part of the corpus specifications, with the slight but important difference that here the actual recording conditions should be described. If a Log File of the production exists, include it here. If possible, include pictures of the recording setup and recording sites. Draw diagrams to illustrate the exact positions of speakers and microphones.
The Annotation should be documented in great detail for each annotation layer used. Give not only the contents and file formats but also the exact procedures by which the annotations were produced. For manual annotations, a copy of the annotation guidelines must be included here. Indicate the education and training of the labelers, and describe the tools and their usage.
If you use any automatic procedures, insert a copy of the source code of your scripts or programs here, or give a proper reference to public domain software, and describe exactly how it was used. Describe the methods of quality control that were applied to the annotations. Define the character set used in the annotation files, as well as tag sets, phonetic alphabets etc.
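As an illustration of a simple automatic quality control step, the following Python sketch checks that an annotation file decodes in the documented character set and that every label belongs to the documented symbol inventory. The inventory shown and the function names are hypothetical placeholders, not part of any particular corpus; a real corpus would substitute its own tag set or phonetic alphabet.

```python
import unicodedata

# Hypothetical documented inventory; a real corpus would list its full tag set.
DOCUMENTED_SYMBOLS = {"a:", "e:", "I", "S", "t", "n", "<p:>"}


def is_valid_encoding(raw_bytes, encoding="utf-8"):
    """Return True if the raw file content decodes in the documented character set."""
    try:
        raw_bytes.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def undocumented_symbols(labels, inventory=DOCUMENTED_SYMBOLS):
    """Return the sorted list of distinct labels missing from the documented inventory."""
    # Normalize to NFC so that visually identical labels compare equal.
    normalized = (unicodedata.normalize("NFC", label.strip()) for label in labels)
    return sorted({label for label in normalized if label and label not in inventory})
```

Running such a check over all annotation files and reporting the result in the documentation gives users a concrete statement about how consistently the tag set was applied.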
If you are using XML in the annotation files, give pointers to the corresponding DTDs.
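Whether each XML file actually carries a pointer to its DTD can be verified mechanically. The sketch below, using Python's standard xml.dom.minidom module, extracts the system identifier from the DOCTYPE declaration. The file content and the name annotation.dtd are made up for illustration, and the function only reads the pointer; it does not validate the file against the DTD.

```python
from xml.dom import minidom


def dtd_system_id(xml_text):
    """Return the system identifier of the DOCTYPE declaration, or None if there is none."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return doctype.systemId


# Hypothetical annotation file that points to its DTD.
sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE annotation SYSTEM "annotation.dtd">\n'
    "<annotation/>"
)
```

Here dtd_system_id(sample) yields the string "annotation.dtd", while a file without a DOCTYPE declaration yields None and can be flagged as lacking a DTD pointer.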
The documentation of the Meta Data should contain a precise definition of
each entry in the meta data files. Give complete lists of the codes you are
using and comment on how the data were gathered. For
instance, if an entry in the speaker profile files describes the dialectal
variety of a language by naming the state or province of a speaker, you
should mention here how this information was obtained: was it from an
interview with the speaker (self-assessment), was it by asking for the
place of elementary school, or was it from the judgment of one expert or a
group of experts on the dialects of that language?
If you are using XML in the meta data files, give
pointers to the corresponding DTDs.
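
The definition of a meta data entry, its list of codes, and the way the data were gathered can also be kept in machine-readable form, so that the meta data files can be checked against the documentation. The following Python sketch is one possible arrangement; the field name, the codes, and the speaker IDs are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    name: str         # key used in the meta data files
    description: str  # precise definition of the entry
    codes: dict       # complete list of codes: code -> meaning
    source: str       # how the information was obtained


# Hypothetical definition of a dialect entry in the speaker profile.
DIALECT = FieldDefinition(
    name="dialect_region",
    description="State or province naming the speaker's dialectal variety",
    codes={"BY": "Bavaria", "HH": "Hamburg", "XX": "unknown"},
    source="self-assessment in an interview with the speaker",
)


def invalid_entries(profiles, definition):
    """Return (speaker_id, value) pairs whose value is not a documented code."""
    return [
        (speaker_id, profile.get(definition.name))
        for speaker_id, profile in profiles.items()
        if profile.get(definition.name) not in definition.codes
    ]
```

A call such as invalid_entries(profiles, DIALECT), where profiles maps speaker IDs to their profile entries, then lists every speaker whose entry is missing or uses an undocumented code.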