... BITS1.1
www.bas.uni-muenchen.de/Forschung/BITS
...BAS1.2
BAS = Bavarian Archive for Speech Signals.
... (BAS)1.3
www.bas.uni-muenchen.de/Bas
... documentation.1.4
We deem the evaluation of an SLR a process that can in most cases be carried out only with regard to a specific application of the SLR. We therefore argue that it is very difficult, if not impossible, to evaluate an SLR in advance for all conceivable future applications.

For example, the BAS catalogue contains scientific speech corpora that were produced for very specific investigations in discourse theory. Since these speech data were produced without any machine-readable annotations, an evaluation in the above sense carried out at the time when the SLRs were added to the BAS would undoubtedly have resulted in a very negative verdict: ``Not usable for any SLP applications!''

However, it turned out that with today's enhanced indexing techniques these SLRs are very valuable, because they contain spontaneous language very close to what is used in normal speech communication. Engineers are therefore now starting to use these data for their applications in Human Computer Interfaces (HCI).

... speaking1.5
Aside from the speech signal these time signals may include: laryngographic signal, electropalatographic signal, coordinate parameters derived from EMA (Electro Magnetic Articulography), X-ray movie (cineradiography), coordinate parameters derived from X-ray micro beam, air flow, nuclear magnetic resonance imaging, ultrasound imaging etc. In this cook book we will not give any specific instructions on how to use special recording hardware for the listed signals, because this would be far beyond the scope of this book.
... medium.1.6
For the remainder of this document we will use the term `corpus' instead of `speech corpus'.
... points3.1
Typically, these will be the tolerance measures for errors found or for deviations from the reference. For instance: ``The allowed percentage of wrong word labels in the transcript must be less than 2%.'' Most corpus specifications or documentation give no numbers concerning the reliability of annotations (SpeechDat being the praiseworthy exception to this rule).
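
A tolerance measure of this kind can be checked mechanically once a reference labelling exists. The following is only a minimal sketch: the function names are hypothetical, and it assumes that reference and candidate word labels have already been aligned one to one, which a real transcript comparison would first have to establish.

```python
def label_error_rate(reference, candidate):
    """Fraction of aligned positions whose word label differs from the reference."""
    if len(reference) != len(candidate):
        raise ValueError("label sequences must be aligned to the same length")
    if not reference:
        return 0.0
    wrong = sum(1 for ref, cand in zip(reference, candidate) if ref != cand)
    return wrong / len(reference)


def within_tolerance(reference, candidate, max_rate=0.02):
    """True if the error rate is strictly below the allowed rate (2% by default)."""
    return label_error_rate(reference, candidate) < max_rate
```

The strict comparison mirrors the wording ``less than 2%''; a specification that says ``at most 2%'' would need `<=` instead.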
... documentation.3.2
This scenario sounds very unlikely, but it is not: it has happened a few times with very old SLRs that were transferred to the BAS.
... validation3.3
At the very least, it must be stated that they are `unspecified' and can therefore be disregarded by the validation process.
... documentation.4.1
Obviously there will be none, if you produced the reference yourself based on the documentation!
... sample4.3
Added by the author; in some cases the number of valid bits per sample, e.g. 12, does not fill up a standard word (e.g. 2 bytes). It should then be documented which bits are valid and what values may reside in the remaining invalid bits.
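
Whether the invalid bits behave as documented can be inspected directly on the raw sample data. The sketch below rests on assumptions that are not part of the original text: samples are stored as 16-bit words, the valid bits are the low-order ones, and the function name is hypothetical. It simply collects every distinct value found in the unused upper bits.

```python
import struct


def unused_bits_values(raw, valid_bits=12, little_endian=True):
    """Return the set of values found in the upper (16 - valid_bits) bits.

    Assumes 16-bit words whose low-order `valid_bits` bits carry the sample.
    """
    n_words = len(raw) // 2
    fmt = ("<" if little_endian else ">") + "%dH" % n_words
    words = struct.unpack(fmt, raw[: n_words * 2])
    return {word >> valid_bits for word in words}
```

If the documentation states that the remaining bits are always zero, the result should be `{0}`; any other value indicates either sign extension or undocumented content.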
... prompting4.2
Added by the author.
...sox5.1
SOundeXchange http://www.spies.com/Sox/
... be5.2
In this example the character & must not occur in the annotation files; if it does, choose another character that does not.
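
Choosing a safe character can be automated by scanning the annotation texts first. This is a minimal sketch with a hypothetical function name and an arbitrary candidate list; it assumes the contents of the annotation files have already been read into strings.

```python
def pick_unused_char(texts, candidates="&#@%|~"):
    """Return the first candidate character that occurs in none of the texts."""
    for ch in candidates:
        if not any(ch in text for text in texts):
            return ch
    return None  # every candidate occurs somewhere; extend the candidate list
```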
...-v5.3
DOS-compatible text files are preferable, because UNIX usually has no trouble processing them.
... steps5.4
Beware: an XML parser using a DTD cannot check for correct label categories etc., because a DTD describes only the syntax of an XML document and is not powerful enough to check its lexical content or semantics.
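
A separate pass over the parsed document can cover what the DTD cannot. The following is only a sketch: the element name `segment`, the attribute name `label` and the function name are hypothetical placeholders for whatever the corpus annotation format actually uses.

```python
import xml.etree.ElementTree as ET


def invalid_labels(xml_text, allowed, tag="segment", attr="label"):
    """Return the labels (in document order) that are not in the allowed set.

    Elements lacking the attribute are reported as None.
    """
    root = ET.fromstring(xml_text)
    return [el.get(attr) for el in root.iter(tag) if el.get(attr) not in allowed]
```

An empty result means every label belongs to the documented inventory; it says nothing about whether the labels are placed correctly.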
... `length'5.5
that is: the length of the recorded speech signal vs. the total length as reported in the corresponding annotation files, e.g. the last boundary of the last segment.
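
For signals stored as WAV files, this comparison might be sketched as follows. The function names and the tolerance of 10 ms are assumptions made for illustration; the last boundary is assumed to have been extracted from the annotation file already and to be given in seconds.

```python
import wave


def signal_duration(wav_file):
    """Duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as w:
        return w.getnframes() / w.getframerate()


def lengths_agree(wav_file, last_boundary_s, tolerance_s=0.01):
    """True if signal length and last annotated boundary differ by at most the tolerance."""
    return abs(signal_duration(wav_file) - last_boundary_s) <= tolerance_s
```

Other signal formats would need their own way of reading the sample count and sampling rate; the comparison itself stays the same.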
... reference.6.1
If there is no reference available or the reference does not give specific rules for the canonical pronunciation, check for consistency. For example, morphs that occur in more than one word should always be transcribed in the same way.
... wrong6.2
in case of meta data to be validated
... deviations.6.3
Beware: The innocent term ``calculate deviations'' may hide a number of systematic problems, especially with regard to segmental boundaries. Please refer also to the remarks about the term `correctness' in chapter 3.
... Praat6.4
General phonetic tool developed by Paul Boersma at the University of Amsterdam, www.praat.org
... intervals7.1
which heavily depend on the number of samples checked.
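
The dependence on the number of samples checked can be made concrete with a standard interval for a proportion. The sketch below uses the Wilson score interval, which is one common choice and not necessarily the one intended by the author; the function name and the default z value of 1.96 (roughly 95% confidence) are assumptions.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion observed in n checked samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

With the same observed error rate, checking ten times as many samples gives a clearly narrower interval, which is why the number of checked samples should always be reported alongside the error rate.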
... scheme7.2
List the selected files in the appendix.
...Documentation 8.1
This list was compiled from [2], Chapter 1.
... sample8.3
Added by the author; in some cases the number of valid bits per sample, e.g. 12, does not fill up a standard word (e.g. 2 bytes). It should then be documented which bits are valid and what values may reside in the remaining invalid bits.
... prompting8.2
Added by the author.
... (BAS)9.1
Contact Dr. Chr. Draxler, draxler@bas.uni-muenchen.de, for more information regarding WWWTranscribe.
... Currently9.2
Oct 2002.
... distribution9.3
See www.bas.uni-muenchen.de/Forschung/BITS for updated information about the availability of WWWTranscribe.
...WebCommand12.1
For the original corpus specification and documentation of WebCommand see appendices C and D.
... contract12.2
see 3.4.
...WWWTranscribe12.3
Contact Dr. Chr. Draxler, draxler@bas.uni-muenchen.de, for more information regarding WWWTranscribe.