- ...
BITS1.1
- www.bas.uni-muenchen.de/Forschung/BITS
- ...BAS1.2
- BAS = Bavarian Archive for Speech Signals.
- ...
(BAS)1.3
- www.bas.uni-muenchen.de/Bas
- ... documentation.1.4
- We consider the evaluation of an SLR a process that can in most
cases be carried out only with regard to a specific
application of the SLR. We therefore
argue that it is very difficult, if not impossible, to evaluate an SLR
beforehand and for all conceivable future applications.
For example, the BAS catalogue contains scientific speech corpora that
were produced for very specific investigations in discourse
theory. Since these speech data were produced without any
machine-readable annotations, an evaluation in the above sense carried out at
the time when the SLRs were added to the BAS would undoubtedly have resulted in
a very negative verdict: ``Not usable for any SLP applications!''
However, it turned out that with today's enhanced indexing techniques
these SLRs are very valuable because they contain spontaneous
language very close to what is used in normal speech communication.
Engineers are therefore now starting to use these data for their respective
applications in Human Computer Interfaces (HCI).
- ...
speaking1.5
- Aside from the speech signal these time signals may
include: laryngographic signal, electropalatographic signal, coordinate
parameters derived from EMA (Electro Magnetic Articulography), X-ray
movie (cineradiography), coordinate
parameters derived from X-ray micro beam, air flow, nuclear magnetic
resonance imaging, ultrasound imaging etc. In this cook book we will not
give any specific instructions on how to use special recording hardware
for the listed signals, because this would be far beyond the scope of
this book.
- ... medium.1.6
- For
the remainder of this document we will use the term `corpus' instead of
`speech corpus'.
- ... points3.1
- Typically, these will be
the tolerance measures for errors found or deviations from the reference.
For instance: ``The allowed percentage of wrong word labels in the
transcript must be less than 2%.''
Most corpus specifications or documentation contain no numbers
concerning the reliability of annotations (SpeechDat being the praiseworthy
exception to this rule).
- ... documentation.3.2
- This scenario sounds very unlikely, but it is
not: it happened a few times with very old SLRs that were transferred to the
BAS.
- ...
validation3.3
- At least it must be stated that they are
`unspecified' and can therefore be disregarded by the validation process.
- ...
documentation.4.1
- Obviously there will be none, if you
produced the reference yourself based on the documentation!
- ... sample4.3
- Added by the author; in some
cases the number of valid bits per sample, e.g. 12, does not fill up a
standard word (e.g. 2 bytes). It should then be documented which bits
are valid and what values may reside in the remaining invalid bits.
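To illustrate why this documentation matters, the following is a minimal sketch of a check for stray values in the padding bits. It assumes, purely for illustration, that samples are stored as little-endian words, that the valid bits occupy the low-order positions, and that the padding bits are documented as zero; the function name `words_with_stray_bits` is hypothetical and none of these conventions are prescribed by the text.

```python
def words_with_stray_bits(raw, valid_bits=12, word_bytes=2):
    """Return indices of samples whose padding bits are not all zero.

    Assumptions (not from the corpus documentation): little-endian words,
    valid bits in the low-order positions, padding bits expected to be 0.
    """
    word_bits = 8 * word_bytes
    # Mask that selects only the high-order padding bits of each word.
    pad_mask = ((1 << word_bits) - 1) & ~((1 << valid_bits) - 1)
    stray = []
    for i in range(len(raw) // word_bytes):
        word = int.from_bytes(raw[i * word_bytes:(i + 1) * word_bytes], "little")
        if word & pad_mask:
            stray.append(i)
    return stray


# Three 16-bit words: 0x0FFF (clean), 0x1000 (padding bit set), 0x0234 (clean).
example = bytes([0xFF, 0x0F, 0x00, 0x10, 0x34, 0x02])
flagged = words_with_stray_bits(example)
```

If the documentation instead states that the padding bits carry a sign extension or that the valid bits are left-aligned, the mask and the comparison would have to be adapted accordingly.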
- ... prompting4.2
- Added
by the author.
- ...sox5.1
- SOundeXchange http://www.spies.com/Sox/
- ...
be5.2
- In this example the character & must not occur in
the annotation files; if it does, choose another character that
does not.
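Choosing such a character can be automated. The sketch below assumes the annotation files have already been read into strings; the function name `pick_delimiter` and the candidate list are hypothetical choices made for this illustration.

```python
def pick_delimiter(texts, candidates="&#%@|~"):
    """Return the first candidate character that occurs in none of the texts.

    Returns None if every candidate occurs somewhere, in which case the
    candidate list has to be extended by hand.
    """
    for char in candidates:
        if not any(char in text for text in texts):
            return char
    return None


# '&' and '#' both occur in the sample annotations, so '%' is chosen.
delimiter = pick_delimiter(["a & b", "c#d"])
```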
- ...-v5.3
- DOS-compatible text files are preferable, because UNIX
usually has no
trouble processing them.
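Whether a delivered text file actually follows the DOS convention can be checked mechanically. The following sketch classifies the line endings of a file's raw bytes; the function name `line_ending_style` and its return labels are hypothetical, and bare carriage returns (old Macintosh style) are deliberately not handled.

```python
def line_ending_style(data):
    """Classify line endings in raw bytes as 'dos', 'unix', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    # Line feeds that are not part of a CR LF pair.
    bare_lf = data.count(b"\n") - crlf
    if crlf and bare_lf:
        return "mixed"
    if crlf:
        return "dos"
    if bare_lf:
        return "unix"
    return "none"


style = line_ending_style(b"first line\r\nsecond line\r\n")
```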
- ... steps5.4
- Beware: an XML parser
using a DTD cannot check for correct label categories etc., because a DTD
describes only the syntax of an XML document and is not powerful enough
to check the semantics of its content.
- ... `length'5.5
- that is, the length of the recorded speech signal vs. the total length
as reported in the corresponding annotation files, e.g. the last boundary
of the last segment.
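This comparison can be sketched as follows, assuming the number of samples and the sampling rate have been read from the signal file and the last boundary is given in seconds. The function names and the default tolerance of 10 ms are hypothetical; a real validation would take the tolerance from the corpus specification.

```python
def length_deviation(num_samples, sample_rate_hz, last_boundary_s):
    """Signal duration minus the annotated end time, in seconds."""
    return num_samples / sample_rate_hz - last_boundary_s


def lengths_agree(num_samples, sample_rate_hz, last_boundary_s, tolerance_s=0.01):
    """True if signal and annotation lengths differ by at most tolerance_s."""
    deviation = length_deviation(num_samples, sample_rate_hz, last_boundary_s)
    return abs(deviation) <= tolerance_s


# A 2-second signal at 16 kHz whose annotation ends at 1.5 s is flagged.
ok = lengths_agree(32000, 16000, 1.5)
```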
- ... reference.6.1
- If there is no reference available or the
reference does not give specific rules for the canonical pronunciation,
check for consistency. For example, morphs that occur in
more than one word should always be transcribed in the same way.
- ... wrong6.2
- in case of meta data to be
validated
- ... deviations.6.3
- Beware: The innocent term ``calculate deviations'' may hide a number
of systematic problems, especially with regard to segmental boundaries.
Please refer also to the remarks about the term `correctness' in chapter
3.
- ...
Praat6.4
- General phonetic tool developed by Paul Boersma at
the University of
Amsterdam, www.praat.org
- ...
intervals7.1
- which heavily depend on the number of samples
checked.
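The dependence on the number of samples can be made concrete with a small sketch. It computes a Wilson score interval for an observed error rate; the choice of the Wilson interval, the 95% level (z = 1.96) and the function name `wilson_interval` are assumptions made for this illustration and are not prescribed by the text.

```python
import math


def wilson_interval(errors, checked, z=1.96):
    """Wilson score interval (lower, upper) for the error rate errors/checked."""
    if checked <= 0:
        raise ValueError("at least one sample must be checked")
    p = errors / checked
    denom = 1.0 + z * z / checked
    center = (p + z * z / (2.0 * checked)) / denom
    half = z * math.sqrt(p * (1.0 - p) / checked
                         + z * z / (4.0 * checked * checked)) / denom
    # Clamp to [0, 1] to guard against floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Same observed error rate of 2%, but ten times as many samples checked.
small = wilson_interval(2, 100)
large = wilson_interval(20, 1000)
```

With the same observed error rate, the interval for 1000 checked samples is noticeably narrower than the one for 100, which is why the number of samples checked should always be reported alongside the interval.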
- ... scheme7.2
- List the selected files in the appendix.
- ...Documentation
8.1
- This list was compiled from [2], Chapter 1.
- ... sample8.3
- Added by the author; in some
cases the number of valid bits per sample, e.g. 12, does not fill up a
standard word (e.g. 2 bytes). It should then be documented which bits
are valid and what values may reside in the remaining invalid bits.
- ... prompting8.2
- Added
by the author.
- ... (BAS)9.1
- Contact Dr. Chr. Draxler,
draxler@bas.uni-muenchen.de, for more information regarding
WWWTranscribe.
- ... Currently9.2
- Oct 2002.
- ... distribution9.3
- See
www.bas.uni-muenchen.de/Forschung/BITS for updated information about the availability of WWWTranscribe.
- ...WebCommand12.1
- For the original corpus
specification and documentation of WebCommand see appendices
C and D.
- ...
contract12.2
- see 3.4.
- ...WWWTranscribe12.3
- Contact Dr. Chr. Draxler,
draxler@bas.uni-muenchen.de, for more information regarding
WWWTranscribe.