The following is a list of short
definitions of technical terms as they are used throughout this document:
- Speech Corpus = physical time signals (in most cases sound
pressure, or other measurable time signals) recorded from the act of
speaking, together with a minimal set of descriptive data
(annotations, meta data, ...), stored on a digital medium.
- Validation = the (formal) check of a speech corpus with regard to
its pre-defined specifications following a documented or standardized
procedure and resulting in a validation report and/or a validation
quality grade.
- Evaluation = a qualitative assessment of a corpus with regard to
its usability for a certain task or development scenario, or with
regard to its market value.
- Specification = the fixed technical description of a speech corpus
with regard to all of its features (including annotations, meta data
and documentation); see [5], chapter 4, for a detailed discussion of
specifications.
- Internal/inhouse validation = validation carried out by the
producer of a speech corpus during or after production.
- External validation = validation carried out by an independent
validation institution that is not linked in any way to the producer
of the speech corpus.
- (File) Format = Standardized or specified format of digital data,
either signal data or symbolic data (annotations).
- Annotation = Discrete (categorical) description of a physical
signal (coding). It usually consists of a closed set of symbols and a
scheme that links these symbols either to points in time or to
segments in time.
- Domain = topic, or field of topics,
or the situation in which a verbal communication takes place.
- Prompt = A speech item (word, phrase or sentence) presented to
a speaker. A prompt list (or prompt corpus) is a collection of
prompts that defines the spoken content of the corpus.
- Spoken Content = What was spoken in a speech corpus.
- Meta Data = Data about data. In the context of this book, the term
meta data is restricted to three types: recording protocols, comments
and speaker profiles.
- Codes = categorized data entries, in contrast to free
text. If, for instance, the meta data parameter 'place of
birth' is restricted to the German states plus the category
'other', then it is a code. A free-text comment about the success of
a recording is not a code and is therefore not machine-readable.
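Validation, as defined above, compares a corpus against its pre-defined specification and results in a report and/or a quality grade. The following minimal Python sketch shows only that shape. The specification fields (sampling rate, minimum number of speakers), the function `validate` and the pass/fail grading rule are hypothetical; a real validation procedure documents a much larger set of checks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Specification:
    """Hypothetical subset of a corpus specification."""
    sampling_rate_hz: int
    min_speakers: int


@dataclass
class CorpusFacts:
    """Hypothetical facts measured on the delivered corpus."""
    sampling_rates_hz: List[int]  # one entry per signal file
    speaker_ids: List[str]        # one entry per recording session


def validate(spec: Specification, corpus: CorpusFacts) -> Tuple[List[str], str]:
    """Check the corpus against the spec; return (report lines, grade)."""
    report: List[str] = []
    # Check 1: every signal file uses the specified sampling rate.
    wrong = [r for r in corpus.sampling_rates_hz if r != spec.sampling_rate_hz]
    if wrong:
        report.append(f"FAIL sampling rate: {len(wrong)} file(s) deviate "
                      f"from {spec.sampling_rate_hz} Hz")
    else:
        report.append("OK   sampling rate")
    # Check 2: the corpus contains enough distinct speakers.
    n_speakers = len(set(corpus.speaker_ids))
    if n_speakers < spec.min_speakers:
        report.append(f"FAIL speakers: {n_speakers} < {spec.min_speakers}")
    else:
        report.append("OK   speakers")
    # Hypothetical grading rule: "pass" only if every single check passed.
    grade = "pass" if all(line.startswith("OK") for line in report) else "fail"
    return report, grade


spec = Specification(sampling_rate_hz=16000, min_speakers=2)
facts = CorpusFacts(sampling_rates_hz=[16000, 16000, 8000],
                    speaker_ids=["s01", "s02", "s02"])
report, grade = validate(spec, facts)
print("\n".join(report))
print("grade:", grade)  # → grade: fail
```

A real validation report would also record which procedure was followed, so that the result can be reproduced.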
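The definition of an annotation (a closed set of symbols plus a scheme that links them to points or segments in time) maps directly onto a small data structure. This minimal Python sketch assumes times in seconds; the class names `Annotation`, `PointLabel` and `SegmentLabel` and the example symbols are hypothetical and not a format prescribed by this document.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass(frozen=True)
class PointLabel:
    """A symbol linked to a single point in time (seconds, assumed)."""
    symbol: str
    time: float


@dataclass(frozen=True)
class SegmentLabel:
    """A symbol linked to a time segment from start to end (seconds, assumed)."""
    symbol: str
    start: float
    end: float


Label = Union[PointLabel, SegmentLabel]


@dataclass
class Annotation:
    """A closed set of symbols plus the labels linking them to the signal."""
    symbols: Set[str]
    labels: List[Label] = field(default_factory=list)

    def add(self, label: Label) -> None:
        # Reject symbols outside the closed set: this is what keeps the
        # description discrete (categorical).
        if label.symbol not in self.symbols:
            raise ValueError(f"symbol {label.symbol!r} not in closed set")
        # A segment must have a positive duration.
        if isinstance(label, SegmentLabel) and label.end <= label.start:
            raise ValueError("segment end must be after its start")
        self.labels.append(label)


# Usage: a toy segmentation plus one point event (hypothetical symbols).
ann = Annotation(symbols={"a", "n", "<breath>"})
ann.add(SegmentLabel("n", 0.10, 0.18))
ann.add(SegmentLabel("a", 0.18, 0.31))
ann.add(PointLabel("<breath>", 0.05))
print(len(ann.labels))  # → 3
```

Enforcing the closed symbol set at insertion time is one possible design; a validator could equally accept everything and report unknown symbols afterwards.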
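The difference between codes and free text can be expressed as a membership test against a closed category set. The minimal Python sketch below follows the 'place of birth' example above, using the German names of the 16 states plus 'other'; the constant `PLACE_OF_BIRTH_CODES` and the function `is_code` are hypothetical names chosen for illustration.

```python
# Closed category set for the meta data parameter 'place of birth':
# the 16 German states plus the category 'other'.
PLACE_OF_BIRTH_CODES = frozenset({
    "Baden-Württemberg", "Bayern", "Berlin", "Brandenburg", "Bremen",
    "Hamburg", "Hessen", "Mecklenburg-Vorpommern", "Niedersachsen",
    "Nordrhein-Westfalen", "Rheinland-Pfalz", "Saarland", "Sachsen",
    "Sachsen-Anhalt", "Schleswig-Holstein", "Thüringen",
    "other",
})


def is_code(value: str, categories: frozenset = PLACE_OF_BIRTH_CODES) -> bool:
    """Return True if value is one of the predefined categories.

    Only such values can be interpreted reliably by a program;
    anything else is free text.
    """
    return value in categories


print(is_code("Bayern"))                     # → True
print(is_code("other"))                      # → True
print(is_code("born near Munich, I think"))  # → False
```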
Angela Baumann
2004-06-03