WebCommand - Validation Report
Summary
The speech corpus WebCommand has been validated against the specified
checks as given in the validation contract (see annex) as well as against
general principles of good practice. The validation covered completeness,
formal checks and manual checks of selected subsamples. The overall
quality of the corpus is good and there should be no problem in using the
corpus for the intended and other applications. Some flaws in the corpus
documentation can be corrected without much effort.
Introduction
This document summarizes the results of an in-house validation of the
speech corpus WebCommand.
WebCommand was produced by the Bavarian
Archive for Speech Signals (BAS) in the year 2002 as a contractor to
Siemens AG, Munich. The aim of the corpus was to record
application-specific commands spoken in British English
and French by native speakers in a
quiet office environment. The intended application is the control of
a so-called WebPad (a laptop without keyboard) used for surfing the
internet and some other proprietary services. The spoken texts were
prompted on screen and recorded with two different microphones and in two
different rooms. The
data were transcribed using SpeechDat conventions. A canonical
pronunciation dictionary covering all spoken words is also included in the
corpus.
Validation Results
The following list contains all validation steps as specified in the validation
contract,
together with the methodology and the results.
- Completeness, file naming, readability
- Signal files
The corpus is divided into complete and incomplete recording sessions.
The complete part contains more than the required number of recording
sessions and meets the minimum numbers per language (20) and per gender
(20). Each session directory of the complete part contains exactly 130
recording files (WAV stereo) as stated in the specs. The file names of
the signal files meet the documented specs.
- Meta data files
The recording conditions are summarized for all recording sessions in
the file SESSION.TBL; a SpeechDat-compatible version of this table is
stored in the file SUMMARY.TXT, which also contains markers for each
individual recording item. The two files are consistent with each other
and with the session directories found in the corpus. Speaker profiles are
stored in the file SPEAKER.TBL which covers all speakers of the corpus.
- Annotation files
All annotation files are stored on a separate CD-ROM in
SpeechDat-compatible SAM format. Every signal file has a corresponding SAM label file.
The file naming is consistent with the file naming of signal files.
All checked files were readable.
Status completeness: ok.
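The completeness checks described above can be pictured with a small script. The following is only a sketch and not the tool used for this validation: the directory layout (one sub-directory per session under a signal root and a label root) and the file extensions ".wav" and ".sam" are assumptions made for illustration; only the figure of 130 files per session is taken from the report.

```python
from pathlib import Path

EXPECTED_FILES_PER_SESSION = 130  # figure taken from the report


def check_sessions(signal_root, label_root, signal_ext=".wav", label_ext=".sam"):
    """Return a list of problems found; an empty list corresponds to 'ok'.

    The layout <root>/<session>/<file> and both extensions are hypothetical.
    """
    problems = []
    sessions = sorted(p for p in Path(signal_root).iterdir() if p.is_dir())
    for session in sessions:
        signals = sorted(
            f for f in session.iterdir() if f.suffix.lower() == signal_ext
        )
        # Each complete session must hold exactly the specified number of files.
        if len(signals) != EXPECTED_FILES_PER_SESSION:
            problems.append(f"{session.name}: {len(signals)} signal files")
        # Every signal file needs a label file with the same base name.
        for sig in signals:
            label = Path(label_root) / session.name / (sig.stem + label_ext)
            if not label.is_file():
                problems.append(f"{session.name}: no label file for {sig.name}")
    return problems
```

Calling check_sessions() with the two root directories returns an empty list when every session is complete and every signal file has its label file.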
- Superfluous files
No superfluous files were found in the corpus.
Status superfluous files: ok
- Signal files
All signal files were checked for their format by running the command "sox -V"
and parsing its output. All signal files are valid
WAV sound files (RIFF) with the following properties, in accordance with both the
documentation and the specification: 2 channels, 22050 Hz sampling
rate, 16 bit width, signed (linear). All signal files contain a signal of
more than 5 sec length. About 4% of the sound files contain saturated
samples (clipping); some of these were inspected manually to ensure that
the clipping was caused by noise, clicks etc. and not by
the speech signal itself. In none of the inspected files was the clipping
caused by the speech signal.
Sox did not report any technically corrupt files.
Status signal files: ok
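For readers who want to repeat the format check without sox, the same properties can be read with Python's standard "wave" module. This is a substitute technique, not the procedure used for the validation. The expected values come from the report; treating a sample at the 16-bit limits (-32768 or 32767) as saturated is an assumption of this sketch.

```python
import sys
import wave
from array import array

# Expected properties as stated in the report.
EXPECTED_CHANNELS = 2
EXPECTED_RATE = 22050
EXPECTED_SAMPLE_WIDTH = 2  # bytes, i.e. 16 bit
MIN_DURATION_S = 5.0


def inspect_wav(path):
    """Return (problems, clipped) for one RIFF/WAV file.

    problems is a list of deviations from the expected format;
    clipped is the number of samples at the 16-bit limits.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
        raw = wav.readframes(frames)
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channels: {channels}")
    if rate != EXPECTED_RATE:
        problems.append(f"sampling rate: {rate}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width: {width} bytes")
    duration = frames / rate if rate else 0.0
    if duration <= MIN_DURATION_S:
        problems.append(f"duration: {duration:.2f} s")
    clipped = 0
    if width == 2:
        samples = array("h")
        samples.frombytes(raw)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        clipped = sum(1 for s in samples if s in (-32768, 32767))
    return problems, clipped
```

Files with a non-zero clipped count would then be the candidates for the manual inspection described above.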
- Speaker distribution
Ten female and ten male speakers, as stated in the meta data, were selected
at random, and their speech signals were checked against the documented
gender. No deviations from the documented gender were found.
Status speaker distribution: ok
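A random selection of this kind can be made reproducible by fixing the random seed. The sketch below assumes the speaker profiles have already been read into (speaker id, gender) pairs; the format of SPEAKER.TBL is not described here, so no parser is shown, and the gender codes "F" and "M" are hypothetical.

```python
import random


def pick_speakers(profiles, per_gender=10, seed=None):
    """Draw up to per_gender speaker ids at random for each gender.

    profiles: iterable of (speaker_id, gender) pairs (assumed input format).
    Returns a dict mapping each gender code to a sorted list of ids.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    by_gender = {}
    for speaker_id, gender in profiles:
        by_gender.setdefault(gender, []).append(speaker_id)
    return {
        gender: sorted(rng.sample(ids, min(per_gender, len(ids))))
        for gender, ids in sorted(by_gender.items())
    }
```

Recording the seed together with the result allows the same twenty speakers to be drawn again if the check has to be repeated.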
- Documentation, completeness, consistency with corpus
Apart from the file TRANSCRP_EN.PDF, which describes the SpeechDat
annotation,
the documentation of the corpus consists of plain text files only. All
documentation (and meta data) files are readable on Macintosh, Linux and
Windows.
The main documentation is contained in the file REPORT.txt. The following
checks have been performed:
- Contact for requests regarding the corpus: ok
- Number and type of media: ok
- Content of each medium: acceptable
"The corpus contains 47 complete sessions..." - The corpus
contains 95 complete sessions. What is meant here is probably: "The
corpus contains 47 double sessions recorded in the two recording rooms."
- Copyright statement and intellectual property rights (IPR): ok
- Layout of media: file system type and directory structure: ok
- File nomenclature: explanation of codes used: ok
"The channel assignment for the microphones is stored in the file
TABLE/SESSION.TBL." - A constant channel assignment would be
preferable; it is also generally better to store different signals in
individual files and to mark them in the file name.
- Formats of signal and annotation files: ok
- Coding: PCM linear ok
- Compression: n.a.
- Sampling rate: 22050Hz ok
- Valid bits per sample: 16 ok
- Used bytes per sample: 2 ok
- Multiplexed signals: standard RIFF ok
- Clearly stated purpose of the recordings: ok
- Speech type(s): read from screen ok
- Instruction to speakers: acceptable
A full copy of the instructions is not provided (verbal instruction), but
the recording situation makes quite clear how the speakers were
instructed.
- Specification of the individual text items: ok
- Specification for the prompt sheet design: n.a.
- Example prompt sheet: n.a.
- Speaker recruitment strategies: not given
- Number of speakers: ok
- Distribution of speakers over sex, age, dialect regions:
acceptable
Only age, mother tongue and gender are given in the speaker profile. Due
to the nature of the corpus and the fact that the specifications do not
require any additional information, this is acceptable.
- Description/definition of dialect regions: not given
- Recording platform: Macintosh ok
- Position and type of microphone(s): ok
- Company name and type id: ok
- Electret, dynamic, condenser: not given
This has to be derived from the technical data sheets of the microphones,
which are not provided in the documentation.
- Directional properties: see previous item
- Mounting: ok
- Position of speaker(s) (distance to microphone): ok
- Bandwidth: half of sampling rate ok
- Number of channels and channel separation: ok
- Acoustical environment: ok
- Unambiguous spelling standard used in annotations: not given
Since the prompt texts were provided by the client, the spelling is
probably taken as is.
- Labeling symbols: ok
- List of non-standard spellings (dialectal variation, names etc.):
not given
- Distinction of homographs that are not homophones: not given
- Character set used in annotations: plain text ISO 8859-1 ok
- Annotation manual, guidelines, instructions: ok
- Description of quality assurance procedures: not given
- Selection of annotators: not given
- Training of annotators: not given
- Annotation tools used: WWWTranscribe ok
- Lexicon format: ok
- Lexicon text-to-phoneme procedure: not given
- Lexicon explanation or reference to the phoneme set: SAM-PA ok
- Lexicon phonological or higher order phenomena accounted for in the
phonemic transcriptions: n.a.
- Statistical Information: not given
- Indication of how many files were double-checked by the producer
together with percentage of detected errors: not given
All documentation files are readable on WinX, Linux and Macintosh.
Status documentation: acceptable
- Annotation files (transcripts)
- All annotation files have been checked for proper SAM syntax: ok
- 10% of the annotation files, selected at random, were inspected
manually against the signal using WWWTranscribe. Less than 1% text errors
and less than 2% noise marker errors were found (listing in annex A).
Status annotation: ok
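The sampling and the resulting error percentages can be sketched as follows. The report does not state what the percentages are counted against (files, words or markers), so the denominator is left to the caller; the function names are introduced here for illustration only.

```python
import random


def draw_sample(files, percent=10, seed=None):
    """Return a sorted random sample of about percent % of files, rounded up."""
    files = list(files)
    # Integer arithmetic avoids floating-point surprises when rounding up.
    k = (len(files) * percent + 99) // 100
    return sorted(random.Random(seed).sample(files, k))


def error_rate(errors, checked):
    """Errors as a percentage of the checked units (unit chosen by the caller)."""
    return 100.0 * errors / checked if checked else 0.0
```

For example, 3 errors in 400 checked units give error_rate(3, 400) = 0.75, which would fall below a 1% limit.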
- Lexicon
- Formal check
The two SAM-PA lexica have been checked for their format, used SAM-PA
symbols and coverage of transcripts. No missing items or errors were
found.
- Content
15% of the lexical entries, selected at random, were checked manually against
SAM-PA rules. Less than 2% phoneme deviations were found.
Status lexicon: ok
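The coverage part of the formal check asks whether every word in the transcripts has a lexicon entry. A minimal sketch is given below. It assumes that noise markers are written in square brackets and that words are compared case-insensitively after stripping surrounding punctuation; both are simplifications and do not reproduce the full SpeechDat transcription conventions.

```python
import re

# Assumed form of noise markers in the transcripts, e.g. "[spk]".
MARKER = re.compile(r"\[[^\]]*\]")


def uncovered_words(transcripts, lexicon_words):
    """Return the sorted list of transcript words without a lexicon entry."""
    lexicon = {word.lower() for word in lexicon_words}
    missing = set()
    for line in transcripts:
        # Remove the markers first, then split the rest into tokens.
        for token in MARKER.sub(" ", line).split():
            word = token.strip(".,;:!?\"").lower()
            if word and word not in lexicon:
                missing.add(word)
    return sorted(missing)
```

An empty result corresponds to the finding above that no items are missing from the lexica.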
- Readability on different platforms
The two DVDs and the CD containing the documentation were successfully
mounted on Macintosh, Linux and WinX.
Status Readability on different platforms: ok
Validation Tools
Sox was used to check the format of the signal files and to detect
clipping.
WWWTranscribe was used to manually check the transcripts and the
lexicon.
Other Observations
None.
Comments
The documentation lacks some details, which should be provided by the
producer:
- how speakers have been recruited
- which reference was taken for the English and French spelling
- according to which method the pronunciations in the lexica were
created
- the selection and training of the transcribers
- quality assurance procedures
- type of microphones
- description of speaker instruction
Result
The corpus WebCommand is in a usable state.
Angela Baumann
2004-06-03