Speaker Profiles
A speech corpus consists of recordings of humans speaking. Therefore
the first things to specify are the characteristics and
distributions of these speakers. It is very important that the
speaker characteristics are documented in as much detail as possible.
Although these details may not seem interesting at the time the speakers
are recorded, their importance usually becomes apparent later, when it is often difficult or
impossible to collect the missing data. Moreover, a well-documented speech corpus may
also be used for other research purposes, e.g. sociological research.
Useful descriptors and criteria are (in order of their importance):
- Distribution of sex; in most cases 50:50.
- Distribution of age; for example:
  - Above 16 and under 50
  - Equal distribution over the following bins: 12-22, 23-30, 31-40, 41-55
  - Under 12
- Mother tongue; although most corpora imply native speakers of a
certain language, it is wise to state this explicitly in the specs. It is also
recommended to specify the maximum percentage of non-native speakers, e.g.
    Corpus language: German
    Maximum percentage of non-native speakers: 5%
- Dialectal distribution. A corpus may be required to
cover a certain distribution over a number of classified dialects of a
language. In general it is very difficult to control
the dialectal affiliation of speakers. Most speakers have a very
rigid preconception of what dialect (if any!) they are speaking. However,
even experts very often do not agree on certain dialectal features, and it is
therefore very hard to
validate requirements like "10% of the corpus speakers speak
Bavarian". Here are some practical recommendations:
  - Specify recruitment by the factor
    "place of elementary school" instead of dialectal class.
    In most cases speakers will keep the dialect they acquired during
    the period of elementary school. Since most dialect maps do not match other, more
    familiar geographical areas, try to find a mapping from dialectal regions
    to states, districts, cities etc. that speakers are familiar with.
    State this procedure in the specs.
  - Specify a post-recording classification of dialect. This requires
    an expert in dialects and some time (i.e. additional costs).
  - Specify recruitment via 'local media' such as local newspapers,
    local radio stations etc.
- Education / Proficiency / Profession. Some speech corpora require certain
social factors, such as particular proficiencies (computer expert,
computer layman), a
minimum level or a distribution of different levels of education (Elementary School, High School, College, University), or
even speakers of a certain profession (Radiologist, News
Announcer). Only specify such characteristics if
you are absolutely certain that the recruiting process can deliver them.
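As a rough illustration, the quantitative criteria above (sex ratio, age bins, maximum share of non-native speakers) can be checked against a list of recruited speakers. The following Python sketch is hypothetical: the `Speaker` fields, the function `check_profile` and the tolerance value are assumptions made for this example and are not prescribed by any corpus specification. The age bins and the 5% limit are simply the example values from the list above.

```python
from collections import Counter
from dataclasses import dataclass

# Example values taken from the list above; the tolerance is an assumption.
AGE_BINS = [(12, 22), (23, 30), (31, 40), (41, 55)]
MAX_NON_NATIVE = 0.05
TOLERANCE = 0.10  # allowed absolute deviation from a target share


@dataclass
class Speaker:
    sex: str      # "f" or "m"
    age: int
    native: bool  # native speaker of the corpus language?


def age_bin(age):
    """Return the bin containing age, or None if it lies outside all bins."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return (low, high)
    return None


def check_profile(speakers):
    """Return a list of human-readable violations of the example spec."""
    n = len(speakers)
    if n == 0:
        return ["no speakers"]
    problems = []

    # Sex distribution: target 50:50.
    female_share = sum(s.sex == "f" for s in speakers) / n
    if abs(female_share - 0.5) > TOLERANCE:
        problems.append(f"sex ratio off target: {female_share:.0%} female")

    # Age distribution: equal share per bin.
    bins = Counter(age_bin(s.age) for s in speakers)
    if bins[None]:
        problems.append(f"{bins[None]} speaker(s) outside all age bins")
    target = 1 / len(AGE_BINS)
    for b in AGE_BINS:
        share = bins[b] / n
        if abs(share - target) > TOLERANCE:
            problems.append(f"age bin {b[0]}-{b[1]}: {share:.0%} of speakers")

    # Mother tongue: limit on non-native speakers.
    non_native_share = sum(not s.native for s in speakers) / n
    if non_native_share > MAX_NON_NATIVE:
        problems.append(f"non-native share {non_native_share:.0%} exceeds limit")
    return problems
```

An empty result means the recruited speakers satisfy these example criteria. A real specification would also have to state how strictly each target is to be met, which is what the assumed tolerance stands in for here.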
Other possible factors may be: pathologies, foreign accents,
speech rate, uncooperative speakers (forensic) etc.
You may also specify here which meta data about speakers (see chapter
) will be added to the corpus.
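Speaker meta data could, for example, be stored as one record per speaker. The Python sketch below is illustrative only: the field names, the `REGION_TO_DIALECT` table and its placeholder entries are assumptions, not a format defined in this document. It records the place of elementary school, as recommended for dialect recruitment above, and derives a provisional dialect class from it.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical mapping from regions familiar to speakers to dialect classes.
# The entries are placeholders; a real table has to come from the corpus specs.
REGION_TO_DIALECT = {
    "District A": "Dialect 1",
    "District B": "Dialect 1",
    "District C": "Dialect 2",
}


@dataclass
class SpeakerRecord:
    speaker_id: str
    sex: str
    age: int
    mother_tongue: str
    elementary_school_region: str
    education: Optional[str] = None
    profession: Optional[str] = None

    def dialect_class(self):
        """Provisional dialect class, or None if the region is unmapped."""
        return REGION_TO_DIALECT.get(self.elementary_school_region)


def to_json(record):
    """Serialize one record, including the derived dialect class."""
    data = asdict(record)
    data["dialect_class"] = record.dialect_class()
    return json.dumps(data, sort_keys=True)
```

Keeping the raw region next to the derived class means the dialect assignment can be revised later, for instance after a post-recording classification by an expert, without contacting the speakers again.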
BITS Projekt-Account
2004-06-01