                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                   BAVARIAN ARCHIVE FOR SPEECH SIGNALS

               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany
                      bas@phonetik.uni-muenchen.de


                         BITS LOGATOME CORPUS
                           DVD-ROM Database
                          
                          
                         Copyright(C) 2005 by
                 Bavarian Archive for Speech Signals
                     University of Munich, Germany


Version 1.5
Short Name: BAS BITS-LG
Corpus Date: 2006/04/27
Modification Date: 2010/01/29

The BITS synthesis corpus consists of two parts: a set of logatome
recordings for controlled diphone synthesis and a set of sentence
recordings for unit selection techniques. BITS stands for "BAS
Infrastructures for Technical Speech Processing" and was funded by
the German Ministry of Education and Science during 2003-2005.

This README file describes the BITS Logatome Corpus, which consists of
11036 recordings stored on four DVDs. Each DVD contains the recordings,
annotation files and metadata files of one of the four professional
speakers, as well as the documentation of the entire corpus.

Note that all documentation files are encoded in Unicode UTF-8 unless
stated otherwise.

Table of contents
-----------------
1.) Introduction
2.) Speakers
2.1.) Recruitment of Speakers
2.2.) Speaker Profiles
3.) Recording
4.) Recording Procedure
5.) Annotation
6.) File Nomenclature
7.) Structure of each DVD
8.) Other Documentation Files
9.) Contact
10.) History


Introduction
------------

The BITS Logatome Corpus consists of a set of logatome recordings for
controlled diphone synthesis.

Speech synthesis using concatenative techniques is maturing to a point
where standard procedures are being implemented in a variety of products.
However, because of the considerable costs involved, most small and
medium-sized companies as well as university labs cannot afford to produce
the required speech resources on their own. Although some German diphone
voices are freely available for research purposes (e.g. MBROLA), there is
a clear lack of publicly available synthesis resources. The BITS synthesis
corpus, recorded and produced by BAS, fills this gap. The work was funded
by the German Ministry of Education and Science (grant no. 01 IV B01).


Speakers
--------

Recruitment of speakers:
------------------------

45 speakers were invited to a casting. They were asked to read 90
logatomes containing a subset of our diphone set, chosen so that three
target sentences covering nearly all German phonemes could be synthesised.
Based on a ranking by naturalness and pleasantness, 10 speakers were
shortlisted. After an overall evaluation by specialists in speech
synthesis and by the BITS group, the best four speakers (two male and two
female) were chosen for the final recordings. More information about the
recruitment of the speakers can be found in:
/DOC/HTML/Specification_logatome_corpus.pdf

Speaker Profiles:
-----------------

Four professional speakers aged between 40 and 45 were recorded. All
speakers were of German nationality and had at least some competence in
English as a foreign language. More information about the speakers can be
found in the table /DOC/SRPK.TBL.

The table has the following columns (separated by tabs):
ID           : speaker id (SES200[1-4])
Sex          : M = male, W = female (German "weiblich")
Age          : age of the speaker at the time of the recordings
Name         : full name
Nationality  : nationality of the speaker
Size         : body height in cm
Weight       : weight in kg
ACC          : accent of the speaker, given as the federal state in which
               the speaker started school
Edu          : education of the speaker
PoL          : current place of living
Prof         : current occupation
FL           : foreign languages
               ENG - English
               FR  - French
               I   - Italian
               EL  - Greek
Smk          : smoker (y = yes, n = no, cas = occasionally)
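A minimal Python sketch for loading the speaker table into one dictionary
per speaker. It assumes that each line of SRPK.TBL holds exactly the
tab-separated fields listed above, in that order, possibly preceded by a
header line starting with "ID"; the function name read_speaker_table is
hypothetical.

```python
import csv

# Column names in the order listed in this README. Assumption: one speaker
# per line, tab-separated, optionally preceded by a header line ("ID ...").
COLUMNS = ["ID", "Sex", "Age", "Name", "Nationality", "Size", "Weight",
           "ACC", "Edu", "PoL", "Prof", "FL", "Smk"]

def read_speaker_table(path):
    """Return one dict per speaker, keyed by the column names above."""
    speakers = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary text
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            cells = [cell.strip() for cell in row]
            if not any(cells) or cells[0] == "ID":
                continue  # skip blank lines and a possible header line
            speakers.append(dict(zip(COLUMNS, cells)))
    return speakers
```

For example, {s["ID"]: s["Sex"] for s in read_speaker_table("DOC/SRPK.TBL")}
would map each speaker id to M or W, provided the assumptions above hold.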

Recording:
----------

The speech signal was recorded on three channels (headset microphone,
laryngograph signal and room microphone) at a sampling rate of 48 kHz with
16-bit quantization. All signals were recorded via a Yamaha 02R digital
sound mixer directly to hard disk, using the multi-channel recording
software SpeechRecorder.


 - Channel 0 : close-talk (headset) microphone (Beyerdynamic NEM 192),
   positioned 7 cm to the right of the mid-sagittal plane at the height of
   the upper lip.
 - Channel 1 : laryngograph signal (LaryngoGraph PCLX)
 - Channel 2 : large-diaphragm condenser microphone (Neumann Type TLM 103),
   60 cm from the mouth.

Each channel was stored in a separate standard WAV file; no further
processing was applied, to avoid degrading the signals.
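The format stated above (48 kHz, 16 bit, one channel per WAV file) can be
checked with Python's standard wave module. This is a minimal sketch: the
helper names are hypothetical, and the expectation of a single (mono)
channel per file is inferred from the channel separation described above.

```python
import wave

# Channel numbers as documented in this README.
CHANNELS = {0: "headset microphone", 1: "laryngograph", 2: "room microphone"}

def describe_wav(path):
    """Return (sample_rate, bits_per_sample, n_channels, duration_s) of a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        bits = 8 * w.getsampwidth()
        n_channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, bits, n_channels, duration

def matches_bits_format(path):
    """True if the file is 48 kHz, 16 bit and single-channel, as described above."""
    rate, bits, n_channels, _ = describe_wav(path)
    return rate == 48000 and bits == 16 and n_channels == 1
```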
 

Recording Procedure:
--------------------

The speaker was seated in a sound-insulated room with low reverberation.
The positions of the chair and the room microphone were marked on the
floor. Before the recordings, the speaker was asked to put on the headset
microphone and the laryngograph electrodes. During the session, the speech
prompts were displayed in a window of the program SpeechRecorder. Three
supervisors monitored the recording, and a prompt was repeated until all
three supervisors agreed on its quality.
More information about the recording procedure can be found in:
/DOC/HTML/Preparation_and_Execution_of_Recordings_8_2.pdf


Annotation:
-----------

All logatomes were inspected manually using the software Praat, and the
two phonemes forming the target diphone were segmented phonetically.
The results are stored in the directory ANNOT/SES#### (#### = speaker
number) in three different file formats:
- BAS Partitur Format (*.par) with the following tiers:
    ORT : orthographic representation of the prompted logatome
    KAN : canonical pronunciation (SAM-PA) of the logatome
    SAP : segmentation of phonemes: the preceding and trailing parts
          of the logatome are segmented as '<usb>'; in between there
          are 2-4 segments describing the two phonemes
          (more than two segments are required if a plosive is involved,
          since plosives are segmented into separate closure and burst
          phases; see details below)
- Annotation Graphs (XML, *.ags)
    The same information in XML form. See ag.dtd and metadata.dtd in
    directory DOC as well as http://agtk.sourceforge.net/ for details.
- TextGrid (Praat, *.TextGrid)
    The original segmentation results in Praat format.
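A minimal Python sketch for reading the SAP tier of a *.par file and
converting sample positions to seconds. It assumes the usual BPF
segment-tier layout "SAP: <begin> <duration> <index> <label>" with begin
and duration in samples, and a header line "SAM: <rate>" giving the
sampling rate; check DOC/README.PAR before relying on it. Note that since
version 1.3, non-ASCII characters in SAP labels are written as LaTeX codes.
The names read_sap_tier and Segment are hypothetical.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    start: float     # seconds
    duration: float  # seconds
    label: str

def read_sap_tier(path, default_rate=48000):
    """Return the SAP segments of a BPF file, with times in seconds.

    Assumes 'SAM: <rate>' appears in the header before the SAP lines;
    falls back to `default_rate` (48 kHz, as documented) otherwise."""
    rate = default_rate
    segments = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key == "SAM":
                rate = int(rest.strip())
            elif key == "SAP":
                begin, dur, _index, label = rest.split(maxsplit=3)
                segments.append(Segment(int(begin) / rate, int(dur) / rate,
                                        label.strip()))
    return segments
```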

Short description of the segmentation procedure

Please note that the segmentation described here refers to phoneme
boundaries: it is not the classic diphone used in speech synthesis that is
segmented, but the two phonemes forming the diphone. Users of the corpus
have to apply their own cutting technique to derive the final diphone
segment from this segmentation (for instance, the classic 40/60 rule).
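Because the corpus marks phoneme boundaries rather than diphone cuts, a
user-side cutting step is needed. The sketch below does not claim to
implement the 40/60 rule (whose exact definition is not given here); it
simply exposes the cut positions as fractions of each phoneme's duration,
so any such rule can be plugged in. The function name diphone_cut and its
parameters are hypothetical.

```python
def diphone_cut(first, second, left_fraction=0.5, right_fraction=0.5):
    """Return (start, end) of a diphone cut from two adjacent phoneme intervals.

    `first` and `second` are (start, end) pairs in seconds. The cut begins
    `left_fraction` of the way into the first phoneme and ends
    `right_fraction` of the way into the second one."""
    (s1, e1), (s2, e2) = first, second
    if not (s1 <= e1 <= s2 <= e2):
        raise ValueError("phoneme intervals must be ordered and non-overlapping")
    if not (0.0 <= left_fraction <= 1.0 and 0.0 <= right_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    return s1 + left_fraction * (e1 - s1), s2 + right_fraction * (e2 - s2)
```

For a plosive, the phoneme interval would span both its closure and burst
segments from the SAP tier.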

For the phonetic annotation, all logatomes were segmented automatically
into German SAM-PA in a first pass using MAUS. (More about the SAM-PA
encoding used for the annotation in
/DOC/HTML/Conventions_for_segmentation_8_5e.pdf)

The logatomes were then pre-segmented with MAUS according to their
canonical form. This guaranteed that the diphone contained in each
logatome was represented in correct SAM-PA. Only three automatically
placed boundaries were presented to the human segmenter:
- beginning of the diphone
- boundary between the two phonemes
- end of the diphone.

In a second pass, a group of ten to twelve trained phoneticians manually
corrected the pre-segmented sentences and logatomes.
In a third pass, three phoneticians whose segmentations were consistent
with each other corrected the segmentations again.
Finally, all segmentations were reviewed by the team supervisor.

The following segmentation rules were applied:
- boundary placement is based primarily on auditory judgement.
- segment boundaries are always placed at positive zero-crossings of the
  oscillogram (only in SAP TextGrid tiers!).
- boundary placement should be checked against the sonagram (spectrogram)
  and the oscillogram.
- in transitions where both adjacent phonemes can be heard, the boundary
  is placed in the middle of the transition (50% rule).
- voiced (periodic) segments start with the first clearly identifiable
  glottal pulse.
- boundaries of low-intensity segments (e.g. /h/, aspiration) are set
  where the signal can be clearly distinguished from the background
  noise.
Breathing noise, if clearly recognisable, has to be cut off from the
frication or aspiration.
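The 50% rule and the zero-crossing rule can be illustrated together with a
small sketch operating on a plain list of sample values. It is not the
tool the annotators used, and the helper names are hypothetical. A
positive zero-crossing is taken here to mean a sample where the signal
changes from negative to non-negative; the boundary is placed in the
middle of the transition and then moved to the nearest such crossing.

```python
def is_positive_zero_crossing(samples, i):
    """True if the signal goes from negative (at i-1) to non-negative (at i)."""
    return 0 < i < len(samples) and samples[i - 1] < 0 <= samples[i]

def snap_to_positive_zero_crossing(samples, index):
    """Return the positive-going zero crossing nearest to `index`
    (the earlier one on ties), or `index` itself if there is none."""
    for offset in range(len(samples) + 1):
        for candidate in (index - offset, index + offset):
            if is_positive_zero_crossing(samples, candidate):
                return candidate
    return index

def transition_boundary(samples, start, end):
    """50% rule: middle of the transition [start, end], snapped to a zero crossing."""
    return snap_to_positive_zero_crossing(samples, (start + end) // 2)
```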

Special labels in addition to standard German SAM-PA:
Q : glottal stop (SAM-PA: ?)
~ : preceding vowel is nasalized
§ : preceding phoneme contains an audible lip smack
q : preceding vowel is glottalized
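A minimal sketch that splits a segment label into its SAM-PA symbol and
the trailing markers listed above. It assumes the markers appear exactly
as written here (e.g. in TextGrid labels); in *.par files, "§" may appear
as a LaTeX code instead (see History, version 1.3). The names
decode_label and MARKERS are hypothetical.

```python
# Trailing markers listed in this README; each refers to the preceding symbol.
MARKERS = {"~": "nasalized", "§": "lip smack", "q": "glottalized"}

def decode_label(label):
    """Split a segment label into (SAM-PA symbol, set of marker descriptions)."""
    flags = set()
    while label and label[-1] in MARKERS:
        flags.add(MARKERS[label[-1]])
        label = label[:-1]
    if label.startswith("Q"):
        label = "?" + label[1:]  # BITS writes the glottal stop as Q, SAM-PA as ?
    return label, flags
```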

More information about the annotation of the corpus can be found in:
/DOC/HTML/Conventions_for_segmentation_8_5e.pdf


File Nomenclature:
------------------

The names of both audio and annotation files are built as follows:

LG####%%%%_$   with   #### : speaker id 2001 - 2004
                      %%%% : logatome id (see table /DOC/BITS-LG.TBL)
                      $    : channel 0 - 2

File name extension mappings

.TextGrid       Praat label file with interval tiers
.wav            Audio file
.txt            Text file
.html           HTML file
.par            BAS Partitur Format file
.ags            Annotation Graphs file (XML)
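The naming pattern LG####%%%%_$ can be parsed and generated with a regular
expression. A minimal sketch; the names BitsName, parse_bits_name and
make_bits_name are hypothetical, and speaker ids are not range-checked.

```python
import re
from typing import NamedTuple

class BitsName(NamedTuple):
    speaker: int    # 2001-2004
    logatome: int   # logatome id, see DOC/BITS-LG.TBL
    channel: int    # 0 = headset, 1 = laryngograph, 2 = room microphone
    extension: str

# "LG" + 4-digit speaker id + 4-digit logatome id + "_" + channel + extension
_NAME = re.compile(r"^LG(\d{4})(\d{4})_([0-2])\.([A-Za-z]+)$")

def parse_bits_name(filename):
    """Split a BITS logatome file name into its components, or raise ValueError."""
    m = _NAME.match(filename)
    if m is None:
        raise ValueError(f"not a BITS logatome file name: {filename!r}")
    speaker, logatome, channel, extension = m.groups()
    return BitsName(int(speaker), int(logatome), int(channel), extension)

def make_bits_name(speaker, logatome, channel, extension):
    """Build a file name following the LG####%%%%_$ pattern."""
    return f"LG{speaker:04d}{logatome:04d}_{channel}.{extension}"
```

For example, parse_bits_name("LG20031321_0.TextGrid") yields speaker 2003,
logatome 1321, channel 0 and extension "TextGrid".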


Structure of each DVD
---------------------

Each DVD contains the following:

README          : this file
DATA/           : the recordings of one of the four speakers
ANNOT/          : the annotation files of one of the four speakers
DOC/            : the documentation files (start with DOC.HTML)
PLAY/           : simple concatenation script (see README there)


The DOC/ directory contains the following:

README.PAR      : brief documentation of the BAS Partitur Format (BPF)
SPRK.TBL        : speaker profiles (see above)
KNOWN-ERRORS    : list of known errors that cannot be fixed

BITS-LG.TBL     : logatome list (see below)

DOC.HTML        : start of main documentation
HTML/           : main documentation files

PUBLICATIONS/   : publications


Other Documentation Files
-------------------------

KNOWN-ERRORS - known errors that cannot be fixed


BITS-LG.TBL - logatome list

This file contains a 5-column table describing the logatomes spoken in
the corpus:

<ID> <LOGATOM> <TRANSCRIPT> <DIPHONE> <LANG>

where:  <ID>          : logatome id 0001-2795
        <LOGATOM>     : logatome as prompted during the recording
        <TRANSCRIPT>  : SAM-PA transcript of the complete logatome
        <DIPHONE>     : target diphone of the logatome
        <LANG>        : non-German phoneme(s) contained:
                        ENG    : one English phoneme
                        FR     : one French phoneme
                        ENG/FR : diphone English-French
                        FR/ENG : diphone French-English
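A minimal sketch for loading the logatome list into a lookup table keyed
by logatome id, e.g. to find the target diphone of a recording from the
%%%% part of its file name. It assumes tab-separated fields and that
<LANG> is empty or absent for purely German logatomes; neither is stated
above, so check the file first. The names Logatome and
load_logatome_table are hypothetical.

```python
from typing import NamedTuple, Optional

class Logatome(NamedTuple):
    lg_id: int
    logatome: str
    transcript: str
    diphone: str
    lang: Optional[str]  # None if no non-German phoneme (assumed)

def load_logatome_table(path):
    """Map logatome id -> Logatome, assuming tab-separated columns
    <ID> <LOGATOM> <TRANSCRIPT> <DIPHONE> [<LANG>]."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = [c.strip() for c in line.rstrip("\n").split("\t")]
            if len(fields) < 4 or not fields[0].isdigit():
                continue  # skip blank, malformed or header lines
            lang = fields[4] if len(fields) > 4 and fields[4] else None
            entry = Logatome(int(fields[0]), fields[1], fields[2], fields[3], lang)
            table[entry.lg_id] = entry
    return table
```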


Contact
-------

For questions, remarks, bug reports etc. please contact
Florian Schiel          schiel@bas-services.de
                        +49-89-2180-5751

History
-------

15.03.06 : Version 1.0
27.04.06 : Version 1.1 : Documentation re-worked
                         BAS Partitur Format files added
04.08.06 : Version 1.2 : Several minor bugs fixed, AGS files added to 
                         annotation
09.08.06 : Version 1.3 : BPF tiers ORT and SAP contained 8-bit characters,
                         which do not conform to BPF; these were replaced
                         by LaTeX codes.
25.08.06 : Version 1.4 : Bug fix in AGS file creation; all *.ags files
                         regenerated
29.01.10 : Version 1.5 : Bug fix in annotation files LG20031321_0.par and
                         LG20031321_0.TextGrid : the burst segment of the
                         glottal stop /Q_b/ was missing -> fixed