



                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                   BAVARIAN ARCHIVE FOR SPEECH SIGNALS

               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany
                      bas@bas.uni-muenchen.de


      COPYRIGHT University of Munich 2008, 2009. All rights reserved.
    This corpus and software may not be disseminated further - not even
      partly - without a written permission of the copyright holders.


----------------------------------------------------------------------

        VERIDAT VERIF1DE (150 Speakers) Speech Database
                               DVD-R
                          
                          
                            Version 1.4
                     Copyright(C) October 2001 by
      T-Nova Deutsche Telekom Innovationsgesellschaft mbH
                              Berkom
                          Goslarer Ufer 35
                           D-10589 Berlin
All rights reserved. This corpus and software may not be disseminated 
further - not even partly - without a written permission of the 
copyright holders.

----------------------------------------------------------------------

Compiled by: Chr. Draxler
             Department of Phonetics and Speech Communication
             University of Munich
             Schellingstr. 3/II
             D 80799 Munich

             draxler@phonetik.uni-muenchen.de


The German VERIF1DE speaker verification database consists of 
150 * 20 calls stored on 1 DVD-R in the SpeechDat(II) 
database exchange format as defined in the SpeechDat deliverable 
SD131 (version 4.3).

             VERIF1DE_01: BLOCK00 - BLOCK29


IMPORTANT NOTES
---------------

1) The material contained on this DVD-R constitutes the VERIF1DE 
German Speaker Verification Database which is compatible with 
the SpeechDat-II specifications. 

DOC/DESIGN is a description of the VERIF1DE database as required
by the SpeechDat-II specifications, and the documents SD113, 
SD131, SD132, and SD133 are SpeechDat-II deliverables. 

The figures given in DOC/DESIGN, e.g. for phoneme counts, are those 
for the VERIF1DE database as it was validated. All other documentation
files, e.g. the dictionary, speaker and recording conditions tables, 
etc. are those of the VERIF1DE database.

The VERIF1DE database is a subset of the VERIDAT speaker 
verification database collected by T-Nova. VERIDAT contains
additional items and re-recordings of missing, corrupted, or
otherwise unusable files in VERIF1DE.

2) The filenames in VERIDAT and VERIF1DE are disjoint, but follow
the same specifications so that access to the data is uniform.
Hence splitting the databases into the two subdatabases is possible. 

The filenames in these two databases differ from the filenames 
under which the recordings were originally stored during recording. 
An original filename consists of the speaker number, the session 
number, and the filename proper, e.g. 0001/01/C1F001A1.pld. 
To keep the original filename accessible, the first comment text 
in each label file contains it in a normalized form, i.e. with 
file system delimiters replaced by "_": 
0001_01_C1F001A1.pld
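
The normalization described above can be reversed with a few lines of
code. The following Python sketch is an illustration only and not part
of the distribution; the function name original_path is hypothetical,
and it assumes that neither the speaker number nor the session number
contains a "_" character.

```python
from pathlib import PurePosixPath


def original_path(normalized: str) -> PurePosixPath:
    """Rebuild speaker/session/filename from the normalized comment text.

    Assumption: only the first two "_" characters are delimiters, so a
    "_" inside the filename proper is left untouched.
    """
    speaker, session, filename = normalized.split("_", 2)
    return PurePosixPath(speaker, session, filename)


# prints 0001/01/C1F001A1.pld
print(original_path("0001_01_C1F001A1.pld"))
```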

Furthermore, the script COPY.PL in conjunction with the database
table can be used to convert the SpeechDat filenames to the 
corresponding VERIDAT filenames.

3) All label files are stored in their correct location in
the same directory as the corresponding signal file. The label 
fields are also stored in the semicolon delimited text database 
file TABLE/VERIF1DE.TXT.


DVD-R Structure
---------------

/-- DISK.ID
/-- README.TXT
/-- COPYRIGH.TXT
/-- VERIF1DE -- +- DOC-----+-- DESIGN.{DOC | PDF | PS}
                |          +-- ISO88591.{PDF | PS}
                |          +-- SAMPALEX.{PDF | PS}
                |          +-- SAMPSTAT.TXT
                |          +-- SD113V33.DOC
                |          +-- SD131V43.DOC
                |          +-- SD132V24.DOC
                |          +-- SD133V19.DOC
                |          +-- SUMMARY.TXT
                |          +-- TRANSCRP.{PDF | PS}
                |          +-- VALREP.TXT
                |          +-- VALRESPO.TXT
                |
                +- INDEX---+-- CONTENTS.LST
                |          +-- C1xxxxDE.LST (with xxxx a speaker ID)
                |
                +- SOURCE--+-- COPY.PL
                |          +-- COPY.TBL
                |
                +- TABLE---+-- C1TRNDE.SES
                |          +-- C1TSTDE.SES
                |          +-- FAMREL.TBL
                |          +-- LEXICON.TBL
                |          +-- REC_COND.TBL
                |          +-- SESSION.TBL
                |          +-- SPEAKER.TBL
                |          +-- VERIF1DE.TXT
                |
                +- BLOCKyy-+                               (with yy=[00..30])
                           +-- SESyyzz --+                 (with zz=[00..99])
                                         + -- C1yyzzcc.DEA (signal file)
                                         + -- C1yyzzcc.DEO (SAM label file)
                                                           (cc = corpus code)

The BLOCK directories contain the actual recordings. Each call is written
to a SES directory, where the 4-digit number in the directory name
identifies the session uniquely. The signal and label files are held in the
session directory; for each signal file (extension .DEA) there is the
corresponding SAM label file (extension .DEO). The corpus code (cc) 
identifies the content of the recording as defined in chapter 3 of 
DOC/DESIGN. A sample prompt sheet is given in DOC/DESIGN.
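
The naming scheme shown in the directory tree can be taken apart
mechanically. The Python sketch below is an illustration under stated
assumptions: the names SignalName, parse_name and relative_path are
hypothetical, the sample file name C10042A1.DEA is invented, and the
corpus code is assumed to consist of two letters or digits (chapter 3
of DOC/DESIGN is the authority on valid codes).

```python
import re
from typing import NamedTuple


class SignalName(NamedTuple):
    block: str        # BLOCKyy directory
    session: str      # SESyyzz directory
    corpus_code: str  # cc, the item identifier
    extension: str    # DEA (signal) or DEO (label)


# C1 + yy + zz + cc + extension, as in the directory tree above.
# Assumption: cc is two alphanumeric characters.
_NAME = re.compile(r"^C1(\d{2})(\d{2})([A-Z0-9]{2})\.(DEA|DEO)$", re.IGNORECASE)


def parse_name(filename: str) -> SignalName:
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a C1yyzzcc.DEA/.DEO name: {filename!r}")
    yy, zz, cc, ext = match.groups()
    return SignalName("BLOCK" + yy, "SES" + yy + zz, cc.upper(), ext.upper())


def relative_path(filename: str) -> str:
    """Path of the file below the DVD-R root, using "/" as separator."""
    name = parse_name(filename)
    return "/".join(["VERIF1DE", name.block, name.session, filename.upper()])


# Invented example name, for illustration only.
print(relative_path("C10042A1.DEA"))
```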


Note: file name extension mappings:

.DEA     8 kHz 8 bit A-law encoded raw signal file
.DEO     ISO 8859-1 encoded SAM label file
.DOC     Microsoft Word
.LST     tab-delimited DOS-formatted ISO 8859-1 index file
.PDF     Adobe Portable Document Format
.PS      Adobe PostScript
.TXT     DOS-formatted ISO 8859-1
.TBL     tab-delimited ISO 8859-1 table file
.SES     DOS-formatted text file
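
Since the .DEA files are raw signal files, general-purpose audio tools
may need to be told the encoding explicitly. The Python sketch below
shows one way to expand the samples to 16 bit linear values. It
assumes the files are headerless, hold one byte per sample, and use
the standard ITU-T G.711 A-law coding; the function names are
hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # from the extension table above


def alaw_byte_to_pcm16(code: int) -> int:
    """Expand one G.711 A-law byte to a signed 16 bit linear sample."""
    code ^= 0x55                      # undo the even-bit inversion
    magnitude = (code & 0x0F) << 4    # 4 bit mantissa
    segment = (code & 0x70) >> 4      # 3 bit segment number
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if segment > 1:
            magnitude <<= segment - 1
    # After the XOR, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a whole raw A-law buffer, e.g. the bytes of one .DEA file."""
    return [alaw_byte_to_pcm16(b) for b in raw]


def duration_seconds(raw: bytes) -> float:
    """Duration of a buffer holding one byte per sample at 8 kHz."""
    return len(raw) / SAMPLE_RATE_HZ
```

The bytes of a signal file can be obtained with, for example,
pathlib.Path(...).read_bytes() and passed to decode_alaw().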

         
The following directories contain documentation and related information:

DOC    : DESIGN.{DOC|PDF|PS} Contents description of the German VERIF1DE
         ISO88591.{PDF|PS}   ISO 8859-1 (ISO Latin 1) code table
         SAMPALEX.{PDF|PS}   German SAM-PA table
         SAMPSTAT.TXT        SNR values
         SD113V33.DOC        Definition of Speaker Verification DB Contents
         SD131V43.DOC        Database Exchange Format Specification
                             (including definition of SAM mnemonics)
         SD132V24.DOC        Orthographic and Transcription Conventions
         SD133V19.DOC        Validation criteria
         SUMMARY.TXT         German VERIF1DE summary file
         TRANSCRP.{PDF|PS}   the validation and transcription handbook
         VALREP.TXT          validation report by SPEX
         VALRESPO.TXT        response to validation report


         SAM mnemonics used in the following (e.g. 'VOL') are defined 
         in DOC/SD131V43, appendices C and D.

INDEX  : CONTENTS.LST contents of the database
						
                      The order of fields in the table is
						
                      VOL DIR SRC CCD CRP SCD SEX AGE ACC LBO
						
                      and the fields are separated by tabs.

         C1xxxxDE.LST speaker file list that contains all
                      files pertaining to a given speaker; xxxx
                      is a valid 4-digit speaker ID, and the only
                      field is the full path to the file
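
The tab-separated layout of CONTENTS.LST lends itself to simple
scripted access. The following Python sketch is an illustration only:
the function name read_contents is hypothetical, the sample record is
invented, and whether the file starts with a header line is not stated
here, so the sketch skips a first line that merely repeats the field
names.

```python
import csv
from typing import Dict, Iterable, Iterator

# Field order as listed for CONTENTS.LST above.
CONTENTS_FIELDS = ["VOL", "DIR", "SRC", "CCD", "CRP",
                   "SCD", "SEX", "AGE", "ACC", "LBO"]


def read_contents(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record of a tab-delimited contents list."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row == CONTENTS_FIELDS:
            continue  # blank line or (assumed) header line
        if len(row) != len(CONTENTS_FIELDS):
            raise ValueError(f"expected {len(CONTENTS_FIELDS)} fields, got {len(row)}")
        yield dict(zip(CONTENTS_FIELDS, row))


# Invented record, for illustration only.
sample = "\t".join(["VERIF1DE_01", "BLOCK00/SES0001", "C1000101.DEA",
                    "01", "1", "0001", "F", "35", "-", "hallo"])
for record in read_contents([sample]):
    print(record["SRC"], record["SEX"], record["AGE"])
```

For the real file, open it with encoding="latin-1" and newline="" so
that ISO 8859-1 characters and DOS line endings are handled by the
csv module.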


SOURCE : COPY.PL      A perl script to convert filenames from 
                      the SpeechDat nomenclature to the VERIDAT
                      filenames used during the recordings
         
         COPY.TBL     A table to map the SpeechDat-compatible
                      filenames to the VERIDAT filenames used
                      during the recordings
      
		 		 
TABLE  : contains the following DOS-formatted ISO 8859-1 files

         C1TRNDE.SES  a single-column table containing all
                      sessions to be used for training
         
         C1TSTDE.SES  a single-column table containing all
                      sessions to be used for testing
                      
         FAMREL.TBL   a table containing family relationship
                      information for speakers in tab-delimited
                      fields
                      
                      pair_no relationship kind_of_relation SCD

         LEXICON.TBL  the lexicon file with the following 
                      tab-delimited fields

                      ORTHOGRAPHY FREQUENCY SAM-PRONUNCIATION	

         REC_COND.TBL the recording conditions information file with
                      the following tab separated fields
                      
                      RCC REG ENV PHM NET SNL

         SESSION.TBL  the session information file with the following
                      tab separated fields

                      SES RED RET SCD HLT TRD STR REG ENV NET PHM SNL

         SPEAKER.TBL  the speaker information file with the following
                      tab separated fields

                      SES SEX AGE ACC

         VERIF1DE.TXT text file containing all SAM label files
                      in a single row each; the label fields are
                      separated by ";" and the label sequence is
                      
                      LHD DBN SES CMT1 VOL DIR SRC CCD CRP REP 
                      RED RET BEG END CMT2 SAM SNB SBF SSB QNT 
                      CMT3 SCD SEX AGE ACC HLT TRD STR SNL CMT4 
                      RCC REG ENV NET PHM LBD CMT5 LBR LBO ELF

         SAM mnemonics are defined in DOC/SD131V43, appendices C and D.
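
Because VERIF1DE.TXT holds one label file per row, the label fields of
the whole database can be read without opening the individual .DEO
files. The Python sketch below is an illustration only: the function
name parse_label_row is hypothetical, and it assumes that no field
value contains a ";" itself and that a row may or may not end with a
trailing ";".

```python
from typing import Dict

# Label sequence as listed for VERIF1DE.TXT above (40 fields).
LABEL_FIELDS = (
    "LHD DBN SES CMT1 VOL DIR SRC CCD CRP REP "
    "RED RET BEG END CMT2 SAM SNB SBF SSB QNT "
    "CMT3 SCD SEX AGE ACC HLT TRD STR SNL CMT4 "
    "RCC REG ENV NET PHM LBD CMT5 LBR LBO ELF"
).split()


def parse_label_row(line: str) -> Dict[str, str]:
    """Split one row of the table into a field-name -> value mapping."""
    values = line.rstrip("\r\n").split(";")
    # Tolerate a single trailing ";" (assumption, see lead-in).
    if len(values) == len(LABEL_FIELDS) + 1 and values[-1] == "":
        values.pop()
    if len(values) != len(LABEL_FIELDS):
        raise ValueError(f"expected {len(LABEL_FIELDS)} fields, got {len(values)}")
    return {name: value.strip() for name, value in zip(LABEL_FIELDS, values)}
```

The rows of the real file can be read with
open(path, encoding="latin-1") and passed to parse_label_row() line by
line.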

HISTORY
-------

Version 1.2     : Deliverable to T-Nova Berlin, 2001
Version 1.3     : BAS Edition 2011
Version 1.4     : BAS CLARIN Repository: free access for 
                  European academics, 2014
