BAVARIAN ARCHIVE FOR SPEECH SIGNALS
University of Munich, Institute of Phonetics
Schellingstr. 3/II, 80799 Munich, Germany
bas@bas.uni-muenchen.de

COPYRIGHT University of Munich 2008, 2009. All rights reserved.

This corpus and software may not be disseminated further - not even
partly - without a written permission of the copyright holders.

----------------------------------------------------------------------

VERIDAT VERIF1DE (150 Speakers) Speech Database
DVD-R Version 1.4

Copyright(C) October 2001 by
T-Nova Deutsche Telekom Innovationsgesellschaft mbH
Berkom
Goslarer Ufer 35
D-10589 Berlin

All rights reserved.

This corpus and software may not be disseminated further - not even
partly - without a written permission of the copyright holders.

----------------------------------------------------------------------

Compiled by:
Chr. Draxler
Department of Phonetics and Speech Communication
University of Munich
Schellingstr. 3/II
D 80799 Munich
draxler@phonetik.uni-muenchen.de

The German VERIF1DE speaker verification database consists of
150 * 20 calls stored on 1 DVD-R in the SpeechDat(II) database
exchange format as defined in the SpeechDat deliverable SD131
(version 4.3).

VERIF1DE_01: BLOCK00 - BLOCK29


IMPORTANT NOTES
---------------

1) The material contained on this DVD-R constitutes the VERIF1DE
   German Speaker Verification Database which is compatible with the
   SpeechDat-II specifications.

   DOC/DESIGN is a description of the VERIF1DE database as required
   by the SpeechDat-II specifications, and the documents SD113, SD131,
   SD132, and SD133 are SpeechDat-II deliverables.

   The figures given in DOC/DESIGN, e.g. for phoneme counts, are those
   for the VERIF1DE database as it was validated. All other
   documentation files, e.g. the dictionary, speaker and recording
   conditions tables, etc.
   are those of the VERIF1DE database.

   The VERIF1DE database is a subset of the VERIDAT speaker
   verification database collected by T-Nova. VERIDAT contains
   additional items and re-recordings of missing, corrupted, or
   otherwise unusable files in VERIF1DE.

2) The filenames in VERIDAT and VERIF1DE are disjoint, but follow the
   same specifications so that access to the data is uniform. Hence
   splitting the databases into the two subdatabases is possible.

   The filenames in these two databases differ from the filenames
   under which the recordings were originally stored during
   recording. An original filename consists of the speaker number,
   the session number, and the filename proper, e.g.

      0001/01/C1F001A1.pld

   To give access to the original filename, the first comment text in
   each label file contains it in a normalized form, i.e. with the
   file system delimiters replaced by "_":

      0001_01_C1F001A1.pld

   Furthermore, the script COPY.PL in conjunction with the database
   table can be used to convert the SpeechDat filenames to the
   corresponding VERIDAT filenames.

3) All label files are stored in their correct location in the same
   directory as the corresponding signal file. The label fields are
   also stored in the semicolon delimited text database file
   TABLE/VERIF1DE.TXT.

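The filename normalization described in note 2 can be sketched in a few
lines of Python. This is only an illustration and not part of the
distribution: the function names are hypothetical, and treating "\" as a
delimiter in addition to "/" is an assumption.

```python
def normalize_original_name(path: str) -> str:
    """Replace file system delimiters by "_" (as in the first label comment)."""
    # Assumption: both "/" and "\" may occur as delimiters.
    return path.replace("\\", "_").replace("/", "_")


def split_normalized_name(name: str) -> tuple[str, str, str]:
    """Split SPEAKER_SESSION_FILENAME into its three parts.

    Only the first two "_" are treated as separators, so a filename
    proper that itself contains "_" is kept intact.
    """
    speaker, session, filename = name.split("_", 2)
    return speaker, session, filename


if __name__ == "__main__":
    normalized = normalize_original_name("0001/01/C1F001A1.pld")
    print(normalized)
    print(split_normalized_name(normalized))
```

For the actual conversion between SpeechDat and VERIDAT filenames, the
COPY.PL script shipped in SOURCE remains the reference.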
DVD-R Structure
---------------

/-- DISK.ID
/-- README.TXT
/-- COPYRIGH.TXT
/-- VERIF1DE --
        +- DOC-----+-- DESIGN.{DOC | PDF | PS}
        |          +-- ISO88591.{PDF | PS}
        |          +-- SAMPALEX.{PDF | PS}
        |          +-- SAMPSTAT.TXT
        |          +-- SD113V33.DOC
        |          +-- SD131V43.DOC
        |          +-- SD132V24.DOC
        |          +-- SD133V19.DOC
        |          +-- SUMMARY.TXT
        |          +-- TRANSCRP.{PDF | PS}
        |          +-- VALREP.TXT
        |          +-- VALRESPO.TXT
        |
        +- INDEX---+-- CONTENTS.LST
        |          +-- C1xxxxDE.LST  (with xxxx a speaker ID)
        |
        +- SOURCE--+-- COPY.PL
        |          +-- COPY.TBL
        |
        +- TABLE---+-- C1TRNDE.SES
        |          +-- C1TSTDE.SES
        |          +-- FAMREL.TBL
        |          +-- LEXICON.TBL
        |          +-- REC_COND.TBL
        |          +-- SESSION.TBL
        |          +-- SPEAKER.TBL
        |          +-- VERIF1DE.TXT
        |
        +- BLOCKyy-+                      (with yy=[00..30])
                   +-- SESyyzz --+        (with zz=[00..99])
                                 +-- C1yyzzcc.DEA  (signal file)
                                 +-- C1yyzzcc.DEO  (SAM label file)
                                                   (cc = corpus code)

The BLOCK directories contain the actual recordings. Each call is
written to a SES directory, where the 4-digit number in the directory
name identifies the session uniquely. The signal and label files are
held in the session directory; for each signal file (extension .DEA)
there is the corresponding SAM label file (extension .DEO). The corpus
code (cc) defines the content of the recording as defined in chapter 3
of DOC/DESIGN. A sample prompt sheet is given in DOC/DESIGN.

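Because the block and session numbers are encoded in every file name,
the location of a file can be derived from its name alone. The following
Python sketch illustrates this under some assumptions: the function
name, the root directory argument, and the example file name
C10123A1.DEA (including its corpus code "A1") are hypothetical, and the
corpus code is assumed to be two alphanumeric characters.

```python
import re
from pathlib import PurePosixPath

# C1 + block (yy) + session within block (zz) + corpus code (cc) + extension
_NAME = re.compile(r"^C1(\d{2})(\d{2})([A-Z0-9]{2})\.(DEA|DEO)$")


def locate(filename: str, root: str = "VERIF1DE") -> PurePosixPath:
    """Return root/BLOCKyy/SESyyzz/filename for a signal or label file."""
    name = filename.upper()
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a C1yyzzcc.DEA/.DEO file name: {filename!r}")
    yy, zz, _corpus_code, _ext = match.groups()
    return PurePosixPath(root) / f"BLOCK{yy}" / f"SES{yy}{zz}" / name


if __name__ == "__main__":
    # Hypothetical example name, not taken from the database.
    print(locate("C10123A1.DEA"))
```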
Note: file name extension mappings:

   .DEA   8 kHz 8 bit A-law encoded raw signal file
   .DEO   ISO 8859-1 encoded SAM label file
   .DOC   Microsoft Word
   .LST   tab-delimited DOS-formatted ISO-8859-1 index file
   .PDF   Adobe Portable Document Format
   .PS    Adobe PostScript
   .TXT   DOS-formatted ISO 8859-1
   .TBL   tab-delimited ISO 8859-1 table file
   .SES   DOS-formatted text file

The following directories contain documentation and related
information:

DOC :
   DESIGN.{DOC|PDF|PS}    Contents description of the German FDB4000
   ISO88591.{PDF|PS}      ISO8859-1 (ISO Latin) code table
   SAMPALEX.{PDF|PS}      German SAM-PA table
   SAMPSTAT.TXT           SNR values
   SD113V33.DOC           Definition of Speaker Verification DB Contents
   SD131V43.DOC           Database Exchange Format Specification
                          (including definition of SAM mnemonics)
   SD132V24.DOC           Orthographic and Transcription Conventions
   SD133V19.DOC           Validation criteria
   SUMMARY.TXT            German VERIF1DE summary file
   TRANSCRP.{PDF|PS}      the validation and transcription handbook
   VALREP.TXT             validation report by SPEX
   VALRESPO.TXT           response to validation report

SAM mnemonics used in the following (e.g. 'VOL') are defined in
DOC/SD131V43, appendices C and D.

INDEX :
   CONTENTS.LST   contents of the database
                  The order of fields in the table is
                     VOL DIR SRC CCD CRP SCD SEX AGE ACC LBO
                  and the fields are separated by tabs.
   C1xxxxDE.LST   speaker file list that contains all files pertaining
                  to a given speaker; the xxxx are valid 4-digit
                  speaker IDs, the field is the full path to the file

SOURCE :
   COPY.PL        A perl script to convert filenames from the
                  SpeechDat nomenclature to the VERIDAT filenames used
                  during the recordings
   COPY.TBL       A table to map the SpeechDat-compatible filenames to
                  the VERIDAT filenames used during the recordings

TABLE : contains the following DOS-formatted ISO 8859-1 files
   C1TRNDE.SES    a single-column table containing all sessions to be
                  used for training
   C1TSTDE.SES    a single-column table containing all sessions to be
                  used for testing
   FAMREL.TBL     a table containing family relationship information
                  for speakers in tab-delimited fields
                     pair_no relationship kind_of_relation SCD
   LEXICON.TBL    the lexicon file with the following tab-delimited
                  fields
                     ORTHOGRAPHY FREQUENCY SAM-PRONUNCIATION
   REC_COND.TBL   the recording conditions information file with the
                  following tab separated fields
                     RCC REG ENV PHM NET SNL
   SESSION.TBL    the session information file with the following tab
                  separated fields
                     SES RED RET SCD HLT TRD STR REG ENV NET PHM SNL
   SPEAKER.TBL    the speaker information file with the following tab
                  separated fields
                     SES SEX AGE ACC
   VERIF1DE.TXT   text file containing all SAM label files in a single
                  row each; the label fields are separated by ";" and
                  the label sequence is
                     LHD DBN SES CMT1 VOL DIR SRC CCD CRP REP RED RET
                     BEG END CMT2 SAM SNB SBF SSB QNT CMT3 SCD SEX AGE
                     ACC HLT TRD STR SNL CMT4 RCC REG ENV NET PHM LBD
                     CMT5 LBR LBO ELF

SAM mnemonics are defined in DOC/SD131V43, appendices C and D.


HISTORY
-------

Version 1.2 : Deliverable to T-Nova Berlin, 2001
Version 1.3 : BAS Edition 2011
Version 1.4 : BAS CLARIN Repository : free access for European academics 2014
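
As an illustration of the TABLE/VERIF1DE.TXT layout described under
TABLE above, the following Python sketch maps one row to its label
fields. It is not part of the distribution. The function names are
hypothetical, and it assumes that every row holds exactly the 40 fields
of the label sequence and that no field value itself contains a ";".
Rows that do not match are reported rather than guessed at.

```python
from typing import Iterator

# Label sequence of VERIF1DE.TXT as listed in this README.
FIELDS = (
    "LHD DBN SES CMT1 VOL DIR SRC CCD CRP REP RED RET "
    "BEG END CMT2 SAM SNB SBF SSB QNT CMT3 SCD SEX AGE "
    "ACC HLT TRD STR SNL CMT4 RCC REG ENV NET PHM LBD "
    "CMT5 LBR LBO ELF"
).split()


def parse_row(line: str) -> dict[str, str]:
    """Split one semicolon-delimited row into a mnemonic -> value mapping."""
    values = line.rstrip("\r\n").split(";")
    if len(values) != len(FIELDS):
        # Assumption violated: field count differs from the label sequence.
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_table(path: str) -> Iterator[dict[str, str]]:
    """Yield one mapping per non-empty row of a DOS-formatted ISO 8859-1 file."""
    with open(path, encoding="iso-8859-1", newline="") as handle:
        for line in handle:
            if line.strip():
                yield parse_row(line)
```

A call such as read_table("VERIF1DE/TABLE/VERIF1DE.TXT") would then allow
sessions to be selected by mnemonic, e.g. row["SES"] or row["SEX"]; the
path shown assumes the DVD-R structure given earlier.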