The following document is the README file for the German fixed telephone network database. It outlines the contents and structure of the database. The DESIGN.DOC document listed in the README file gives a detailed description of the actual contents of the database, and VALREP.DOC is the final validation report of the validation agency. The documentation is included on each of the 17 CD-ROMs on which the database is distributed.

GERMAN SPEECHDAT(II) FDB4000 CD-ROM COLLECTION
Version 2.0
Copyright (C) 1999 by SIEMENS AG, Munich

Compiled by:
  Chr. Draxler
  Department of Phonetics and Speech Communication
  University of Munich
  Schellingstr. 3/II
  D 80799 Munich
  +49/89/2866 9968
  +49/89/280 0362 fax
  draxler@phonetik.uni-muenchen.de

The German SpeechDat(II) FDB4000 consists of 4000 calls stored on 17 CD-ROMs
in the final SpeechDat(II) database exchange format as defined in
deliverable SD 1.3.1 V.4.3:

CD-ROM Structure
----------------

/-- DISK.ID
/-- README.TXT
/-- COPYRIGH.TXT
/-- FIXED1DE --+- DOC-----+-- DESIGN.{DOC | PDF | PS}
               |          +-- ISO88591.{PDF | PS}
               |          +-- SAMPALEX.{PDF | PS}
               |          +-- SAMPSTAT.TXT
               |          +-- SUMMARY.TXT
               |          +-- TRANSCRP.{PDF | PS}
               |          +-- VALREP20.TXT
               |
               +- INDEX---+-- A1TRNDE.SES
               |          +-- A1TSTDE.SES
               |          +-- CONTENTS.LST
               |
               +- PROMPT--+-- SHEET.{PDF | PS}
               |
               +- SOURCE--+-- CC_PIN.TXT
               |          +-- DEFTSTDE.PL
               |
               +- TABLE---+-- LEXICON.TBL
               |          +-- SESSION.TBL
               |          +-- SPEAKER.TBL
               |
               +- BLOCKyy-+                 (with yy=[10..58])
                          +-- SESyyzz --+   (with zz=[00..99])
                                        +-- A1yyzzcc.DEA (signal file)
                                        +-- A1yyzzcc.DEO (SAM label file)
                                            (cc = corpus code)

The BLOCK directories contain the actual recordings. Each call is written to
a SES directory, where the 4-digit number in the directory name identifies
the session uniquely.
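
The naming scheme in the tree above can be sketched in code. The following
Python fragment is only an illustration, assuming that yy is the first two
digits of the 4-digit session number, that the corpus code cc has exactly two
characters, and that file names are upper case as shown in the tree. The
function name session_files and the corpus code used in the usage note are
hypothetical and not part of the database.

```python
from pathlib import PurePosixPath


def session_files(session: int, corpus_code: str, root: str = "FIXED1DE"):
    """Return (signal, label) paths for one item of one session.

    Hypothetical helper. Layout assumed from the tree above:
    BLOCKyy/SESyyzz/A1yyzzcc.DEA and A1yyzzcc.DEO.
    """
    # yy=[10..58] and zz=[00..99] give session numbers 1000..5899.
    if not 1000 <= session <= 5899:
        raise ValueError("session number must lie in 1000..5899")
    if len(corpus_code) != 2:
        raise ValueError("corpus code must have two characters")
    ses = f"{session:04d}"   # yyzz
    yy = ses[:2]             # block number
    session_dir = PurePosixPath(root) / f"BLOCK{yy}" / f"SES{ses}"
    stem = f"A1{ses}{corpus_code.upper()}"
    return session_dir / f"{stem}.DEA", session_dir / f"{stem}.DEO"
```

For example, session_files(1234, "B1") yields the two paths
FIXED1DE/BLOCK12/SES1234/A11234B1.DEA and FIXED1DE/BLOCK12/SES1234/A11234B1.DEO,
where "B1" is a made-up corpus code used only for illustration.
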
The signal and label files are held in the session directory; for each
signal file (extension .DEA) there is a corresponding SAM label file
(extension .DEO).

Note: file name extension mappings:

  .DOC   Microsoft Word 6
  .PDF   Adobe Portable Document Format
  .PS    Adobe PostScript
  .TXT   DOS-formatted ISO 8859-1
  .PL    perl script
  .TBL   tab-delimited ISO 8859-1 table file
  .DEA   8 kHz 8 bit A-law encoded raw signal file
  .DEO   ISO 8859-1 encoded SAM label file

The following directories contain documentation and related information:

DOC :
  DESIGN.{DOC|PDF|PS}   Contents description of the German FDB4000
  ISO88591.{PDF|PS}     ISO 8859-1 (ISO Latin) code table
  SAMPALEX.{PDF|PS}     German SAM-PA table
  SAMPSTAT.TXT          SNR values
  SD131V43.DOC          Database Exchange Format Specification
  SD132V24.DOC          Orthographic and Transcription Conventions
  SUMMARY.TXT           German FDB1000 summary file
  TRANSCRP.{PDF|PS}     the validation and transcription handbook
  VALREP.TXT            validation report by SPEX with responses by U-Munich

INDEX :
  A1TRNDE.SES           training set file
  A1TSTDE.SES           testing set file
  CONTENTS.LST          contents of the database
                        The order of fields in the table is
                        VOL DIR SRC CCD CRP SCD SEX AGE ACC LBO
                        and the fields are separated by tabs.

PROMPT : contains a Portable Document Format and PostScript file
  SHEET.{PDF|PS}        prompt sheet layout in the form it was distributed
                        to speakers

SOURCE : contains the following DOS-formatted ISO 8859-1 files
  CC_PIN.TXT            150 16-digit credit card numbers and 150 6-digit
                        PIN codes
  DEFTSTDE.PL           perl script to define training and test sets for
                        the German FDB 4000

TABLE : contains the following DOS-formatted ISO 8859-1 files
  LEXICON.TBL           the lexicon file with the following tab-delimited
                        fields
                        ORTHOGRAPHY FREQUENCY SAM-PRONUNCIATION
  SPEAKER.TBL           the speaker information file with the following
                        tab-separated fields
                        SES AGE SEX ACC
  SESSION.TBL           the session information file with the following
                        tab-separated fields
                        SES RED RET AGE SEX ACC REG ENV
                        This file is used to generate the training and
                        test set files A1trnDE.ses and A1tstDE.ses
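
As an illustration of how the tab-separated table files might be read, here
is a minimal Python sketch. It assumes that each line of SESSION.TBL carries
exactly the eight fields listed above, in that order. Whether the files start
with a header line is not stated here, so the sketch simply skips a first row
that repeats the field names. The function name read_table is hypothetical,
and the sample row is invented placeholder data, not taken from the database.

```python
import csv
import io

# Field order as listed for SESSION.TBL in this README.
SESSION_FIELDS = ["SES", "RED", "RET", "AGE", "SEX", "ACC", "REG", "ENV"]


def read_table(stream, fields):
    """Yield one dict per row of a tab-separated table.

    Hypothetical helper. Blank lines are ignored, and a row that merely
    repeats the field names is treated as a header and skipped.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        if row == fields:
            continue  # header line, if the file has one
        if len(row) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(row)}: {row!r}"
            )
        yield dict(zip(fields, row))


# Invented placeholder row with DOS line ending, for demonstration only.
sample = io.StringIO("1000\tx1\tx2\tx3\tx4\tx5\tx6\tx7\r\n")
rows = list(read_table(sample, SESSION_FIELDS))
```

For a file on the CD-ROM, the stream would be opened with
open(path, encoding="iso-8859-1", newline="") so that the ISO 8859-1 text
and the DOS line endings are passed to the csv reader unchanged. The same
function could be used for SPEAKER.TBL or LEXICON.TBL by passing their field
lists instead.
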