
Specification Documents

All SpeechDat-II specifications are publicly available on the SpeechDat web site, www.speechdat.org. These documents describe the overall project goals, the language-specific requirements, the database contents, and the database exchange formats.

The following document is the README file for the German fixed telephone network database. It outlines the contents and structure of the database. The DESIGN.DOC document listed in the README file describes the actual contents of the database in detail, and VALREP.DOC is the final validation report of the validation agency. The documentation is included on each of the 17 CD-ROMs on which the database is distributed.

                    GERMAN SPEECHDAT(II) FDB4000
                         CD-ROM COLLECTION

                            Version 2.0

                       Copyright(C) 1999 by
                        SIEMENS AG, Munich

Compiled by: Chr. Draxler
             Department of Phonetics and Speech Communication
             University of Munich
             Schellingstr. 3/II
             D 80799 Munich
             +49/89/2866 9968
             +49/89/280 0362 fax
             draxler@phonetik.uni-muenchen.de

The German SpeechDat(II) FDB4000 consists of 4000 calls stored on
17 CD-ROMs in the final SpeechDat(II) database exchange format as
defined in deliverable SD 1.3.1 V.4.3:

CD-ROM Structure
----------------

/-- DISK.ID
/-- README.TXT
/-- COPYRIGH.TXT
/-- FIXED1DE -- +- DOC-----+-- DESIGN.{DOC | PDF | PS}
                |          +-- ISO88591.{PDF | PS}
                |          +-- SAMPALEX.{PDF | PS}
                |          +-- SAMPSTAT.TXT
                |          +-- SUMMARY.TXT
                |          +-- TRANSCRP.{PDF | PS}
                |          +-- VALREP20.TXT
                |
                +- INDEX---+-- A1TRNDE.SES
                |          +-- A1TSTDE.SES
                |          +-- CONTENTS.LST
                |
                +- PROMPT--+-- SHEET.{PDF | PS}
                |
                +- SOURCE--+-- CC_PIN.TXT
                |          +-- DEFTSTDE.PL
                |
                +- TABLE---+-- LEXICON.TBL
                |          +-- SESSION.TBL
                |          +-- SPEAKER.TBL
                |
                +- BLOCKyy-+                (with yy=[10..58])
                           +-- SESyyzz --+  (with zz=[00..99])
                                         + -- A1yyzzcc.DEA (signal file)
                                         + -- A1yyzzcc.DEO (SAM label file)
                                         (cc = corpus code)

The BLOCK directories contain the actual recordings.
Each call is written to a SES directory, where the 4-digit number
in the directory name identifies the session uniquely. The signal
and label files are held in the session directory; for each
signal file (extension .DEA) there is the corresponding SAM label
file (extension .DEO).

Note: file name extension mappings:

.DOC     Microsoft Word 6
.PDF     Adobe Portable Document Format
.PS      Adobe PostScript
.TXT     DOS-formatted ISO 8859-1
.PL      perl script
.TBL     tab-delimited ISO 8859-1 table file
.DEA     8 kHz 8 bit alaw encoded raw signal file
.DEO     ISO 8859-1 encoded SAM label file

The following directories contain documentation and related
information:

DOC    : DESIGN.{DOC|PDF|PS} Contents description of the
                             German FDB4000
         ISO88591.{PDF|PS}   ISO8859-1 (ISO Latin) code table
         SAMPALEX.{PDF|PS}   German SAM-PA table
         SAMPSTAT.TXT        SNR values
         SD131V43.DOC        Database Exchange Format
                             Specification
         SD132V24.DOC        Orthographic and Transcription
                             Conventions
         SUMMARY.TXT         German FDB1000 summary file
         TRANSCRP.{PDF|PS}   the validation and transcription handbook
         VALREP.TXT          validation report by SPEX with
                             responses by U-Munich

INDEX  : A1TRNDE.SES  training set file
         A1TSTDE.SES  testing set file
         CONTENTS.LST contents of the database
                      The order of fields in the table is
                      VOL DIR SRC CCD CRP SCD SEX AGE ACC LBO
                      and the fields are separated by tabs.

PROMPT : contains a Portable Document Format and PostScript file
         SHEET.{PDF|PS}    prompt sheet layout in the form it was
                           distributed to speakers

SOURCE : contains the following DOS-formatted ISO 8859-1 files
         CC_PIN.TXT   150 16-digit credit card numbers and
                      150 6-digit PIN codes
         DEFTSTDE.PL  perl script to define training and test sets
                      for the German FDB 4000

TABLE  : contains the following DOS-formatted ISO 8859-1 files
         LEXICON.TBL  the lexicon file with the following
                      tab-delimited fields
                      ORTHOGRAPHY FREQUENCY SAM-PRONUNCIATION
         SPEAKER.TBL  the speaker information file with the following
                      tab separated fields
                      SES AGE SEX ACC
         SESSION.TBL  the session information file with the following
                      tab separated fields
                      SES RED RET AGE SEX ACC REG ENV
                      this file is used to generate the training and
                      test set files A1trnDE.ses and A1tstDE.ses
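
The naming scheme and table layout described in the README can be illustrated with a short Python sketch. It is not part of the distribution and rests on several assumptions: the function names `locate` and `read_table` are hypothetical, the corpus code `cc` is taken to be exactly two alphanumeric characters, and every non-empty line of a table file is treated as a data row, since the README does not say whether the .TBL files start with a header line.

```python
import re
from pathlib import PurePosixPath

# Pattern for A1yyzzcc.DEA / A1yyzzcc.DEO as shown in the CD-ROM structure.
# Assumption: cc (corpus code) is exactly two alphanumeric characters.
FILE_NAME = re.compile(
    r"^A1(?P<yy>\d{2})(?P<zz>\d{2})(?P<cc>[A-Z0-9]{2})\.DE(?P<kind>[AO])$"
)


def locate(file_name, root="FIXED1DE"):
    """Return the path of a signal or label file relative to the CD-ROM root."""
    name = file_name.upper()
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a signal or label file name: {file_name!r}")
    yy, zz = match["yy"], match["zz"]
    # The README gives yy=[10..58] for the BLOCK directories.
    if not 10 <= int(yy) <= 58:
        raise ValueError(f"block number out of range: {yy}")
    return PurePosixPath(root, f"BLOCK{yy}", f"SES{yy}{zz}", name)


def read_table(lines, fields):
    """Split tab-delimited lines into dictionaries keyed by field name.

    Assumption: no header row; skip the first line yourself if one exists.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")  # the files are DOS-formatted (CR LF)
        if line:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


SPEAKER_FIELDS = ["SES", "AGE", "SEX", "ACC"]
SESSION_FIELDS = ["SES", "RED", "RET", "AGE", "SEX", "ACC", "REG", "ENV"]
```

With a made-up corpus code `C1`, `locate("A12345C1.DEA")` yields `FIXED1DE/BLOCK23/SES2345/A12345C1.DEA`. A table file would be opened with `open(path, encoding="iso-8859-1", newline="")`, matching the encoding stated for .TBL files, and passed to `read_table` together with the matching field list.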



BITS Projekt-Account 2004-06-01