One of the main deliverables of BITS is this cookbook-like document describing the production of speech corpora. In this document, the term speech corpora refers to collections of digital recordings of speech together with annotation, metadata, and documentation. Speech corpora are the prime source of data for basic and applied research in the area of spoken language communication, and for technology development in the area of Spoken Language Processing (SLP), such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS) synthesis, and Speaker Verification.
This cookbook provides prospective users in the scientific community and in engineering with advice on how to produce re-usable, high-quality, and consistent speech corpora for their respective needs. Furthermore, it gives an overview of best practice in this field and presents exemplary models for some standard cases.
The motivation for the cookbook was the following observation:
Very often, great effort and huge amounts of money are spent on bad speech corpora, i.e. corpora that serve one particular purpose only and were never meant to be shared. These corpora cannot be re-used for anything other than the originally intended purpose, and they are difficult to update or maintain. As a consequence, their potential commercial and research value is lost.
The Bavarian Archive for Speech Signals (BAS), located at the University of Munich, has often been asked to add a corpus to its catalogue, only to find that the corpus is not usable for any purpose other than the original one. In most cases this is primarily because the corpus was poorly specified and its production process was not monitored properly.