Next: Specification Documents Up: SpeechDat II German Previous: Speaker Profiles   Contents

Comments on SpeechDat

In the project proposal, the planned duration for SpeechDat-II was 24 months. In reality, however, the project took more than 36 months! The main reasons for this significant delay were threefold: The size and the heterogeneous composition of the project consortium, consisting of a variety of industrial and academic partners, made the specification of a common subset of items a very tedious task. The requirements of application developers are quite different from those of service providers or of academia. The question of database exchange value was difficult to settle: is a database of 500 Luxembourg German speakers equal in value to a 5000-speaker database of standard German? How to incorporate late entrants into the consortium was a further open question. All these questions had to be resolved, and they had to be resolved by consensus, because the project plan did not foresee sanctions for non-cooperative partners.

Speaker recruitment turned out to be the single most critical issue. None of the project partners had experience with collecting such a large speech database. Project partners whose employees offered good geographic and demographic coverage, for example national telecom companies, found it relatively easy to motivate those employees to participate. Professional market research companies were generally not used, both because of the high cost (in Germany, for instance, they asked for more money than was available for the entire German data collection) and because they gave no guarantee that they would provide the requested number of speakers.

Most SpeechDat-II databases were ready for validation at about the same time. This imposed a heavy workload on the validation agency; the original plan had been to deliver the databases in sequence so that their validation could proceed with constant effort over a longer period of time. During validation, grave errors were found in some databases. These errors had to be corrected by recording additional material, by re-annotating, or by re-creating lexica. In some cases, not all errors could be corrected and the database had to undergo an acceptance vote. The most important lesson learned here was that there should be at least three validations: a formal validation of all prompt material prior to any recordings, an early validation of the first few recordings prior to the main recording phase, and a final validation. For very large databases, an additional intermediate validation is very useful.

SpeechDat has effectively set the standard for many successor projects. It is a showcase for the collaboration of academia and industry, and it has proved that direct market competitors can effectively share the effort of creating resources while at the same time continuing to compete in the development of devices, applications and services.


BITS Projekt-Account 2004-06-01