
Comments on SmartKom

The SmartKom Speech Corpus is a special case of scientific corpus production. Because the outcome of the overall project cannot be defined in detail at the outset, the specifications for the corpus production tend to be imprecise and open-ended. However, this can also be an advantage, because it allows the corpus production to be adapted to the needs of the project partners.

There are three major problems with this kind of corpus production:

  1. Logically, the corpus production should start before the other partners begin their work. That way the necessary data will be available when needed and not only at the end of the project. In most cases, however, this is not possible, both because of the funding structure and because it is almost impossible to define the exact type of data needed beforehand.
  2. A data collection that adapts to the progress of a scientific project tends to yield many different and inconsistent data types. For example, if an evaluation of specific modules is needed during the project and the data collection provides very specialized data for this purpose, these data may not integrate easily into a monolithic corpus. Care must be taken to document all differing data types in great detail to ensure that the corpus can be reused in the future.
  3. In most cases the funding for a scientific corpus production ends at the same time as the scientific work. This is a problem because data are produced up to the very last minute and are then not properly integrated into the corpus. The solution is to arrange for a third party outside the project to take care of the corpus after the scientific project has ended. This institution must be funded independently of the project and must take responsibility for the data over a longer time span. In the case of SmartKom, the BAS took over the data after the SmartKom project was finished.


BITS Projekt-Account 2004-06-01