Apart from the actual coding scheme, you have to decide on the contents
as well. The minimum content as described in section
will be a simple table
containing a consistent orthographic representation and a
most likely or canonical pronunciation. Since "canonical pronunciation" is
not a well-defined term for most languages, make sure that you
settle on a definition that can be applied successfully in the creation
of the dictionary. In some cases you may refer to a standard
dictionary (footnote 9.2) or,
even better, to a standardized set of pronunciation rules (footnote 9.3). If this is not possible, provide a
minimal rule set for problematic cases to be used by your staff during
the work on pronunciation. Include this rule set in your documentation
of the corpus.
If you are working on a German speech corpus, you may use the BAS rule set
for manual transcription as given in appendix .
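
As an illustration of the minimum content described above, the following sketch reads a two-column table (orthographic form, canonical pronunciation) and rejects entries that would make it inconsistent. The tab-separated layout, the function names and the two sample entries (German words in a SAMPA-like notation) are assumptions made for this example only; they are not prescribed by the text or by the BAS rule set.

```python
import unicodedata
from typing import Dict


def normalize(orth: str) -> str:
    """Bring an orthographic form into one consistent representation
    (assumption: Unicode NFC, surrounding whitespace removed)."""
    return unicodedata.normalize("NFC", orth.strip())


def parse_lexicon(text: str) -> Dict[str, str]:
    """Parse a table with one 'orthography<TAB>pronunciation' entry per line.

    Raises ValueError on malformed lines and on duplicate orthographic
    forms, so every word has exactly one canonical pronunciation.
    """
    lexicon: Dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated fields")
        orth, pron = normalize(fields[0]), fields[1].strip()
        if not orth or not pron:
            raise ValueError(f"line {lineno}: empty field")
        if orth in lexicon:
            raise ValueError(f"line {lineno}: duplicate entry for {orth!r}")
        lexicon[orth] = pron
    return lexicon


# Hypothetical sample entries, for illustration only.
SAMPLE = "Haus\thaUs\nTag\tta:k\n"
lexicon = parse_lexicon(SAMPLE)
```

Rejecting duplicates keeps the table to a single canonical form per word; if pronunciation variants are wanted later, they would need an additional column or a separate table.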