
Dictionary

To create the dictionary, you will most likely work through some of the following steps, depending on the resources you have:

$\bigcirc$ Define the orthographic representation for your corpus and transliterate your data or render your text material accordingly *
$\bigcirc$ Create a complete list of unique words. Watch out for capital letters at the beginning of sentences^16.1 *
$\bigcirc$ Define the desired contents of each entry in your dictionary *
$\bigcirc$ Use automatic procedures to create as much content as possible, such as look-ups in existing dictionaries, text-to-phoneme converters, part-of-speech taggers, etc. (pass 1) **
$\bigcirc$ Verify the contents of pass 1 and/or create information manually from scratch and produce a corrected version of the dictionary (pass 2) *
$\bigcirc$ If possible, have one person do this for the complete dictionary **
$\bigcirc$ Have a second person repeat the last step for the complete dictionary (pass 3) **
$\bigcirc$ Automatically find the entries where pass 2 or pass 3 differs from pass 1 and where pass 2 and pass 3 are not consistent with each other, and discuss these inconsistencies with a group of experts to arrive at the final version of the dictionary **
$\bigcirc$ Repeat the last four steps for all content types that need manual labeling/verification *
$\bigcirc$ Use a simple parser to ensure that the final dictionary is coded properly. Look out especially for inconsistent use of blanks and tab characters. You may also check for homophones and homographs and verify that each one is really valid for your language.
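
The format check in the last item can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of this check list: each entry is one line of the form `orthography<TAB>pronunciation`, phoneme symbols are separated by single blanks, and the function name `check_dictionary` is hypothetical. Adapt the field layout to the entry contents you defined for your own dictionary.

```python
from collections import defaultdict


def check_dictionary(lines):
    """Check tab-separated dictionary lines (assumed two-field format).

    Returns (problems, homographs, homophones).
    """
    problems = []                # (line number, message)
    by_word = defaultdict(set)   # orthography -> set of pronunciations
    by_pron = defaultdict(set)   # pronunciation -> set of orthographies

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, f"expected 2 tab-separated fields, found {len(fields)}")
            )
            continue
        word, pron = fields
        if word != word.strip() or pron != pron.strip():
            problems.append((number, "leading or trailing blank in a field"))
        if "  " in pron:
            problems.append((number, "double blank in pronunciation"))
        # Normalise blanks so that formatting slips do not hide duplicates.
        clean_word = word.strip()
        clean_pron = " ".join(pron.split())
        by_word[clean_word].add(clean_pron)
        by_pron[clean_pron].add(clean_word)

    # Homographs: one spelling, several pronunciations.
    homographs = {w: sorted(p) for w, p in by_word.items() if len(p) > 1}
    # Homophones: one pronunciation, several spellings.
    homophones = {p: sorted(w) for p, w in by_pron.items() if len(w) > 1}
    return problems, homographs, homophones
```

The function only lists candidate homophones and homographs; whether each one is really valid for your language still has to be decided by a human.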

Possible sources for existing pronunciation dictionaries are the ELDA^16.2, the LDC^16.3 or the BAS^16.4.
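
The automatic comparison of the three passes named in the check list might look like the following sketch. It assumes, purely for illustration, that each pass has been loaded into a Python dict mapping a word to its entry content; the function name `find_disputed_entries` and the sample entries are hypothetical.

```python
def find_disputed_entries(pass1, pass2, pass3):
    """Return the entries on which the two manual passes disagree.

    Each argument is a dict mapping word -> entry content. Whenever
    pass 2 and pass 3 differ, at least one of them must also differ
    from pass 1, so comparing pass 2 with pass 3 is sufficient.
    """
    disputed = {}
    for word in sorted(set(pass1) | set(pass2) | set(pass3)):
        v1, v2, v3 = pass1.get(word), pass2.get(word), pass3.get(word)
        if v2 != v3:
            disputed[word] = {
                "pass1": v1,
                "pass2": v2,
                "pass3": v3,
                "changed_in_pass2": v2 != v1,  # first verifier edited pass 1
                "changed_in_pass3": v3 != v1,  # second verifier edited pass 1
            }
    return disputed


# Made-up example entries; the transcriptions are placeholders only.
pass1 = {"read": "r i: d", "lead": "l i: d", "tea": "t i:"}
pass2 = {"read": "r E d", "lead": "l i: d", "tea": "t i:"}
pass3 = {"read": "r i: d", "lead": "l E d", "tea": "t i:"}

for word, info in find_disputed_entries(pass1, pass2, pass3).items():
    print(word, info["pass1"], info["pass2"], info["pass3"], sep="\t")
```

The resulting list is what you would hand to the group of experts; entries on which both verifiers agree do not appear in it.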


BITS Projekt-Account 2004-06-01