Define the orthographic representation for your corpus and
transliterate your data or render your text material accordingly *
Create a complete list of unique words. Watch out for capital
letters at the beginning of sentences [16.1] *
Define the desired contents of each entry in your
dictionary *
Use automatic procedures to create as much content as
possible, such as look-up in existing dictionaries,
text-to-phoneme converters, part-of-speech taggers,
etc. (pass 1) **
Verify the contents of pass 1 and/or create information
manually from scratch and produce a corrected version of
the dictionary (pass 2) *
If possible, let this be done by one person for the complete
dictionary **
Have a second person repeat the last step for the complete
dictionary (pass 3) **
Automatically find the entries where pass 2 or pass 3 differs
from pass 1 and where pass 2 and pass 3 are not consistent with
each other, and discuss these inconsistencies with a
group of experts to come up with the final version of the dictionary **
Repeat the last four steps for all content types that need manual
labeling/verification *
Use a simple parser to ensure a proper coding of the final
dictionary. In particular, look out for inconsistent use of blanks and
tab characters. You may also check for homophones and homographs and check
whether they are really valid for your language.
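
The final coding check could look roughly like the following sketch. It assumes a hypothetical file layout that is not prescribed here: one entry per line, orthography and pronunciation separated by exactly one tab, and phoneme symbols separated by single blanks. The function name `check_dictionary` and the sample entries are illustrative only.

```python
from collections import defaultdict


def check_dictionary(lines):
    """Check entries of the assumed form 'orthography<TAB>pronunciation'.

    Returns (problems, homographs, homophones), where problems is a list
    of (line number, message) pairs.
    """
    problems = []
    by_word = defaultdict(set)   # orthography -> pronunciations
    by_pron = defaultdict(set)   # pronunciation -> orthographies
    for number, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly one tab"))
            continue
        word, pron = fields
        clean_word = word.strip()
        clean_pron = " ".join(pron.split())   # single blanks between symbols
        if not clean_word or not clean_pron:
            problems.append((number, "empty field"))
            continue
        if word != clean_word or pron != pron.strip():
            problems.append((number, "leading or trailing blank"))
        if " " in clean_word:
            problems.append((number, "blank inside orthography"))
        if pron.strip() != clean_pron:
            problems.append((number, "repeated blank in pronunciation"))
        by_word[clean_word].add(clean_pron)
        by_pron[clean_pron].add(clean_word)
    # Candidates only: whether each one is valid must be decided by hand.
    homographs = {w: sorted(p) for w, p in by_word.items() if len(p) > 1}
    homophones = {p: sorted(w) for p, w in by_pron.items() if len(w) > 1}
    return problems, homographs, homophones


sample = [
    "read\tr i: d",
    "read\tr E d",
    "red\tr E d",
    "two \tt u:",    # trailing blank after the orthography
    "too\tt  u:",    # two blanks between phoneme symbols
    "to t u:",       # blank used instead of a tab
]
problems, homographs, homophones = check_dictionary(sample)
for number, message in problems:
    print(f"line {number}: {message}")
```

The homograph and homophone tables are only lists of candidates. The script cannot tell whether a given pair is legitimate in the language, so that decision stays with a human reader.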
Sources for existing pronunciation dictionaries may be the ELDA [16.2], the LDC [16.3] or the BAS [16.4].
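
The automatic comparison of the passes in the recipe above can be sketched as follows. This is a minimal illustration, assuming each pass is held as a mapping from a word to its entry content; the function name `find_disputed_entries` and the example pronunciations are hypothetical.

```python
def find_disputed_entries(pass1, pass2, pass3):
    """Return the entries on which the two manual passes disagree.

    If pass 2 and pass 3 differ, at least one of them also differs from
    pass 1, so comparing the two manual passes is sufficient. The pass 1
    value is kept alongside as context for the expert discussion.
    """
    disputed = {}
    for word in sorted(set(pass1) | set(pass2) | set(pass3)):
        first, second = pass2.get(word), pass3.get(word)
        if first != second:
            disputed[word] = {
                "pass1": pass1.get(word),
                "pass2": first,
                "pass3": second,
            }
    return disputed


# Illustrative data only.
pass1 = {"read": "r i: d", "lead": "l i: d", "wind": "w I n d"}
pass2 = {"read": "r i: d", "lead": "l E d", "wind": "w aI n d"}
pass3 = {"read": "r i: d", "lead": "l E d", "wind": "w I n d"}

for word, versions in find_disputed_entries(pass1, pass2, pass3).items():
    print(word, versions)
```

In this example only "wind" is reported. "lead" was changed in both manual passes, but consistently, so it needs no discussion.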