###### GRAPHEME PHONEME CONVERSION WEBSERVICE #########

#######################################################
# calling webservice 'runG2P' via web page frontend ###
#######################################################

http://clarin.phonetik.uni-muenchen.de/BASWebServices/#/services/Grapheme2Phoneme

#######################################################
# calling web service 'runG2P' using CURL #############
#######################################################

## Synopsis

> curl -X GET 'http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/help'

> curl -v -X POST -H 'content-type: multipart/form-data' \
       -F i=@myInputFile \
       -F lng=myLanguage \
       -F iform=myInputFormat \
       -F oform=myOutputFormat \
     [ -F featset=myFeatureSet \
       -F syl=myBoolean \
       -F stress=myBoolean \
       -F align=myBoolean \
       -F map=myMap \
       -F tgitem=myTgItem \
       -F tgrate=myTgRate \
       -F embed=myEmbed \ ]
       'http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P'

## Input arguments

# i [required]
  input file to be uploaded. UTF-8 character encoding required.
  Do not forget the @

# lng [required]
  language of input text (mostly ISO 639-3)
  myLanguage:
    aus - Australian English
    deu - German
    eng - English
    fra - French
    ita - Italian
    kat - Georgian
    hun - Hungarian
    nld - Dutch
    nze - New Zealand English
    pol - Polish
    ron - Romanian
    slk - Slovak
    sqi - Albanian
    use - American English
  (alternatively RFC 5646 codes are accepted; e.g. eng-GB, eng-AU)

# iform [required]
  format of input text
  myInputFormat:
    txt  - plain text
    bpf  - bas partiture format
    list - word list (relevant for extended feature set wrt POS tagging,
           else simply treated as connected text 'txt')
    tcf  - TCF format
    tg   - TextGrid (short or standard format)

# oform [required]
  output format
  myOutputFormat:
    txt    - t r a n s c r i p t i o n t r a n s c r i p ...
    tab    - word;t r a n s c r i p t i o n
    exttab - word;t r a n s c r i p t i o n;partOfSpeech;m o r p h s;m o r p h c l a s s e s
             (for lng=deu|eng only)
    bpf    - bas partiture format (KAN tier added, no blanks between phonemes)
    bpfs   - bas partiture format (KAN tier added, blanks between phonemes)
    lex    - word;t r a n s c r i p t i o n
             words are unique and alphanumerically sorted
    extlex - output as for exttab, but unique and sorted (for lng=deu|eng only)
    tcf    - TCF format with added transcriptions
             (for non-TCF input, the required elements are generated from scratch)
    exttcf - additional output of part of speech, morphs and morph classes
             (for lng=deu|eng only)
    tg     - TextGrid. Item bas_trs will be added, which contains the transcription
             for each interval in item myItem, to be specified with -F tgitem=myItem.
             Requires iform 'tg'.
    exttg  - extended TextGrid. Items bas_trs, bas_pos, bas_m, and bas_mc will be added,
             which contain transcription, part of speech, morphemes, and morpheme classes,
             respectively, for each interval in item myItem, to be specified with
             -F tgitem=myItem. Requires iform 'tg' (for lng=deu|eng only)

# featset
  feature set chosen for g2p and word stress assignment
  myFeatureset:
             - grapheme window
    extended - including POS and morphological features. Only available for deu and eng

# syl
  syllabification
  myBoolean: no|yes

# stress
  word stress assignment
  myBoolean: no|yes

# align
  1:1 alignment of transcription to orthography
  myBoolean: no|yes

# map
  mapping from one inventory to another (e.g. from SAMPA to IPA)
  myMap:
    deu_ipa
    eng_ipa
    ...

# tgitem
  needed for iform=tg. Name of the TextGrid item from which the text is to be extracted.
  Case-sensitive

# tgrate
  needed for the combination of iform=tg and oform=bpf(s). Sample rate (in Hz), so that
  time values of the TextGrid can be converted to sample values in the bpf

# embed
  Macro parameter.
  myEmbed: maus
    Use 'maus' if G2P output will be used as input for WEBMAUS
    If set to 'maus', then
      - syl and stress are set to 'no'
      - align: if not 'no', set to 'maus'
      - map is set to 'myLanguage_maus'
      - oform: if not 'bpf(s)', set to 'bpf'

## Remarks:

- all oform=ext* settings are available for languages deu and eng only
- word stress is fixed and thus not assigned for French, Hungarian, Polish, and Slovak
- mapping is not supported for all possible inventory combinations,
  but myLanguage_ipa works for any supported language

### Example ##################################################

input:  German plain text file input_deu.txt
output: - 2-column table: word;t r a n s c r i p t i o n
        - transcription is syllabified
        - word stress is assigned
        - phoneme inventory is mapped from German SAMPA to IPA

# step 1:

> curl -v -X POST -H 'content-type: multipart/form-data' \
       -F lng=deu \
       -F iform=txt \
       -F oform=tab \
       -F stress=yes \
       -F syl=yes \
       -F map="deu_ipa" \
       -F i=@input_deu.txt \
       'http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P'

# step 2: the webservice returns an XML snippet that contains a status value (true)
#         followed by the response link:

http://clarin.phonetik.uni-muenchen.de:80/BASWebServices//data///2014.04.03_13.05.54_85C3C9CA125613D878445ADAC611AFC4//input_deu.tab

# step 3: fetch the file behind the response link using WGET

> wget http://clarin.phonetik.uni-muenchen.de:80/BASWebServices//data///2014.04.03_13.05.54_85C3C9CA125613D878445ADAC611AFC4//input_deu.tab

################################################################
# combining 'runG2P' with WEBMAUS phonetic segmentation ########
################################################################

# alternative 1: in two steps

  1. use runG2P, set 'embed' option to 'maus'
  2. use the resulting .par file as input for WebMAUSGeneral
     http://clarin.phonetik.uni-muenchen.de/BASWebServices/#/services/WebMAUSGeneral

# alternative 2: single step

  use WebMAUSMultiple
  http://clarin.phonetik.uni-muenchen.de/BASWebServices/#/services/WebMAUSMultiple
  WebMAUSMultiple calls runG2P, thus it can handle all input formats described above

###############################################################
web service front-end by Thomas Kisler, BAS
G2P back-end tool by Uwe Reichel, BAS
{kisler|reichelu}@phonetik.uni-muenchen.de
2014-11-10
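
### Scripting sketch #########################################

The three steps of the example (upload, read the response link, download the
result) can be chained in a small shell script. The following is only a sketch
under stated assumptions, not part of the service: the function names
extract_link and run_g2p are hypothetical, and the response link is assumed to
be the first http(s) URL in the XML reply, so no XML element names are relied on.

```shell
#!/bin/sh
# Sketch only: chains steps 1-3 of the example.
# Assumption: the first http(s) URL in the service's XML reply is the
# response link. If the reply carries other URLs first, this breaks.

G2P_URL='http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P'

# extract_link (hypothetical helper): read the XML reply on stdin and
# print the first URL found, or nothing if there is none.
extract_link() {
    grep -o 'https\{0,1\}://[^<"[:space:]]*' | head -n 1
}

# run_g2p INFILE LNG OUTFILE (hypothetical helper):
# upload a plain text file, request a syllabified, stress-marked tab output
# mapped to IPA, then download the result to OUTFILE.
# LNG is assumed to be one of the three-letter codes listed under 'lng'.
run_g2p() {
    # curl's -F option already implies a multipart/form-data POST
    reply=$(curl -sf \
        -F "i=@$1" -F "lng=$2" -F iform=txt -F oform=tab \
        -F stress=yes -F syl=yes -F "map=${2}_ipa" \
        "$G2P_URL") || return 1
    link=$(printf '%s\n' "$reply" | extract_link)
    if [ -z "$link" ]; then
        echo "no response link found in reply: $reply" >&2
        return 1
    fi
    curl -sf -o "$3" "$link"
}
```

Usage, after sourcing the script: run_g2p input_deu.txt deu input_deu.tab
For languages with fixed word stress (see Remarks) the stress setting has no
effect on the output.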