Last update of this page: 2021-07-14
BAS WebServices are server-based applications that can be accessed by RESTful calls. Most of these services load text and/or signal data from your local computer, process them, and deliver the results via an XML object. The XML object contains the fields 'success' (boolean), 'downloadLink' (the link to the result file), 'output' (text output of the application, e.g. debug information or progress reports), and 'warnings' (possible warnings or error messages of the application). The results of a web service call can be accessed via the download link within 24 hours after the call.
Example: REST call of MAUS Basic.
AAA334869_0.txt and AAA334869_0.wav are files in the local directory:
curl -v -X POST -H 'content-type: multipart/form-data' -F LANGUAGE=deu -F TEXT=@AAA334869_0.txt -F SIGNAL=@AAA334869_0.wav https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic
Example: output XML object:
<WebServiceResponseLink>
  <success>true</success>
  <downloadLink>http://clarin.phonetik.uni-muenchen.de:80/BASWebServices//data////2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.TextGrid</downloadLink>
  <output>/usr/local/bin/maus OUTFORMAT=TextGrid BPF=/usr/share/tomcat6/webapps/BASWebServices//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.par INSKANTEXTGRID=true LANGUAGE=deu OUT=/usr/share/tomcat6/webapps/BASWebServices//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.TextGrid INSORTTEXTGRID=true USETRN=true SIGNAL=/usr/share/tomcat6/webapps/BASWebServices//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.wav</output>
  <warnings></warnings>
</WebServiceResponseLink>
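The fields of such a response can also be evaluated directly on the command line. The following is a minimal sketch, not part of the BAS tooling: the helper name bas_field is hypothetical, the response text is a placeholder rather than real service output, and the sed expression assumes that the opening and closing tag of a field are on the same line. A proper XML parser is the more robust choice for production scripts.

```shell
#!/bin/sh
# bas_field FIELD FILE: print the text content of <FIELD>...</FIELD>.
# Hypothetical helper; assumes the element does not span several lines.
bas_field() {
    sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" "$2"
}

# Placeholder response for illustration; a real one would be saved from curl.
response=$(mktemp)
cat > "$response" <<'EOF'
<WebServiceResponseLink> <success>true</success> <downloadLink>https://example.org/result.TextGrid</downloadLink> <output></output> <warnings></warnings> </WebServiceResponseLink>
EOF

if [ "$(bas_field success "$response")" = "true" ]; then
    link=$(bas_field downloadLink "$response")
    echo "result available at: $link"
    # curl -s -o result.TextGrid "$link"   # uncomment to fetch the result file
else
    echo "service failed: $(bas_field warnings "$response")" >&2
fi
rm -f "$response"
```

Checking 'success' before reading 'downloadLink' avoids trying to download a result that was never produced; in that case the 'warnings' field is the place to look.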
All currently supported BAS WebServices are described in a machine-readable CMDI metadata file. The main objective for this file is to allow automatic embedding of webservices into applications or APIs; a more readable reference for the REST API can be found here.
The usage of BAS web services is subject to the Conditions of Use for Academics or the Conditions of Use for Commercial Institutions.
In the following we describe the most prominent BAS WebServices, giving example curl calls and a full description of the input/output parameters. Curl calls can be issued from any UNIX command line (konsole, terminal, etc.).
MAUS segments and labels speech data based on an input text or phonemic transcript and the speech signal. It uses hidden Markov models (HMMs) for the alignment and a probabilistic pronunciation model to predict variation in the pronunciation.
runMAUSBasic
Input: text (TXT,DOC,DOCX,ODT,RTF,PDF), media file (WAV,...)
Output: phonetic segmentation (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUSBasic
runMAUS
Input: standard pronunciation (BPF, tier KAN), media file (WAV,...)
Output: phonetic segmentation (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUS
runMINNI
Segments and labels a speech audio file into SAM-PA (or IPA) phonetic segments without any text/phonological input; results are stored either in a Praat-compatible TextGrid file (configuration parameter OUTFORMAT=TextGrid) or in a CSV table (the BPF MAU tier, configuration parameter OUTFORMAT=csv).
Input: media file (WAV,...)
Output: phonetic segmentation (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runMINNI
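A call of runMINNI might look like the following sketch. It is modelled on the runMAUSBasic example at the top of this page: the endpoint URL and the SIGNAL and LANGUAGE parameters are assumptions made by analogy, and only the OUTFORMAT values are taken from the description above. The wrapper name minni_call and the DRY_RUN switch are hypothetical; please check the Webservices Help for the authoritative parameter list.

```shell
#!/bin/sh
# Base URL as in the runMAUSBasic example; the runMINNI path is assumed.
BAS_BASE='https://clarin.phonetik.uni-muenchen.de/BASWebServices/services'

# minni_call SIGNALFILE LANGUAGE OUTFORMAT
# Hypothetical wrapper. If DRY_RUN is set and non-empty, the curl command
# is only printed (without shell quoting) instead of being executed.
minni_call() {
    ${DRY_RUN:+echo} curl -X POST \
        -H 'content-type: multipart/form-data' \
        -F "SIGNAL=@$1" -F "LANGUAGE=$2" -F "OUTFORMAT=$3" \
        "$BAS_BASE/runMINNI"
}

# Print the command for review; unset DRY_RUN to actually send the request.
DRY_RUN=1
minni_call AAA334869_0.wav deu csv
```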
runMAUSGetInventar
This service returns a table with the phoneme set used by BAS WebServices for a language.
Input: Language Code
Output: CSV table
Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUSGetInventar
runTTSFile
A (German) text file is converted into spoken language; two female and two male voices (also available as BAS corpora) can be selected. The MARY synthesis system has been developed by the University of Saarbrücken.
Input: Text File (TXT,UTF-8)
Output: Signal File (WAV)
Example REST call and parameter synopsis:
Webservices Help : refer to section runTTSFile
runChannelSeparator
Input is a multi-channel RIFF WAVE file where each channel contains the speech of a single speaker in conversation
with others. The cross-talk of other speakers is removed from each channel so that all channels are muted
except the one with the currently speaking person.
This algorithm was developed by Volker Dellwo and adapted for the BAS WebServices by Fritz Seebauer.
Input: multichannel sound file (WAV)
Output: multichannel sound file (WAV)
Example REST call and parameter synopsis:
Webservices Help : refer to section runChannelSeparator
runG2P
A text or a word list (UTF-8) is transformed into the corresponding most
likely phonological standard pronunciation (encoded in SAM-PA or IPA). It also
allows POS tagging, syllabification, lexical accent tagging and morphological
segmentation and tagging.
The G2P system is empirically trained on a large pronunciation dictionary
of the respective language; it was developed by Uwe Reichel at BAS.
Input: Text (TXT,BPF,TCF,TextGrid,UTF-8)
Output: Pronunciation, syllabification, word accent (TXT,BPF,TextGrid,TCF)
Example REST call and parameter synopsis:
Webservices Help : refer to section runG2P
Documentation and additional data
runCOALA
Generates corpus and session CMDIs according to the media-corpus-profile and the media-session-profile of the ComponentRegistry by converting five CSV tables to the CMDI format. Use the runCOALAGetTemplates WebService to get templates for these tables. The resulting session CMDIs can be used as they are, while the corpus CMDI needs to be edited by hand.
Input: Excel Tables (CSV,UTF-8)
Output: Corpus and Session Metadata encoded in CMDI
Example REST call and parameter synopsis:
Webservices Help : refer to section runCOALA
runTextAlign
This service aligns text sequence pairs by minimizing their edit distance. Edit operations are substitution, insertion, and deletion. Besides a naive cost function that penalizes any edit operation except null substitution by 1, cost functions can be imported, estimated probabilistically from the input data, or chosen from pre-stored examples. Typical use cases are the alignment of letters and phonemes in pronunciation dictionaries, and the alignment of canonical and spontaneous speech transcriptions in order to infer or verify phonological rules. The service takes a CSV file with two columns separated by a semicolon. Each row contains a string pair to be aligned. The output is a ZIP file that contains a two-column CSV file with the aligned result. Deletions are marked by an underscore, insertions by a plus sign. If the cost function is estimated from the input data, the ZIP file additionally contains this cost function in a CSV file with three columns separated by semicolons of the form X;Y;C, indicating that the replacement of X by Y is penalized by cost C. This cost file can be re-used for further applications of the aligner.
Input: Excel Table with string pairs (CSV,UTF-8)
Output: Excel Table with alignment results, cost functions (CSV,UTF-8)
Example REST call and parameter synopsis:
Webservices Help : refer to section runTextAlign
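Since the aligner expects exactly two semicolon-separated columns per row, it can be worth checking an input table before uploading it. The following is a small sketch under that assumption: the file name pairs.csv, the helper name check_pairs, and the two string pairs are made up for illustration and do not come from a BAS resource.

```shell
#!/bin/sh
# Write two illustrative string pairs, one pair per row, separated by ';'.
printf '%s\n' 'haus;haUs' 'baum;baUm' > pairs.csv

# check_pairs FILE: succeed only if every row has exactly two
# semicolon-separated, non-empty columns. Hypothetical helper.
check_pairs() {
    awk -F';' 'BEGIN { bad = 0 }
               NF != 2 || $1 == "" || $2 == "" { bad = 1 }
               END { exit bad }' "$1"
}

if check_pairs pairs.csv; then
    echo "pairs.csv looks well-formed"
else
    echo "pairs.csv has rows that are not of the form A;B" >&2
fi
```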
runPho2Syl
Syllabification of canonical and spontaneous speech transcriptions for multiple languages. The input format is restricted to BAS Partitur Format (BPF) files generated e.g. by WebMAUS or runG2P. Canonical transcriptions of the tier KAN as well as spontaneous speech transcriptions of the tiers MAU, PHO, and SAP can be syllabified and are written to the output tiers KAS and MAS, respectively. For spontaneous speech input, it can further be specified whether or not the syllable boundaries are synchronized with word boundaries. Depending on the language, syllabification is carried out by C4.5 decision trees or based on the sonority hierarchy.
Input: BPF with tiers KAN, MAU, SAP, or PHO (PAR,UTF-8)
Output: Syllabification (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runPho2Syl
runChunkPreparation
This service transforms TextGrid and ELAN files to BAS Partitur Format (BPF) files containing the tiers ORT, TRN, and KAN. ORT and KAN contain the word tokens and their canonical transcriptions, respectively. TRN stores word chunks as given in the specified input file tier. The presence of the TRN tier improves the performance of the automatic phonetic segmentation system WebMAUS.
Input: chunk segmentation (Praat (TextGrid), ELAN (EAF), CSV)
Output: chunk segmentation (BPF)
Example REST call and parameter synopsis:
Webservices Help : refer to section runChunkPreparation
runChunker
This service automatically calculates a chunk segmentation based on the signal and the transcript. Input is a BAS Partitur Format (BPF) file containing at least the tier KAN, while the output is a BPF file containing an additional TRN tier encoding the chunks found. The presence of the TRN tier improves the performance of the automatic phonetic segmentation system WebMAUS and enables the processing of recordings with more than 3000 words.
Input: BPF with at least tier KAN
Output: BPF with tier TRN
Example REST call and parameter synopsis:
Webservices Help : refer to section runChunker
runAnonymizer
This service reads a signal file (sound, video), a BAS Partitur Format annotation, and a list of terms to be anonymized in both inputs. It masks all occurrences in the signal and in the annotation and returns the two anonymized files in a ZIP archive, or just the anonymized annotation in a ZIP file if ANNOTONLY=true.
Input: BPF (+ media file), list of terms to be anonymized (TXT, UTF-8)
Output: anonymized BPF (+ media file)
Example REST call and parameter synopsis:
Webservices Help : refer to section runAnonymizer
runSubtitle
This service maps the result of a MAUS process (a word/phone segmentation) or the result of an ASR (word segmentation) to the original transcript and groups the transcript into subtitles. The service can be used to automatically create a subtitle track from a signal (+ text); it is recommended to use the service Pipeline with parameter PIPE=G2P_(CHUNKER)_MAUS_SUBTITLE (with text input) or PIPE=ASR_SUBTITLE (without text input).
Input: BPF with tiers ORT,MAU (+ original transcript (text document))
Output: subtitles (VTT,SRT,SUB,BPF)
Example REST call and parameter synopsis:
Webservices Help : refer to section runSubtitle
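The optional CHUNKER step in the PIPE value can be chosen from the length of the transcript. The sketch below uses the figure of 3000 words from the runChunker description as a simple rule of thumb; that decision rule, the helper name choose_pipe, and the sample transcript are assumptions for illustration, not a recommendation from the service documentation.

```shell
#!/bin/sh
# choose_pipe TEXTFILE: print a PIPE value for subtitle generation from
# signal + text. Hypothetical helper; the chunker is added for transcripts
# with more than 3000 words (threshold taken from the runChunker section).
choose_pipe() {
    words=$(($(wc -w < "$1")))   # arithmetic expansion strips padding
    if [ "$words" -gt 3000 ]; then
        echo G2P_CHUNKER_MAUS_SUBTITLE
    else
        echo G2P_MAUS_SUBTITLE
    fi
}

# Illustrative transcript; a real one would be the text passed to the pipeline.
transcript=$(mktemp)
echo 'this is a short transcript' > "$transcript"
echo "PIPE=$(choose_pipe "$transcript")"
rm -f "$transcript"
```

For the five-word sample transcript this selects the variant without the chunker.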
runVoiceActivityDetection
This service automatically segments the input signal into speech and silence intervals. The result is a simple annotation file with one segmentation layer (called 'VAD').
Input: media file (WAV,...)
Output: segmentation speech/silence (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runVoiceActivityDetection
runSpeakDiar
This service reads a media file (sound, video) and performs speaker diarization (SD) based on the pyannote Python library (https://github.com/pyannote; paper: https://arxiv.org/abs/1911.01255).
Input: media file (WAV,...) (+ BPF)
Output: segmentation and labelling of speaker parts and silence (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runSpeakDiar
runTextEnhance
This service reads an arbitrarily encoded text file and returns a normalized UTF-8 UNIX-style text file that is suitable for processing within the BAS WebServices.
Input: text (TXT,DOC,DOCX,ODT,PDF,RTF)
Output: text UTF-8 (TXT)
Example REST call and parameter synopsis:
Webservices Help : refer to section runTextEnhance
runAudioEnhance
This service reads a media file and performs several signal processing operations, mostly based on the SoX ('Sound Exchange') project. Without any options set, the service produces a RIFF WAVE audio file optimized for processing in the BAS WebServices.
Input: media file (WAV,...)
Output: RIFF WAVE file (WAV)
Example REST call and parameter synopsis:
Webservices Help : refer to section runAudioEnhance
runAnnotConv
This service is a general purpose annotation converter from BAS Partitur Format (BPF) to several standards.
Input: annotation with segmental layer(s) (BPF)
Output: annotation with segmental layer(s) (TextGrid,EMU,EXMARALDA,EAF,TEI)
Example REST call and parameter synopsis:
Webservices Help : refer to section runAnnotConv
runPipeline
This service chains several of the above services into a processing pipeline.
Input: Various signal and annotation/txt formats
Output: Various output formats
Example REST call and parameter synopsis:
Webservices Help : refer to section runPipeline