Bavarian Archive for Speech Signals

Last update of this page: 2016-06-22

BAS web services are server-based applications that are invoked through REST calls. Most of these services upload text and/or signal data from your local computer, process them, and deliver the results in an XML object. The XML object contains the fields 'success' (boolean), 'downloadLink' (the link to the result file), 'output' (text output of the application, e.g. debug information or progress reports), and 'warnings' (possible warnings or error messages of the application). The results of a web service call can be retrieved via the download link within 24 hours after the call.

Example: REST call of MAUS Basic.
AAA334869_0.txt and AAA334869_0.wav are files in the local directory:

curl -v -X POST -H 'content-type: multipart/form-data' -F LANGUAGE=deu -F TEXT=@AAA334869_0.txt -F SIGNAL=@AAA334869_0.wav ttp://

Example: output XML object:

  <output>/usr/local/bin/maus OUTFORMAT=TextGrid BPF=/usr/share/tomcat6/webapps/BASWebServicesTest//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.par INSKANTEXTGRID=true LANGUAGE=deu OUT=/usr/share/tomcat6/webapps/BASWebServicesTest//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.TextGrid INSORTTEXTGRID=true USETRN=true SIGNAL=/usr/share/tomcat6/webapps/BASWebServicesTest//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.wav</output>
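
The fields of such an XML object can be read directly on the command line. The following is a minimal sketch, assuming that each field sits on a single line and is not nested. The helper name xml_field, the root element name and the example.org link are placeholders chosen for illustration and are not taken from the BAS service; a real XML parser is more robust.

```shell
# Print the text content of one XML element read from stdin.
# Assumption: the element is on a single line, occurs once, and is not nested.
xml_field() {
  sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" | head -n 1
}

# Hypothetical response; root element name and link are placeholders.
response='<response><success>true</success><downloadLink>https://example.org/result.TextGrid</downloadLink><output></output><warnings></warnings></response>'

# Only fetch the result if the service reported success.
if [ "$(printf '%s\n' "$response" | xml_field success)" = "true" ]; then
  link=$(printf '%s\n' "$response" | xml_field downloadLink)
  echo "result available at: $link"
  # curl -O "$link"   # uncomment to download the result file
else
  printf '%s\n' "$response" | xml_field warnings
fi
```

In practice the response variable would hold the output of the curl call instead of a literal string.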

All currently supported web services are described in a CMDI metadata file. The main purpose of this file is to allow web services to be embedded automatically into applications or APIs; it can also serve as a reference for web service calls and their parameters.

The usage of BAS web services is subject to the Conditions of Use for Academics or the Conditions of Use for Commercial Institutions.

In the following we describe the most prominent BAS web services, giving example curl calls and a full description of the input/output parameters. Curl calls can be issued from any UNIX command line (Konsole, Terminal, etc.).

Automatic Phonetic Segmentation - MAUS Webservices

See the Video Intro and Video Usage for an introduction to the MAUS technique.

Basic MAUS - runMAUSBasic

Input: text (TXT,UTF-8), signal (WAV,NIST-SPHERE)
Output: phonetic segmentation (TextGrid)

Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUSBasic
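
When several recordings need to be segmented, the Basic MAUS call shown at the top of this page could be generated in a loop. The following is a minimal sketch, assuming that each WAV file has a transcript with the same base name and the extension .txt. The function name maus_basic_calls and the variable BAS_URL are hypothetical; BAS_URL has to be set to the service endpoint given in the Webservices Help. The function only prints the curl commands (dry run) so that they can be checked before anything is uploaded.

```shell
# Placeholder: set BAS_URL to the runMAUSBasic endpoint from the Webservices Help.
BAS_URL="${BAS_URL:-ENDPOINT_URL_HERE}"

# Print one curl call per WAV file in a directory that has a matching .txt transcript.
# Arguments: directory, language code (e.g. deu).
maus_basic_calls() {
  dir=$1
  lang=$2
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue        # directory contains no WAV files
    txt="${wav%.wav}.txt"
    [ -e "$txt" ] || continue        # skip recordings without a transcript
    printf "curl -X POST -H 'content-type: multipart/form-data' -F LANGUAGE=%s -F TEXT=@%s -F SIGNAL=@%s %s\n" \
      "$lang" "$txt" "$wav" "$BAS_URL"
  done
}

maus_basic_calls . deu
```

Once the printed commands look right, they can be executed by piping the output to sh. File names containing spaces would need additional quoting.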

General MAUS - runMAUS

Input: standard pronunciation (BPF tier KAN), signal (WAV,NIST-SPHERE)
Output: segmentation (BPF,TextGrid,Emu)

Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUS

Speech synthesis - runTTSFile

A (German) text file is converted into spoken language; two female and two male voices (also available as BAS corpora) can be selected. The MARY synthesis system has been developed by the University of Saarbrücken.

Input: Text File (TXT,UTF-8)
Output: Signal File (WAV)

Example REST call and parameter synopsis:
Webservices Help : refer to section runTTS

Text-to-Phoneme Conversion - runG2P

A text or a word list (UTF-8) is transformed into the corresponding most likely phonological standard pronunciation (encoded in SAM-PA or IPA). The G2P system is trained empirically on a large pronunciation dictionary of the respective language; it was developed by Uwe Reichel at BAS.

Input: Text (TXT,BPF,TCF,TextGrid,UTF-8)
Output: Pronunciation, syllabification, word accent (TXT,BPF,TextGrid,TCF)

Example REST call and parameter synopsis:
Webservices Help : refer to section runG2P

Documentation and additional data

CMDI Metadata Generator COALA - runCOALA

Generates corpus and session CMDIs according to the media-corpus-profile and the media-session-profile of the ComponentRegistry by converting five CSV tables to the CMDI format. Use the runCOALAGetTemplates WebService to get templates for these tables. The resulting session CMDIs can be used as they are, while the corpus CMDI needs to be edited by hand.

Input: Excel Tables (CSV,UTF-8)
Output: Corpus and Session Metadata encoded in CMDI

Example REST call and parameter synopsis:
Webservices Help : refer to section runCOALA

Symbolic String Aligner - runTextAlign

This service aligns text sequence pairs by minimizing their edit distance. Edit operations are substitution, insertion, and deletion. Besides a naive cost function that penalizes every edit operation except null substitution by 1, cost functions can be imported, estimated probabilistically from the input data, or chosen from pre-stored examples. Typical use cases are the alignment of letters and phonemes in pronunciation dictionaries, and the alignment of canonical and spontaneous speech transcriptions in order to infer or verify phonological rules.

The service takes a CSV file with two columns separated by a semicolon. Each row contains a string pair to be aligned. The output is a zip file that contains a two-column CSV file with the aligned result. Deletions are marked by an underscore, insertions by a plus sign. If the cost function is estimated from the input data, the zip file additionally contains this cost function in a CSV file with three columns separated by semicolons, of the form X;Y;C, indicating that the replacement of X by Y is penalized by cost C. This cost file can be re-used for further applications of the aligner.

Input: Excel Table with string pairs (CSV,UTF-8)
Output: Excel Table with alignment results, cost functions (CSV,UTF-8)

Example REST call and parameter synopsis:
Webservices Help : refer to section runTextAlign
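
To illustrate the naive cost function, the following sketch computes the minimal edit distance for each row of a semicolon-separated input file of the kind described above. It is not the implementation used by the service: it only reports the total cost and does not produce the alignment itself, and the function name edit_distance and the sample word pairs are chosen for illustration. It assumes single-byte (ASCII) symbols, since awk may count multi-byte UTF-8 characters differently depending on the locale.

```shell
# Naive edit distance between column 1 and column 2 of each input row.
# Substitution of different symbols, insertion, and deletion each cost 1;
# substituting a symbol by itself (null substitution) costs 0.
edit_distance() {
  awk -F';' '{
    n = length($1); m = length($2)
    for (i = 0; i <= n; i++) d[i, 0] = i      # delete i symbols
    for (j = 0; j <= m; j++) d[0, j] = j      # insert j symbols
    for (i = 1; i <= n; i++)
      for (j = 1; j <= m; j++) {
        sub_cost = (substr($1, i, 1) == substr($2, j, 1)) ? 0 : 1
        best = d[i-1, j-1] + sub_cost                        # substitution
        if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1       # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1       # insertion
        d[i, j] = best
      }
    print $1 ";" $2 ";" d[n, m]
  }'
}

printf 'kitten;sitting\nhaus;maus\n' | edit_distance
# prints:
# kitten;sitting;3
# haus;maus;1
```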

Automatic Syllabification - runPho2Syl

Syllabification of canonical and spontaneous speech transcriptions for multiple languages. The input format is restricted to BAS Partitur files generated, for example, by WebMAUS or runG2P. Canonical transcriptions of the tier KAN as well as spontaneous speech transcriptions of the tiers MAU, PHO, and SAP can be syllabified and are written to the output tiers KAS and MAS, respectively. For spontaneous speech input, it can further be specified whether or not the syllable boundaries are synchronized with word boundaries. Depending on the language, syllabification is carried out by C4.5 decision trees or based on the sonority hierarchy.

Input: BAS Partitur Format (PAR,UTF-8)
Output: BAS Partitur Format (PAR,UTF-8), Praat TextGrid

Example REST call and parameter synopsis:
Webservices Help : refer to section runPho2Syl
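
Before or after syllabification it can be useful to inspect a single tier of a Partitur file. The following is a minimal sketch, assuming that each entry of a word-level tier such as ORT or KAN is one line of the form "TIER: index label"; time-aligned tiers such as MAU carry additional fields and are not handled. The function name bpf_tier and the sample lines are made up for illustration and are not taken from a BAS corpus.

```shell
# Print the entries of one word-level BPF tier as "index<TAB>label".
# Assumption: lines look like "KAN: 0 hOYt@" (tier name, word index, label).
bpf_tier() {
  awk -v tier="$1:" '$1 == tier {
    idx = $2
    $1 = ""; $2 = ""          # drop tier name and index, keep the label
    sub(/^ +/, "")            # remove the separators left behind
    print idx "\t" $0
  }'
}

# Illustrative sample data (hypothetical file content).
printf 'ORT: 0 heute\nKAN: 0 hOYt@\nORT: 1 morgen\nKAN: 1 mO6g@n\n' | bpf_tier KAN
```

For a real file the sample data would be replaced by the file itself, for example bpf_tier KAN < file.par.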

Phonetic Transcription - runMINNI

Segments and labels a speech audio file into SAM-PA (or IPA) phonetic segments without any text/phonological input; results are stored either in a Praat-compatible TextGrid file (configuration parameter OUTFORMAT=TextGrid) or in a CSV table (the BPF MAU tier, configuration parameter OUTFORMAT=csv).

Input: Signal File (WAV)
Output: BAS Partitur Format, CSV, TextGrid

Example REST call and parameter synopsis:
Webservices Help : refer to section runMINNI

Chunk Preparation - runChunkPreparation

This service transforms TextGrid and ELAN files into BAS Partitur files containing the tiers ORT, TRN, and KAN. ORT and KAN contain the word tokens and their canonical transcriptions, respectively. TRN stores word chunks as given in the specified input file tier. The presence of the TRN tier improves the performance of the automatic phonetic segmentation system WebMAUS.

Input: Praat (TextGrid), ELAN (EAF)
Output: BAS Partitur Format

Example REST call and parameter synopsis:
Webservices Help : refer to section runChunkPreparation

Copyright © 2013 Bayerisches Archiv für Sprachsignale, Universität München
This page and all other pages with the initial 'BAS' or 'Bas' in the filename may be copied, printed and distributed to other parties, under the condition that the pages are distributed as shown here. Parts of pages or extended pages may not be distributed further without permission of the BAS.

Florian Schiel