BAS
Bavarian Archive for Speech Signals
WebServices

Last update of this page: 2021-07-14

BAS WebServices are server-based applications that can be accessed by RESTful calls. Most of these services load text and/or signal data from your local computer, process them, and deliver the results via an XML object. The XML object contains the fields 'success' (boolean), 'downloadLink' (the link to the result file), 'output' (text output of the application, e.g. debug information or progress reports), and 'warnings' (possible warnings or error messages of the application). The results of a web service call can be accessed via the download link within 24 hours after the call.

Example: REST call of MAUS Basic.
AAA334869_0.txt and AAA334869_0.wav are files in the local directory:

curl -v -X POST -H 'content-type: multipart/form-data' -F LANGUAGE=deu -F TEXT=@AAA334869_0.txt -F SIGNAL=@AAA334869_0.wav https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic
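
For callers that want to issue the same request from a program rather than from curl, the following is a minimal sketch using only the Python standard library. It builds, but does not send, a multipart/form-data POST with the three form fields from the curl call above (LANGUAGE, TEXT, SIGNAL). The helper names `encode_multipart` and `build_maus_basic_request` are hypothetical, and the `application/octet-stream` part type is an assumption; the service may accept or expect other part types.

```python
import io
import uuid
import urllib.request

# URL taken from the curl example above.
MAUS_BASIC_URL = ("https://clarin.phonetik.uni-muenchen.de/"
                  "BASWebServices/services/runMAUSBasic")


def encode_multipart(fields, files):
    """Encode text fields and file uploads as a multipart/form-data body.

    fields: dict mapping field name -> string value
    files:  dict mapping field name -> (filename, bytes payload)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode("utf-8"))
    for name, (filename, payload) in files.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        # Assumption: a generic binary part type is sufficient for uploads.
        buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
        buf.write(payload)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_maus_basic_request(text_bytes, wav_bytes, language="deu",
                             text_name="AAA334869_0.txt",
                             wav_name="AAA334869_0.wav"):
    """Build (without sending) the POST request for runMAUSBasic."""
    body, content_type = encode_multipart(
        {"LANGUAGE": language},
        {"TEXT": (text_name, text_bytes), "SIGNAL": (wav_name, wav_bytes)},
    )
    return urllib.request.Request(
        MAUS_BASIC_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call; the response body is then the XML object described above.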

Example: output XML object:

<WebServiceResponseLink>
  <success>true</success>
  <downloadLink>http://clarin.phonetik.uni-muenchen.de:80/BASWebServices//data////2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.TextGrid</downloadLink>
  <output>/usr/local/bin/maus OUTFORMAT=TextGrid BPF=/usr/share/tomcat6/webapps/BASWebServices//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.par INSKANTEXTGRID=true LANGUAGE=deu OUT=/usr/share/tomcat6/webapps/BASWebServices//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.TextGrid INSORTTEXTGRID=true USETRN=true SIGNAL=/usr/share/tomcat6/webapps/BASWebServices//data//2013.12.13_09.21.31_21D1B6BE108C61550B2BC326B225E61E//AAA334869_0.wav</output>
  <warnings></warnings>
</WebServiceResponseLink>
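
A client will usually want to read the four fields of this object before fetching the result file. The sketch below parses such a response with Python's standard `xml.etree.ElementTree` module. The `ServiceResponse` class and the `parse_response` function are hypothetical names introduced for illustration, and treating any value other than the string "true" as a failure is an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ServiceResponse:
    success: bool
    download_link: str
    output: str
    warnings: str


def parse_response(xml_text):
    """Parse a WebServiceResponseLink XML string into a ServiceResponse."""
    root = ET.fromstring(xml_text)

    def field(tag):
        # findtext returns None if the element is missing and "" if it is empty.
        return (root.findtext(tag) or "").strip()

    return ServiceResponse(
        # Assumption: only the literal "true" (any case) signals success.
        success=field("success").lower() == "true",
        download_link=field("downloadLink"),
        output=field("output"),
        warnings=field("warnings"),
    )
```

A caller would check `success` first, inspect `warnings` if it is non-empty, and only then download the file at `download_link`, keeping in mind that the link expires 24 hours after the call.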

All currently supported BAS WebServices are described in a machine-readable CMDI metadata file. The main purpose of this file is to allow automatic embedding of the web services into applications or APIs; a more readable reference for the REST API can be found here.

The usage of BAS web services is subject to the Conditions of Use for Academics or the Conditions of Use for Commercial Institutions.

In the following we describe the most prominent BAS WebServices, giving example curl calls and a full description of the input/output parameters. Curl calls can be issued from any UNIX command line (Konsole, Terminal, etc.).

Automatic Phonetic Segmentation - MAUS Webservices

See Video Intro and Video Usage for an introduction to the MAUS technique.

MAUS segments and labels speech data based on an input text or phonemic transcript, and the speech signal. It uses hidden Markov models (HMMs) for the alignment and a probabilistic pronunciation model to predict variation in the pronunciation.

Basic MAUS - `runMAUSBasic`

Input: text (TXT,DOC,DOCX,ODT,RTF,PDF), media file (WAV,...)
Output: phonetic segmentation (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUSBasic

General MAUS - `runMAUS`

Input: standard pronunciation (BPF, tier KAN), media file (WAV,...)
Output: phonetic segmentation (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUS

Phonetic Transcription without text/transcript - `runMINNI`

Segments and labels a speech audio file into SAM-PA (or IPA) phonetic segments without any text/phonological input; results are stored either in a Praat-compatible TextGrid file (configuration parameter OUTFORMAT=TextGrid) or in a CSV table (the BPF MAU tier, configuration parameter OUTFORMAT=csv).

Input: media file (WAV,...)
Output: phonetic segmentation (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runMINNI

Get MAUS Phoneme Set - `runMAUSGetInventar`

This service returns a table with the phoneme set used by BAS WebServices for a language.

Input: Language Code
Output: CSV table

Example REST call and parameter synopsis:
Webservices Help : refer to section runMAUSGetInventar

Speech synthesis - `runTTSFile`

A (German) text file is converted into spoken language; two female and two male voices (also available as BAS corpora) can be selected. The MARY synthesis system has been developed by the University of Saarbrücken.

Input: Text File (TXT,UTF-8)
Output: Signal File (WAV)

Example REST call and parameter synopsis:
Webservices Help : refer to section runTTSFile

Channel Separation - `runChannelSeparator`

Input is a multi-channel RIFF WAVE file where each channel contains the speech of a single speaker in conversation with others. The cross-talk of other speakers is removed from each channel so that all channels are muted except the one with the currently speaking person.
This algorithm was developed by Volker Dellwo and adapted for the BAS WebServices by Fritz Seebauer.

Input: multichannel sound file (WAV)
Output: multichannel sound file (WAV)

Example REST call and parameter synopsis:
Webservices Help : refer to section runChannelSeparator

Grapheme-to-Phoneme Conversion - `runG2P`

A text or a word list (UTF-8) is transformed into the corresponding most likely phonological standard pronunciation (encoded in SAM-PA or IPA). The service also supports POS tagging, syllabification, lexical accent tagging, and morphological segmentation and tagging.
The G2P system is empirically trained on a large pronunciation dictionary of the respective language; it was developed by Uwe Reichel at BAS.

Input: Text (TXT,BPF,TCF,TextGrid,UTF-8)
Output: Pronunciation, syllabification, word accent (TXT,BPF,TextGrid,TCF)

Example REST call and parameter synopsis:
Webservices Help : refer to section runG2P

Documentation and additional data

CMDI Metadata Generator COALA - `runCOALA`

Generates corpus and session CMDIs according to the media-corpus-profile and the media-session-profile of the ComponentRegistry by converting five CSV tables to the CMDI format. Use the runCOALAGetTemplates WebService to get templates for these tables. The resulting session CMDIs can be used as they are, while the corpus CMDI needs to be edited by hand.

Input: Excel Tables (CSV,UTF-8)
Output: Corpus and Session Metadata encoded in CMDI

Example REST call and parameter synopsis:
Webservices Help : refer to section runCOALA

Symbolic String Aligner - `runTextAlign`

This service aligns text sequence pairs by minimizing their edit distance. Edit operations are substitution, insertion, and deletion. Besides a naive cost function that penalizes every edit operation except null substitution by 1, cost functions can be imported, estimated probabilistically from the input data, or chosen from pre-stored examples. Typical use cases are the alignment of letters and phonemes in pronunciation dictionaries, and the alignment of canonical and spontaneous speech transcriptions in order to infer or verify phonological rules. The service takes a CSV file with two columns separated by a semicolon. Each row contains a string pair to be aligned. The output is a ZIP file that contains a two-column CSV file with the aligned result. Deletions are marked by an underscore, insertions by a plus sign. If the cost function is estimated from the input data, the ZIP file additionally contains this cost function in a CSV file with three columns separated by semicolons of the form X;Y;C, indicating that the replacement of X by Y is penalized by cost C. This cost file can be re-used for further applications of the aligner.
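
To make the alignment idea concrete, here is a minimal sketch of edit-distance alignment with the naive cost function (every operation costs 1 except a null substitution). It is not the service's implementation. The function name `align` and the `cost` callback are hypothetical, and the placement of the markers (underscore in the second sequence for a deletion, plus sign in the first sequence for an insertion) is an assumption, since the description above does not specify the output layout.

```python
DELETION = "_"   # assumption: marks a source symbol with no target counterpart
INSERTION = "+"  # assumption: marks a target symbol with no source counterpart


def align(source, target, cost=None):
    """Align two sequences by minimum edit distance.

    cost(x, y) is the substitution cost, cost(x, None) the deletion cost,
    and cost(None, y) the insertion cost.
    Returns (total_cost, aligned_source, aligned_target).
    """
    if cost is None:
        # Naive cost: 0 for a null substitution (x == y), 1 for anything else.
        cost = lambda x, y: 0 if x == y else 1
    n, m = len(source), len(target)
    # dist[i][j] = minimal cost of aligning source[:i] with target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + cost(source[i - 1], None)
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + cost(None, target[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost(source[i - 1], target[j - 1]),
                dist[i - 1][j] + cost(source[i - 1], None),
                dist[i][j - 1] + cost(None, target[j - 1]),
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dist[i][j] == dist[i - 1][j - 1] + cost(source[i - 1], target[j - 1])):
            top.append(source[i - 1])
            bottom.append(target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + cost(source[i - 1], None):
            top.append(source[i - 1])
            bottom.append(DELETION)
            i -= 1
        else:
            top.append(INSERTION)
            bottom.append(target[j - 1])
            j -= 1
    top.reverse()
    bottom.reverse()
    return dist[n][m], top, bottom
```

A cost table in the X;Y;C form mentioned above could be plugged in through the `cost` argument, for example by looking up the pair (X, Y) in a dictionary. Note that this sketch cannot distinguish markers from literal underscores or plus signs in the input.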

Input: Excel Table with string pairs (CSV,UTF-8)
Output: Excel Table with alignment results, cost functions (CSV,UTF-8)

Example REST call and parameter synopsis:
Webservices Help : refer to section runTextAlign

Automatic Syllabification - `runPho2Syl`

Syllabification of canonical and spontaneous speech transcriptions for multiple languages. The input format is restricted to BAS Partitur Format (BPF) files generated, for example, by WebMAUS or runG2P. Canonical transcriptions of the tier KAN as well as spontaneous speech transcriptions of the tiers MAU, PHO, and SAP can be syllabified and are written to the output tiers KAS and MAS, respectively. For spontaneous speech input, it can further be specified whether or not the syllable boundaries are synchronized with word boundaries. Depending on the language, syllabification is carried out by C4.5 decision trees or based on the sonority hierarchy.

Input: BPF with tiers KAN, MAU, SAP, or PHO (PAR,UTF-8)
Output: Syllabification (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runPho2Syl

Chunk Preparation - `runChunkPreparation`

This service transforms TextGrid and ELAN files into BAS Partitur Format (BPF) files containing the tiers ORT, TRN, and KAN. ORT and KAN contain the word tokens and their canonical transcriptions, respectively. TRN stores word chunks as given in the specified input file tier. The presence of the TRN tier improves the performance of the automatic phonetic segmentation system WEBMAUS.

Input: chunk segmentation (Praat (TextGrid), ELAN (EAF), CSV)
Output: chunk segmentation (BPF)

Example REST call and parameter synopsis:
Webservices Help : refer to section runChunkPreparation

Automatic Chunking - `runChunker`

This service calculates a chunk segmentation automatically based on the signal and transcript. Input is a BAS Partitur Format file containing at least the tier KAN, while the output is a BPF file containing an additional TRN tier encoding the found chunks. The presence of the TRN tier improves the performance of the automatic phonetic segmentation system WEBMAUS and enables the processing of recordings with more than 3000 words.

Input: BPF with at least tier KAN
Output: BPF with tier TRN

Example REST call and parameter synopsis:
Webservices Help : refer to section runChunker

Anonymization - `runAnonymizer`

This service reads a signal file (sound, video), a BAS Partitur Format annotation, and a list of terms to be anonymized in both inputs. It masks all occurrences in the signal and in the annotation, and returns the two anonymized files in a ZIP archive, or just the anonymized annotation in a ZIP file if ANNOTONLY=true.

Input: BPF (+ media file), list of terms to be anonymized (TXT, UTF-8)
Output: anonymized BPF (+ media file)

Example REST call and parameter synopsis:
Webservices Help : refer to section runAnonymizer

Subtitle generator - `runSubtitle`

This service maps the result of a MAUS process (a word/phone segmentation) or the result of an ASR process (a word segmentation) to the original transcript and groups the transcript into subtitles. The service can be used to automatically create a subtitle track from a signal (+ text); it is recommended to use the service Pipeline with parameter PIPE=G2P_(CHUNKER)_MAUS_SUBTITLE (with text input) or PIPE=ASR_SUBTITLE (without text input).
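
The choice of PIPE value for subtitle generation can be sketched as a small helper. The function name `subtitle_pipe` is hypothetical, and reading the parenthesized "(CHUNKER)" as an optional stage is an assumption; the authoritative list of PIPE values is the runPipeline section of the Webservices Help.

```python
def subtitle_pipe(has_text, use_chunker=False):
    """Return a PIPE value for creating subtitles via the Pipeline service.

    has_text:    True if a transcript is supplied along with the signal.
    use_chunker: include the CHUNKER stage (assumed to be optional).
    """
    if not has_text:
        # Without a transcript, ASR provides the word segmentation.
        return "ASR_SUBTITLE"
    stages = ["G2P"]
    if use_chunker:
        stages.append("CHUNKER")
    stages += ["MAUS", "SUBTITLE"]
    return "_".join(stages)
```

The returned string would be passed as the PIPE form field of a runPipeline call, in the same way LANGUAGE is passed in the runMAUSBasic example.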

Input: BPF with tiers ORT,MAU (+ original transcript (text document))
Output: subtitles (VTT,SRT,SUB,BPF)

Example REST call and parameter synopsis:
Webservices Help : refer to section runSubtitle

Voice Activity Detection - `runVoiceActivityDetection`

This service automatically segments the input signal into speech and silence intervals. The result is a simple annotation file with one segmentation layer (called 'VAD').

Input: media file (WAV,...)
Output: segmentation speech/silence (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runVoiceActivityDetection

Speaker Diarization - `runSpeakDiar`

This service reads a media file (sound, video) and performs speaker diarization (SD) based on the pyannote Python library (https://github.com/pyannote; paper: https://arxiv.org/abs/1911.01255).

Input: media file (WAV,...) (+ BPF)
Output: segmentation and labelling of speaker parts and silence (BPF,TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runSpeakDiar

Text normalization - `runTextEnhance`

This service reads an arbitrary encoded text file and returns a normalized UTF-8 UNIX style text file that is suitable for processing within the BAS WebServices.

Input: text (TXT,DOC,DOCX,ODT,PDF,RTF)
Output: text UTF-8 (TXT)

Example REST call and parameter synopsis:
Webservices Help : refer to section runTextEnhance

Signal processing - `runAudioEnhance`

This service reads a media file and performs several signal processing operations, mostly based on the SoX ('Sound Exchange') project. Without any options set, the service produces a RIFF WAVE audio file optimized for processing in the BAS WebServices.

Input: media file (WAV,...)
Output: RIFF WAVE file (WAV)

Example REST call and parameter synopsis:
Webservices Help : refer to section runAudioEnhance

Annotation Converter - `runAnnotConv`

This service is a general purpose annotation converter from BAS Partitur Format (BPF) to several standards.

Input: annotation with segmental layer(s) (BPF)
Output: annotation with segmental layer(s) (TextGrid,EMU,EXMARALDA,EAF,TEI)

Example REST call and parameter synopsis:
Webservices Help : refer to section runAnnotConv

Chaining Service - `runPipeline`

This service allows several of the above services to be chained into a processing pipeline.

Input: Various signal and annotation/txt formats
Output: Various output formats

Example REST call and parameter synopsis:
Webservices Help : refer to section runPipeline

Copyright © 2013 Bayerisches Archiv für Sprachsignale, Universität München
This page and all other pages with the initial 'BAS' or 'Bas' in the filename may be copied, printed and distributed to other parties, under the condition that the pages are distributed as shown here. Parts of pages or extended pages may not be distributed further without permission of the BAS.

Florian Schiel
