BAS
Bavarian Archive for Speech Signals
File Formats


This page was last updated 2013-04-05


This page describes and defines the file formats recommended or accepted by BAS.
In addition to the types listed below, BAS will support all non-proprietary file formats recommended by CLARIN.


  1. Recommended Standard Signal File Formats
  2. Accepted Proprietary Signal File Formats
  3. Accepted Metadata Formats
  4. Recommended Annotation/Segment/Label File Formats


Signal Files


PhonDat 1

Signal files with a PhonDat 1 header contain a binary header of constant length (512 bytes). The signal samples (2 bytes per sample) start after this header and are always in LoHi byte order (Intel format). The header has a defined structure with information such as sampling frequency, resolution in bits, etc. The header is ILS compatible.

For reading and writing please use the software delivered with the corpus (module header.c).

A detailed description of the binary header structure can be found here.
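
As a rough illustration of the layout described above (not a replacement for header.c), the following Python sketch skips the 512-byte header without interpreting it and decodes the samples that follow. It assumes that the samples are signed 16-bit integers, which this page does not state; the function name read_phondat1_samples and the synthetic test data are made up for this example.

```python
import struct

HEADER_BYTES = 512  # constant header length of PhonDat 1


def read_phondat1_samples(data: bytes) -> list:
    """Return the 16-bit samples that follow the 512-byte header."""
    payload = data[HEADER_BYTES:]
    count = len(payload) // 2  # 2 bytes per sample; a trailing odd byte is ignored
    # '<' = little-endian (LoHi / Intel), 'h' = signed 16-bit (assumption)
    return list(struct.unpack(f"<{count}h", payload[: count * 2]))


# Synthetic data: a zeroed header followed by three samples.
demo = bytes(HEADER_BYTES) + struct.pack("<3h", 1, -2, 300)
print(read_phondat1_samples(demo))  # [1, -2, 300]
```

The header fields (sampling frequency, resolution, etc.) are deliberately not parsed here, because their binary layout is only given in the detailed description linked above.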


PhonDat 2

PhonDat 2 is an extension of the PhonDat 1 format. After the binary header of 512 bytes additional blocks of 512 bytes follow which contain the orthography and canonical transcript of the utterance (SAM-PA).
The PhonDat 2 header can be identified by the version number (2) in the binary part.

For reading and writing please use the software delivered with the corpus (module header.c).

A detailed description of the binary header structure and the following header blocks can be found here.


NIST - SPHERE

The NIST - SPHERE speech header format was defined by the 'National Institute of Standards and Technology, USA'. It is used in many American speech corpora.

A detailed description of the NIST/SPHERE formats can be found here.

Some BAS corpora contain data with NIST headers. To transform NIST/SPHERE into other standard formats we recommend SoX, e.g.:

sox -t sph input.nist output.wav


Segment/Label Files


S0 Format

The S0 Format contains word labels of utterances longer than a single word. The format was defined in the German PhonDat project. The label files are in ASCII and have the same prefix as the corresponding signal files. The extension is .S0.

Syntax:


<file> = <Name of segment file> CR
         <Orthography> CR
         oend CR
         <Canonical form> CR
         kend CR
         hend CR
         <list of word segments> 

<list of word segments> = <begin sample> <marker> CR
                                ...

<begin sample> = number of first sample 

<marker> = '#c:' (beginning of first word)  OR
           <canonical word form> (as read from the lexicon)  OR
           '.' (end of last word)

<Name of segment file> = any valid filename

<Orthography> =
The orthographic string contains the standard orthography or a
transliteration with additional markers of the spoken utterance.
German Umlauts are represented either by LaTeX
convention or by 7 bit ASCII signs or by German Character set
coding used by DEC and Sun:

Umlaut  LaTeX   7 Bit ASCII (dec)       German Char Set (hex)
Ae      "A      [ (91)                  C4
Ue      "U      ] (93)                  DC
Oe      "O      \ (92)                  D6
ae      "a      { (123)                 E4
ue      "u      } (125)                 FC
oe      "o      | (124)                 F6
ss      "s      ~ (126)                 DF

<Canonical form> =
The canonical string contains the expected citation forms of the
words in the utterance. Note that this is NOT a transcription of the
signal. The symbols used are the German subset of SAM-PA, with the
following changes to SAM-PA:

Q       glottal stop
q       glottalization (not in canonical forms!)
'       main stress
"       secondary stress
#       compositum marker (optional)
+       function word marker (suffix, optional)

Words are separated by two blanks, phonemic labels are separated by
one blank.
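
The syntax above can be read with a few lines of Python. The following is a minimal sketch, assuming that line ends are the CR separators of the syntax and that the keywords oend, kend and hend each stand alone on a line. The helper name parse_s0 and the sample content are invented for illustration and are not taken from a BAS corpus.

```python
def parse_s0(text):
    """Split the text of an S0 file into its parts (hypothetical helper)."""
    lines = [line.rstrip() for line in text.splitlines()]
    # The keywords oend, kend and hend close the three header parts.
    oend, kend, hend = (lines.index(key) for key in ("oend", "kend", "hend"))
    canonical = "  ".join(lines[oend + 1 : kend])
    segments = []
    for line in lines[hend + 1 :]:
        if line.strip():
            # <begin sample> <marker>; the marker itself may contain blanks
            begin, marker = line.split(None, 1)
            segments.append((int(begin), marker))
    return {
        "name": lines[0],
        "orthography": " ".join(lines[1:oend]),
        # two blanks separate words, one blank separates phonemic labels
        "canonical": [w.split() for w in canonical.split("  ") if w.strip()],
        "segments": segments,
    }


# Invented sample content.
SAMPLE = "\n".join([
    "demo.S0",
    "heute oder morgen",
    "oend",
    "h 'OY t @  Q o: d 6  m 'O 6 g @ n",
    "kend",
    "hend",
    "1000 #c:",
    "1000 h 'OY t @",
    "5200 Q o: d 6",
    "8100 m 'O 6 g @ n",
    "12000 .",
])

result = parse_s0(SAMPLE)
print(result["canonical"])
print(result["segments"][0], result["segments"][-1])
```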

Remarks:


S1 Format

The S1 Format contains the phonological segmentation of the utterance. The format was defined in the German PhonDat project. The label files are in ASCII and have the same prefix as the corresponding signal files. The extension is .S1.

Syntax:


<file> = <Name of segment file> CR
         <Orthography> CR
         oend CR
         <Canonical form> CR
         kend CR
         <Transcription> CR
         hend CR
         <list of phoneme segments> 

<list of phoneme segments> = <begin sample> <marker> CR
                                   ...

<begin sample> = number of first sample 

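<marker> = '#c:' (beginning of first word)  OR
           '#p:' (pause) OR
           '#v:' (mis-pronunciation) OR
           <segment> OR
           <word boundary segment> OR
           <compound boundary segment> OR
           <punctuation>

<segment> = $<sampa string> (ordinary segment)

<word boundary segment> = ##<sampa string>

<compound boundary segment> = $#<sampa string>

<sampa string> = any string of <extended German SAM-PA symbols>

<punctuation> = '#.' OR '#,' OR '#?' OR '#!'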

<Name of segment file> = any valid filename

<Orthography> =
The orthographic string contains the standard orthography or a
transliteration with additional markers of the spoken utterance.
German Umlauts are represented either by LaTeX
convention or by 7 bit ASCII signs or by German Character set
coding used by DEC and Sun:

Umlaut  LaTeX   7 Bit ASCII (dec)       German Char Set (hex)
Ae      "A      [ (91)                  C4
Ue      "U      ] (93)                  DC
Oe      "O      \ (92)                  D6
ae      "a      { (123)                 E4
ue      "u      } (125)                 FC
oe      "o      | (124)                 F6
ss      "s      ~ (126)                 DF

<Canonical form> =
The canonical string contains the expected citation forms of the
words in the utterance. Note that this is NOT a transcription of the
signal. The symbols used are the German subset of SAM-PA, with the
following changes to SAM-PA:

Q       glottal stop
q       glottalization (not in canonical forms!)
'       main stress
"       secondary stress
#       compositum marker (optional)
+       function word marker (suffix, optional)

Words are separated by two blanks, phonemic labels are separated by
one blank.

<Extended German SAM-PA symbols> =
See here for a complete table of extended SAM-PA symbols.
In addition to the defined German SAM-PA symbols we use the following
additional symbols:
~               : nasalization, e.g. E~
Q               : glottal stop (instead of ? in SAM-PA) 
'               : canonical main stress (of word)
"               : canonical secondary stress (of word)
q               : glottalization
%               : uncertain boundary, e.g. $%a:
-               : change to canonical form:
                  replacement:  a:-A
                  elision:      a:-
                  insertion:    -A
=               : realization of two syllables in a diphthong, e.g. E:=6
+               : function word (placed after last segment)
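
The LaTeX column of the umlaut table above maps directly onto Unicode characters. The sketch below decodes it in Python; the names LATEX_UMLAUTS and decode_latex_umlauts are hypothetical. It should only be applied to the orthographic string, since in the canonical form and the transcription the character " marks secondary stress.

```python
# LaTeX convention from the umlaut table -> Unicode character
LATEX_UMLAUTS = {
    '"A': "Ä", '"U': "Ü", '"O': "Ö",
    '"a': "ä", '"u': "ü", '"o': "ö",
    '"s': "ß",
}


def decode_latex_umlauts(orthography):
    """Replace LaTeX-coded umlauts in an orthographic string."""
    for code, char in LATEX_UMLAUTS.items():
        orthography = orthography.replace(code, char)
    return orthography


print(decode_latex_umlauts('in M"unchen auf dem Kongre"s'))
```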

Remarks:


S2 Format

The S2 format contains an automatically generated phonological annotation of the signal.
The format is quite the same as PhonDat 1 with the following alterations:


BAS Partitur Format

General

Most file formats that carry segmental information for speech signals share a number of disadvantages. Therefore a new open format based on the SAM Label Format was developed at BAS which avoids most of these problems. In this format all levels of description should be described independently but time aligned, like the single parts of a score. Hence this format was called 'BAS Partitur Format' ('Partitur' is German for 'score').

In the future all BAS corpora that contain segmental information of any kind will be distributed with the new BAS Partitur Format. The formerly used formats will be retained but not updated further.

An up-to-date publication of version 1.2 can be downloaded here (1998).

The BAS Partitur Format has the following features:

Files and Mimetype

As in the SAM standard BPF files are of type text/plain. Allowed codings are 7-bit ASCII or UTF-8. Some BPF tiers also allow the coding in LaTeX.

Usually BPF files have the extensions '*.par' or '*.PAR' and the mimetype 'text/plain-bas'.

BPF files are 'line oriented', that is, information is structured in lines and optimized for line-processing UNIX tools such as grep, sed, gawk etc. An XML version of the Annotation Graph concept proposed by Liberman (ATLAS format) can be used to handle the same information as the BPF file in XML. These files usually have the extensions '*.ags' or '*.AGS' and the mimetype 'text/xml'. The DTD of this file type can be found here.

History

1.0   : 01.09.95 Preliminary Definition of the BAS Partitur Format
        PLEASE DO NOT USE THIS VERSION ANY MORE!
1.1   : 01.06.96 First Definition structured in classes
1.2   : 28.08.96 Label ELF: removed from definition
        (tool par-1.1-to-1.2 transforms 1.1 files into 1.2 files)
1.2.1 : ??
1.2.2 : tier DAS added
1.2.3 : tier TR2, SUP added
1.2.4 : tier PRS added
1.2.5 : tier NOI added
1.2.6 : distinction between symbolic links to word groups (list of word
        numbers separated by commas) and symbolic links to events between
        words (e.g. noises; number pairs separated by semi-colon)
        changed class definition of class 1, 4 and 5 accordingly
        changed tier definition NOI
1.2.7 : 12.09.00 Tier LBP and LBG added
1.2.8 : 11.05.01 Tier PRO,POS,LMA,SYN,FUN,LEX added
1.2.9 : 07.08.01 Tier IPA added
1.2.10 : 29.08.01 Tier TRN added
1.2.11 : 28.11.01 Tier TRS added
1.2.12 : 20.07.02 : Tiers GES,USH,USM,OCC,USP added
1.2.13 : 22.10.02 : Tier GES: definition of gestures extended
                    Tier TLN added
1.2.14 : 21.04.06 : Tier PRM added		    
1.2.15 : 21.02.07 : Tier TRW added		    
1.2.16 : 21.09.09 : Tier MAS added
1.3    : 05.10.12 : Allow UTF-8 coding of label content

Definition of Structure 1.2

A Partitur file has the same prefix as the corresponding signal file but the extension .par.

The content is in 7-bit ASCII or UTF-8 exclusively (to guarantee portability to all platforms); depending on the label type the labels may contain special characters which are coded either in LaTeX or in UTF-8. Each line starts with a three-byte label followed by a colon, which defines the syntax and semantics of the rest of the line. The following units of the line are separated by 'white space' (blank, tab).

The Partitur file is structured into a header and a body (like SAM description files are). The header stretches from the beginning of the file to the label LBD:; the body from the label LBD: to the end of file, where the last line has to be closed by a 'new line' (the final SAM label ELF: was omitted for the BAS Partitur Format since it prevents effective processing of the Partitur files).

The header contains SAM-compatible lines of general information. The following entries are compulsory:

LHD: Partitur file version
REP: Place of recording
SNB: Number of Bytes per Sample
SAM: Sampling Frequency in Hz
SBF: Byteorder (Intel 01, Motorola 10)
SSB: Bit Resolution
NCH: Number of Channels
SPN: Speaker ID
LBD:

Example:

LHD: Partitur 1.3
REP: Muenchen
SNB: 2
SAM: 16000
SBF: 01
SSB: 16
NCH: 1
SPN: PS1
LBD:

The following entries are optional; other entries are tolerated as long as they do not conflict with the compulsory and optional entries:

FIL: SAM File Type
TYP: Type of SAM Label File
DBN: Corpus Name
VOL: Number of Volume
DIR: Directory in Volume
SRC: Name of speech file
BEG: Beginning of labeling sequence
END: End of labeling sequence
RED: Date of Recording
RET: Duration
RCC: Recording Conditions
CMT: Comment
SPI: Speaker Information
PCF: Name of Protocol File
PCN: Protocol Number
EXP: Name of Segmenter
SYS: Labeling system
DAT: Date of Labeling
SPA: SAM-PA Version

All header labels are SAM-compatible.
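
Because the header is line oriented, it can be read with very little code. The following Python sketch collects all header entries up to the label LBD: and checks that the compulsory entries listed above are present. The function name parse_bpf_header is hypothetical, and the sketch keeps only the last value if a label occurs more than once.

```python
REQUIRED = ("LHD", "REP", "SNB", "SAM", "SBF", "SSB", "NCH", "SPN")


def parse_bpf_header(text):
    """Return the header entries of a BPF file as a dict (label -> value)."""
    header = {}
    for line in text.splitlines():
        label, colon, value = line.partition(":")
        if not colon:
            continue  # not a 'label: value' line
        if label.strip() == "LBD":
            break  # LBD: closes the header, the body follows
        header[label.strip()] = value.strip()
    else:
        raise ValueError("no LBD: line found")
    missing = [key for key in REQUIRED if key not in header]
    if missing:
        raise ValueError("missing compulsory entries: " + ", ".join(missing))
    return header


EXAMPLE = "\n".join([
    "LHD: Partitur 1.3", "REP: Muenchen", "SNB: 2", "SAM: 16000",
    "SBF: 01", "SSB: 16", "NCH: 1", "SPN: PS1", "LBD:",
    "KAN: 0  j'a:",
])

header = parse_bpf_header(EXAMPLE)
print(header["LHD"], header["SAM"])
```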

The body starts after the label LBD: and stretches to the end of file. It contains the tiers of the Partitur. Each tier is identified by a unique label. The order of tiers as well as the order of lines within a tier is not significant.

There are 5 basic classes of tiers:

  1. Tiers with symbolic relation

    A line of this tier contains three fields:

    • the tier label
    • a comma separated list of integers that reference the item to one or more words or
      a pair of integers separated by semi-colon referring to an event between those two words
    • a string with the label information
    These three items are separated by white spaces.
    The symbolic links (relations) refer to a reference tier which numbers the word units beginning with zero (the choice of word units as the basis for symbolic relations is arbitrary).
    The label string itself has a special syntax which is defined in the tier definition.

    Example:

    TRL: 6,7 mit'm
    NOI: 4;5 #Klopfen

  2. Tiers with time relation, time consuming

    A line of this tier contains four fields:

    • the tier label
    • two integers denoting the beginning and duration of the event.
    • a string containing the label information
    The meaning of the integers is defined by the tier definition (possible are samples, millisecs, etc.)

    Example:

    GES: 2348484 93448 I-Geste ...

  3. Tiers with time relation, not time consuming

    A line of this tier contains three fields:

    • the tier label
    • one integer denoting the time position of the event.
    • a string containing the label information
    The meaning of the integer is defined by the tier definition (possible are samples, millisecs, etc.)

    Example:

    PRB: 13456 TON: P*; FUN: PA

  4. Tiers with time and symbolic relation, time consuming

    A line of this tier contains five fields:

    • the tier label
    • two integers denoting the beginning and duration of the event.
    • a comma separated list of integers that reference the item to one or more words or
      a pair of integers separated by semi-colon referring to an event between those two words
    • a string containing the label information
    The meaning of the integers is defined by the tier definition (possible are samples, millisecs, etc.)

    Example:

    SAP: 13456 345 9 aU

  5. Tiers with time and symbolic relation, not time consuming

    A line of this tier contains four fields:

    • the tier label
    • one integer denoting the time position of the event.
    • a comma separated list of integers that reference the item to one or more words or
      a pair of integers separated by semi-colon referring to an event between those two words
    • a string containing the label information
    The meaning of the integer is defined by the tier definition (possible are samples, millisecs, etc.)

    Example:

    PRB: 13456 13 TON: P*; FUN: PA
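
The five classes differ only in which fields precede the label string, so one small function can split any body line once the class of its tier is known. The Python sketch below illustrates this with the example lines of the five classes; the names parse_links and parse_tier_line and the layout of the returned dict are assumptions made for this example. The class number has to be supplied by the caller, since it follows from the tier definition and not from the line itself.

```python
def parse_links(field):
    """Parse a symbolic link field: '6,7' (words) or '4;5' (between two words)."""
    if ";" in field:
        left, right = field.split(";")
        return ("between", (int(left), int(right)))
    return ("words", tuple(int(number) for number in field.split(",")))


def parse_tier_line(line, tier_class):
    """Split one body line according to its tier class (1 to 5)."""
    label, rest = line.split(":", 1)  # only the first colon ends the tier label
    # number of fields between the tier label and the label string
    leading = {1: 1, 2: 2, 3: 1, 4: 3, 5: 2}[tier_class]
    parts = rest.split(None, leading)
    lead = parts[:leading]
    record = {
        "tier": label.strip(),
        "label": parts[leading].rstrip() if len(parts) > leading else "",
    }
    if tier_class in (2, 4):
        record["begin"], record["duration"] = int(lead[0]), int(lead[1])
    elif tier_class in (3, 5):
        record["time"] = int(lead[0])
    if tier_class in (1, 4, 5):
        record["links"] = parse_links(lead[-1])
    return record


print(parse_tier_line("TRL: 6,7 mit'm", 1))
print(parse_tier_line("SAP: 13456 345 9 aU", 4))
print(parse_tier_line("PRB: 13456 13 TON: P*; FUN: PA", 5))
```

The units of the time fields (samples, milliseconds, etc.) are left as plain integers, because their meaning is fixed by the individual tier definition.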

Remarks:

Definition of Tiers (version 1.2.2)

  1. Citation Form (canonical pronunciation) KAN: class 1

    Synopsis:

    KAN: (symbolic link) (transcript)

    This tier contains a list of the spoken words within the utterance annotated in SAM-PA (in case of German the extended German SAM-PA is used).
    The segmentation of the whole utterance is done in word units, where everything counts as a word that is produced by the articulatory organs of the speaker and can be interpreted as 'speech'. Following this definition hesitations are words, while laughing, coughs, etc. are not. This separation isn't always clear, but on the other hand the selection of word units is arbitrary as well. The main point is a unique reference tier for symbolic relations in other tiers.
    Another problem is the reduction of words that are annotated in the orthographic form, e.g. mit'm. In these cases the reduction is restored to its full form (in this example /mIt de:m/). The reason for this is that some of these reductions should be automatically accessible.

    Example:

    KAN: 0  j'a:
    KAN: 1  Qalzo:+
    KAN: 2  QE:m
    KAN: 3  h'OYt@
    KAN: 4  Qo:d6+
    KAN: 5  m'O6g@n
    

  2. Phonemic Transcription PTR: class 1

    Definition:

    PTR: (symbolic link) (transcript)

    The tier contains the phonemic transcript of the spoken words of the utterance. In contrast to the KAN tier these will deviate from the canonical (citation) pronunciation form since speakers rarely speak in citation pronunciation.

    Example:

    PTR: 0  ja:
    PTR: 1  alzO
    PTR: 2  @m
    PTR: 3  hOYt@
    PTR: 4  o:d6
    PTR: 5  mO6N
    

  3. Orthography ORT: class 1

    Synopsis:

    ORT: (symbolic link) (orthography)

    The tier 'Orthography' contains the orthographic (lexical) forms corresponding to the units in the tier 'Vorschlagstranskription' (see above).
    Words are not capitalized at the beginning of an utterance or of a sentence within an utterance (except nouns, of course). German 'Umlauts' and other letters that are not part of 7-bit ASCII are written in the form used for lexical access. Therefore the coding might differ between speech corpora, e.g. ISO-8859 or LaTeX coding.
    This tier is used for an easy lexicon reference; therefore no additional markers except lexical words are allowed. There is no punctuation in this tier. Lexical words include items that are contained in the KAN tier (e.g. hesitations, word breaks).

    Example:

    ORT: 0  ja
    ORT: 1  also
    ORT: 2  <"ahm>
    ORT: 3  heute
    ORT: 4  oder
    ORT: 5  morgen
    

  4. Verbmobil Transliteration TRL: class 1

    Synopsis:

    TRL: (list of symbolic links) (transliteration) class 1

    The tier 'Verbmobil Transliteration' contains the transliteration of the utterance according to the VM conventions 3.0. The transliteration is segmented into the units of the KAN tier (see above). Therefore multiple references may occur (e.g. if a reduced form of two words is written as one unit in the transliteration). Each segment covers the scope from the beginning of the referenced unit(s) to the beginning of the next referenced unit(s). As a result the first line of this tier may contain no referenced unit. In this case the line is aligned to the first unit.

    A detailed description of the Verbmobil I transliteration format can be found here (German only!).

    Example:

    TRL: 0  <Schmatzen>
    TRL: 0  ja ,
    TRL: 1  also
    TRL: 2  <"ahm>
    TRL: 3  heute
    TRL: 4  oder
    TRL: 5  morgen .
    

  5. Verbmobil Transliteration II TR2: class 1

    Synopsis:

    TR2: (list of symbolic links) (transliteration) class 1

    The tier 'Verbmobil II Transliteration' contains the transliteration of the utterance according to the Verbmobil II conventions. A new improved format was necessary because the VM I format was not parsable. For more information about the VM II format see here.

    Our partner at CMU kindly provided an English translation also.

    The transliteration is segmented into the units of the KAN tier (see above) by starting a new line after each unit. Exceptions are punctuations and pronunciation comments that are kept together with the last unit (this is just for a better readability).

    Example:

        TR2: 25 ~Weihnachten
        TR2: 26 ist
        TR2: 27 das
        TR2: 28 sowieso
        TR2: 29 immer
        TR2: 30 etwas
        TR2: 31 schwierig ,
        TR2: 32 und
        TR2: 33 <"ahm>
        TR2: 34 in
        TR2: 35 der
        TR2: 36 #zweiten
        TR2: 37 Dezemberwoche
        TR2: 38 bin
        TR2: 39 ich
        TR2: 40 in
        TR2: 41 ~M"unchen
        TR2: 42 auf
        TR2: 43 dem 
        TR2: 44 Kongre"s .
        TR2: 45 also
        TR2: 46 bliebe
        TR2: 47 noch
    

  6. Superimposed Speech SUP: class 1

    Synopsis:

    SUP: (list of symbolic links) (utterance-id) (transliteration) class 1

    In multi-party recordings such as those of the Verbmobil II project it may happen that the speech of the currently recorded speaker is superimposed by that of another dialog partner (cross talk). To denote this the tier SUP was added to the format. It gives the transliteration of the 'foreign' speaker together with symbolic links to the parts of speech of the recorded speaker to which the superimposed events are assigned. The item 'utterance-id' gives the name of the corresponding BAS Partitur file containing the superimposing part of speech. The tier SUP is currently only used in combination with the tier TR2. For a detailed discussion of superimposed speech in the Verbmobil II project see here.

    Example:

    TR2: 0 ich
    TR2: 1 w"urde
    TR2: 2 vorschlagen ,
    TR2: 3 da"s
    TR2: 4 wir9@
    TR2: 5 dann9@
    TR2: 6 <:<#> hinfliegen:> ,
    TR2: 7 <:<#> ich:>
    TR2: 8 hab'
    TR2: 9 jetzt 
    TR2: 10 aber
    TR2: 11 <:<#Rascheln> grade:>
    TR2: 12 <:<#Rascheln> keine:>
    TR2: 13 Unterlagen
    TR2: 14 da . <#>
    SUP: 4,5 g002acn2_028_AAK.par	@9ja
    
    In this example the speaker is superimposed during the words 4 and 5 by the single word 'ja' of another speaker. The latter occurs in the BAS Partitur file 'g002acn2_028_AAK.par'.

  7. Phonetic Segmentation PhonDat PHO: class 4

    Synopsis:

    PHO: (begin) (duration) (list of symbolic links) (label string)

    This tier contains a totally time-consuming segmentation into phonemic units (extended German SAM-PA , broad phonetic transcript). The first number denotes the beginning of the segment in samples counted from the beginning of the speech file; the second number the duration of the segment in samples.
    The conventions of labeling and segmentation are briefly described here.

    Synopsis of label string

    <label string> = '#c:' (beginning of first word)  OR
               '#p:' (pause) OR
               '#v:' (mis-pronunciation) OR
               <segment> OR
               <word boundary segment> OR
               <compound boundary segment> OR
               <punctuation>
    
    <segment> = $<sampa string> (ordinary segment)
    
    <word boundary segment> = ##<sampa string>
    
    <compound boundary segment> = $#<sampa string>
    
    <sampa string> = any string of <extended German SAM-PA symbols>
    
    <punctuation> = '#.' OR '#,' OR '#?' OR '#!'
    

    A definition of extended German SAM-PA can be found here.

    Example:

    PHO: 2473	0	0	#c:
    PHO: 2473	1100	0	##d
    PHO: 3573	0	0	$a-@
    PHO: 4126	2007	0	$s
    PHO: 6133	0	0	$-+
    PHO: 6133	1130	1	##g
    PHO: 7263	1206	1	$e:
    PHO: 8496	937	1	$t
    PHO: 9433	0	2	##Q-
    PHO: 9433	0	2	$-q
    PHO: 9433	2698	2	$aU
    PHO: 12131	1178	2	$x
    PHO: 13309	0	2	$-+
    PHO: 13309	962	3	##n
    PHO: 14271	1675	3	$I
    PHO: 15946	4308	3	$C
    PHO: 18579	0	3	$t-
    PHO: 18579	0	3	$-+
    PHO: 18579	5467	3	#p:
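
The label string grammar given for this tier can be checked mechanically. The Python sketch below sorts a label string into the categories of that grammar and returns the remaining SAM-PA string; the function name classify_label and the returned category names are chosen for this example only. The SAM-PA string itself is not validated against the symbol table.

```python
SPECIAL = {
    "#c:": "beginning of first word",
    "#p:": "pause",
    "#v:": "mis-pronunciation",
}
PUNCTUATION = {"#.", "#,", "#?", "#!"}
# '$#' must be tested before '$' because both start with '$'
PREFIXES = (
    ("##", "word boundary segment"),
    ("$#", "compound boundary segment"),
    ("$", "segment"),
)


def classify_label(label):
    """Return (category, sampa string) for a PHO label string."""
    if label in SPECIAL:
        return SPECIAL[label], ""
    if label in PUNCTUATION:
        return "punctuation", label[1:]
    for prefix, category in PREFIXES:
        if label.startswith(prefix):
            return category, label[len(prefix):]
    raise ValueError("not a valid label string: " + repr(label))


for label in ("#c:", "##d", "$a-@", "$-+", "#p:"):
    print(label, classify_label(label))
```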
    

  8. Phonetic Segmentation SAM-PA SAP: class 4

    Synopsis:

    SAP: (begin) (duration) (list of symbolic links) (label string)

    This tier contains a segmentation into phonemic units (extended German SAM-PA, broad phonetic transcript). In contrast to the PHO tier (see above) this segmentation is not strictly time consuming. That is, there might be pauses in the signal that are not labeled (which happens frequently in spontaneous speech). The first number denotes the beginning of the segment in samples counted from the beginning of the speech file; the second number the duration of the segment in samples.
    The conventions of labeling and segmentation are briefly described here.

    Example:

    SAP:	549	867	0	Q%<
    SAP:	1416	1242	0	aU
    SAP:	2658	1136	0	f
    SAP:	3794	408	1	v
    SAP:	4202	852	1	i:
    SAP:	5054	433	1	d
    SAP:	5487	1686	1	6%>
    SAP:	7173	828	1	h%<%>
    SAP:	8001	864	1	2:-9%<%>
    SAP:	8865	1015	1	r-6%<
    SAP:	9880	0	1	@-
    SAP:	9880	1732	1	n
    

  9. Automatic Phonetic Segmentation by MAUS MAU: Class 4

    Definition:

    MAU: (begin) (duration) (list of symbolic links) (label string)

    This tier contains an automatically generated phonetic-phonologic segmentation in units of SAM-PA. Some of these tiers are produced in close cooperation with Technical University of Munich (Dr. G. Ruske).
    A detailed description of the MAUS system can be found here.

    The first number is the start of the segment counted in samples from the beginning of the file; the second number is the length of the segment in samples.
    The segmentation is justified and has no relation to the tier 'Vorschlagstranskription' as done in the tier SAP. (however, there are symbolic links to the words).
    The units are extended German SAM-PA. Additional labels are <nib> (non-speech event) and <p:> (pause). These labels always get the symbolic link -1 (no link).
    Furthermore, events that clearly stem from the speaker, but cannot be classified (e.g. non-understandable words) are labelled and segmented as <usb>. The latter receive a symbolic link as other word events.

    Example:

    MAU: 0 676 -1 <p:>
    MAU: 677 7861 -1 <nib>
    MAU: 8539 450 0 g
    MAU: 8990 2436 0 u:
    MAU: 11427 1740 0 t
    MAU: 13168 958 1 d
    MAU: 14127 1298 1 a
    MAU: 15426 3820 1 n
    MAU: 19247 303 2 n
    MAU: 19551 1785 2 e:
    MAU: 21337 624 2 m
    MAU: 21962 636 2 n
    MAU: 22599 501 3 v
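
Since every MAU segment carries a symbolic link to a word, word-level time spans can be derived from the phonetic segmentation. The Python sketch below does this for the example lines of this tier. The function name word_spans is hypothetical, and the sketch assumes a single integer as symbolic link on each line. It takes begin + duration as the end of a segment; in the example above each segment starts one sample after that value, which suggests (but does not prove) that it denotes the last sample of the segment.

```python
def word_spans(mau_lines):
    """Map each word number to (first sample, end sample) of its segments."""
    spans = {}
    for line in mau_lines:
        _tier, begin, duration, link, _label = line.split(None, 4)
        begin, duration, link = int(begin), int(duration), int(link)
        if link < 0:
            continue  # -1 = no link (<p:>, <nib>)
        end = begin + duration
        first, last = spans.get(link, (begin, end))
        spans[link] = (min(first, begin), max(last, end))
    return spans


MAU_EXAMPLE = [
    "MAU: 0 676 -1 <p:>",
    "MAU: 677 7861 -1 <nib>",
    "MAU: 8539 450 0 g",
    "MAU: 8990 2436 0 u:",
    "MAU: 11427 1740 0 t",
    "MAU: 13168 958 1 d",
    "MAU: 14127 1298 1 a",
    "MAU: 15426 3820 1 n",
]

spans = word_spans(MAU_EXAMPLE)
print(spans)  # {0: (8539, 13167), 1: (13168, 19246)}

# With the header entry SAM (sampling frequency in Hz) samples become seconds.
SAM = 16000
print({word: (a / SAM, b / SAM) for word, (a, b) in spans.items()})
```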
    

  10. Word Segmentation WOR: Class 4

    Definition:

    WOR: (begin) (duration) (list of symbolic links) (label string)

    This tier contains a segmentation of the utterance into words or word equivalents. The segmentation need not be justified. The 'label string' may contain orthographic or pronunciation information (e.g. in SAM-PA). A '-' at the end of 'label string' denotes a missing word in reference to the tier KAN. A '-' as last character in 'label string' denotes an inserted word.
    The symbolic links give the relation to the KAN tier. Note that inserted words have a symbolic link to the previous word in the KAN tier.

  11. Dialog Act Segmentation DAS: Class 1

    Definition:

    DAS: (list of symbolic links) (marker string)

    This tier contains a segmentation into dialog acts according to the ongoing work of the 'Deutsches Forschungszentrum für Künstliche Intelligenz' (DFKI), Saarbrücken, Germany. Each marker covers a portion of the speech signal that is denoted by the symbolic links to the reference tier KAN.

    Example:

    DAS: 0,1,2,3,4,5 @(SUGGEST_SUPPORT_DATE BA)
    DAS: 6,7,8,9 @(DELIBERATE_EXPLICITE BA)
    DAS: 10,11,12,13,14,15,16,17,18,19,20 @(SUGGEST_SUPPORT_DATE BA)
    
    In this example the marker SUGGEST_SUPPORT_DATE covers the words 0 to 5 in the reference tier. The term 'BA' denotes a dialog act from speaker 'B' to speaker 'A', where speaker 'A' is always the speaker that initiates the dialog.
    A more detailed description of the markers and the principles of segmentation can be found here.

  12. Prosodic Segmentation PRB: Class 5

    Definition:

    PRB: (sample) (list of symbolic links) (marker string)

    This tier contains the prosodic segmentation (by hand) according to GTobi done by the Technical University of Braunschweig.
    The first number gives the time of the prosodic event measured in samples from beginning of the file.
    The symbolic links give the relation to the KAN tier.
    The label string describes the prosodic event itself. A concise description of the labeling convention (GTobi) can be found
    here (in German only).

    Example:

    PRB:    54212    5   TON: H*; FUN: NA
    PRB:    63269    7   TON: L+H*; FUN: EK
    PRB:    76371    8   BRE: B3; TON: L-L%
    PRB:    79967    8   TON: L*+H; FUN: PA
    

  13. Symbolic prosodic segmentation PRS: Class 1

    Definition:

    PRS: (list of symbolic links) (marker string)

    This tier contains a symbolic prosodic segmentation and labeling (by hand) into 3 boundary markers and 3 accent markers (close to GTobi).

    The symbolic links give the relation to the word event order.
    The label string describes the prosodic event itself. Boundary markers (B3, B2, B9) are linked to two words acting as left and right neighbors of the boundary. Accent markers (PA, NA, EK) refer to the word where the accent was labeled. No syllable information is provided.
    Definition of Marker Strings:
    B3 : full intonational boundary with strong intonational marking, often with pauses or lengthening or change of speed
    B2 : intermediate phrase boundary with weak marking, weaker intonational marking than B3
    B9 : 'agrammatical' boundary, e.g. hesitations, repairs, unintended pauses
    PA : main accents (phrase accent) carried by one word; in rare cases there can be two or more words marked together
    NA : secondary accent for accentuated words without PA
    EK : emphatic or contrastive accents

    Example:

    PRS:    0       EK
    PRS:    4;5     B2
    PRS:    7       NA
    PRS:    9       NA
    PRS:    11      NA
    PRS:    11;12   B3
    PRS:    13      EK
    PRS:    14      EK
    PRS:    15      PA
    PRS:    17      NA
    PRS:    17;18   B2
    PRS:    18      NA
    PRS:    19;20   B3
    PRS:    23      EK
    PRS:    23;24   B3
    PRS:    25      EK
    PRS:    27      PA
    

  14. Noise Labeling NOI: Class 1

    Definition:

    NOI: (single or pair of symbolic links) (marker string)

    This tier contains a noise labeling in reference to the word chain defined. Two different types of noise are possible: a simple noise occurring between two words is denoted by a semi-colon separated pair of symbolic links to these words (e.g. '5;6'); a noise that superimposes a single word is marked with a single symbolic link denoting the superimposed word (e.g. '5').
    The marker string contains a blank separated list of noise labels. The labels are drawn from the VMII TRL transliteration format:

    <A> <B>                 : Breathing
    <P>                     : distinct silence within an utterance
    <%>                     : not understandable muttering
    <Schmatzen> <Smack>     : lip smack
    <Schlucken> <Swallow>   : swallow
    <R"auspern> <Throat>    : throat clear
    <Husten> <Cough>        : cough
    <Lachen> <Laugh>        : laugh
    <Ger"ausch> <Noise>     : other articulatory noise
    <#Klopfen> <#Knock>     : knock
    <#Rascheln> <#Rustle>   : rustle
    <#Quietschen> <#Squeak> : creak
    <#Klicken> <#Click>     : click noise
    <#Mikrowind>            : blowing into microphone
    <#Mikrobe>              : noise caused by touching, knocking,
                              rubbing against the microphone
    <#>                     : other technical noise
    

    For example:

    NOI:    5       <Lachen>          # word 5 is superimposed by a laugh
    NOI:    5;6     <A>               # between word 5 and word 6 a distinct
                                      # breathing was recorded
    

  15. Signal-based prosodic accent labeling LBP: Class 3

    Definition:

    LBP: (sample) (marker string)

    This tier contains a manually labeled accent marker according to GTobi. There is no link to the word order. The labeling was done during the German Verbmobil 2 project by the Technical University of Braunschweig.
    The following three accent classes were used:

    PA    phrase accent
    NA    secondary accent
    EK    emphatic or contrastive accents
    
    For example:
    LBP: 1651 PA
    

  16. Signal-based prosodic boundary labeling LBG: Class 3

    Definition:

    LBG: (sample) (marker string)

    This tier contains a manually labeled boundary marker according to GTobi. There is no link to the word order. The labeling was done during the German Verbmobil 2 project by the Technical University of Braunschweig.
    The following 5 boundary classes were used:

    B9    irregular boundary: 'agrammatical' boundary, e.g. hesitations,
          repairs, unintended pauses
    B2    intermediate phrase boundary with weak marking, weaker intonational
          marking than B3
    B3    intonational boundary with strong intonational marking, no
          question
    B3QH  B3, semantically a question, with high tone
    B3QL  B3, semantically a question, with low tone
    
    For example:
    LBG: 6586 B3
    

  17. Syntactic-prosodic labeling PRO: Class 1

    Definition:

    PRO: (symbolic link) (marker string)

    This tier contains a manually labeled prosodic accent and boundary annotation based on the linguistic information (the chain of words). Consequently it only contains links to the spoken words of the utterance but not to the signal itself. The labeling was done during the German Verbmobil 2 project by the Technical University of Erlangen in cooperation with the University of Munich.

    A detailed description of the labeling system as well as the used categories can be found here (for German) (definition of labels can be found in table 12 on pp. 15-16 of the document) and here (for English).

    For example:

    PRO: 6;7        SS2
    PRO: 13;14      AC1
    PRO: 14;15      AC1
    PRO: 15;16      AC1
    PRO: 18;19      SC3
    PRO: 24;25      IRB
    PRO: 25;26      AC1
    PRO: 26;27      AC1
    PRO: 27;28      AC1
    PRO: 28;29      IWE
    PRO: 28;29      IZB
    PRO: 31         SM3
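
    A PRO line can be read as follows. This is a minimal Python sketch; the function name parse_pro is hypothetical, and the reading of '6;7' as "between word 6 and word 7" is an assumption carried over from the NOI tier example, where the same notation is explained.

```python
def parse_pro(line):
    """Return (words, label) for a 'PRO: (symbolic link) (marker string)' line."""
    key, link, label = line.split()
    if key != "PRO:":
        raise ValueError(f"not a PRO line: {line!r}")
    # '6;7' names the position between words 6 and 7; '31' names word 31 itself.
    words = tuple(int(w) for w in link.split(";"))
    return words, label


for line in ["PRO: 6;7        SS2", "PRO: 31         SM3"]:
    print(parse_pro(line))
```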
    

  18. Syntactic trees SYN: FUN: LEX: Class 1

    Definition:

    SYN: (symbolic link) (marker string)

    FUN: (symbolic link) (marker string)

    LEX: (symbolic link) (marker string)

    This tier contains a computer-readable representation of a syntactic tree of the utterance. The tiers SYN, FUN and LEX are describing different aspects of this tree, such as syntactic node, function and word class (see below). They may also be exploited independently. The labeling was done during the German Verbmobil 2 project by the University of Tübingen.

    An overview of the treebanks of Verbmobil II (6 pages) can be found here.

    A detailed description of the labeling system as well as the categories used can be found here for German, English and Japanese.

    Representation of Syntax Trees in the BAS Partitur Format (BPF)
    ===============================================================
    
    In the BAS Partitur Format the syntax trees are represented in three
    tiers. The terminal (lexical) categories are listed in the LEX
    tier. Syntactical categories of higher orders are listed in the SYN
    tier. Grammatical functions referring to both LEX and SYN are
    listed in the FUN tier. The LEX and the SYN entries refer to the nodes
    and FUN represents the edges of the syntax tree.
    
    
    Lexical Categories:
    -------------------
    
    Definition:
    
    LEX: (symbolic link) (label string)
    
    This tier represents the lexical categories of the words. The words
    are represented by symbolic links. Hesitations, neologisms and
    unintelligible parts of an utterance have not been annotated.
    
    Example:
    
    LEX:    0               0       PDS
    LEX:    1               0       VMFIN
    LEX:    2               0       CARD
    LEX:    3               0       NN
    LEX:    4               0       ADJD
    LEX:    5               0       VVINF
    
    The label string contains 
    
    (1) a tag for the lexical category, e.g. CARD (cardinal number) for word 2.
    (2) an index indicating whether the node is terminal or branching or
    non-branching. The LEX tier represents only terminal nodes; therefore
    the index is always 0 (see SYN and FUN tier for further information 
    on this index). 
    
    
    LEX labels used in syntax trees of German dialogues:
    
    UNKNOWN unknown tag 
    --      
    ADJA    attributive adjective
    ADJD    adverbial or predicative adjective
    ADV     adverb
    APPR    preposition; circumposition left
    APPRART preposition with article
    APPO    postposition
    APZR    circumposition right
    ART     definite or indefinite article
    CARD    cardinal number
    FM      foreign language material
    ITJ     interjection
    KOUI    subordinating conjunction with "zu" and infinitive
    KOUS    subordinating conjunction with sentence
    KON     coordinative conjunction
    KOKOM   particle of comparison, no clause
    NN      noun
    NE      proper noun
    PDS     substituting demonstrative pronoun
    PDAT    attributive demonstrative pronoun
    PIS     substituting indefinite pronoun
    PIAT    attributive indefinite pronoun without determiner
    PIDAT   attributive indefinite pronoun with determiner
    PPER    irreflexive personal pronoun
    PPOSS   substituting possessive pronoun
    PPOSAT  attributive possessive pronoun
    PRELS   substituting relative pronoun
    PRELAT  attributive relative pronoun
    PRF     reflexive personal pronoun
    PWS     substituting interrogative pronoun
    PWAT    attributive interrogative pronoun
    PWAV    adverbial interrogative or relative pronoun
    PAV     (replaced by PROP)
    PTKZU   "zu" + infinitive
    PTKNEG  negation particle
    PTKVZ   separated verb particle
    PTKANT  answer particle
    PTKA    particle with adjective or adverb
    TRUNC   truncated word - first part
    VVFIN   finite main verb
    VVIMP   imperative, main verb
    VVINF   infinitive, main verb
    VVIZU   infinitive with "zu", main
    VVPP    past participle, main
    VAFIN   finite verb, aux
    VAIMP   imperative, aux
    VAINF   infinitive, aux
    VAPP    past participle, aux
    VMFIN   finite verb, modal
    VMINF   infinitive, modal
    VMPP    past participle, modal
    XY      non-word containing special characters
    $,      comma
    $.      sentence-final punctuation
    $(      sentence internal punctuation marks
    PROP    NEW: pronominal adverb ("dafür")
    BS      letter (e. g. spelling)
    
    
    LEX labels used in syntax trees of English dialogues:
    
    UNKNOWN        unknown tag
    --             
    CC             Coordinating conjunction
    CD             Cardinal number
    DT             Determiner
    EX             Existential there
    FW             Foreign word
    IN             Preposition or subordinating conjunction
    JJ             Adjective
    JJR            Adjective, comparative
    JJS            Adjective, superlative
    LS             List item marker
    MD             Modal
    NN             Noun, singular or mass
    NNS            Noun, plural
    NP             Proper noun, singular
    NPS            Proper noun, plural
    PDT            Predeterminer
    POS            Possessive ending
    PP             Personal pronoun
    PP$            Possessive pronoun
    RB             Adverb
    RBR            Adverb, comparative
    RBS            Adverb, superlative
    RP             Particle
    SYM            Symbol
    TO             to
    UH             Interjection
    VB             Verb, base form
    VBD            Verb, past tense
    VBG            Verb, gerund or present participle
    VBN            Verb, past participle
    VBP            Verb, non-3rd person singular present
    VBZ            Verb, 3rd person singular present
    WDT            Wh-determiner
    WP             Wh-pronoun
    WP$            Possessive wh-pronoun
    WRB            Wh-adverb
    ,              Comma
    .              Sentence-final punctuation
    ____________________________________________________________________________
    
    
    Syntactical Categories:
    -----------------------
    
    Definition:
    
    SYN: (list of symbolic links) (label string)
    
    This tier contains syntactical categories of constituents of phrases,
    topological fields and clauses. Analogous to the LEX tier hesitations,
    neologisms and unintelligible parts of an utterance have not been
    annotated. Therefore it is possible that some turns have a LEX and a FUN tier,
    but do not have a SYN tier.
    
    Example:
    
    SYN:    0               1       NX
    SYN:    0               2       VF
    SYN:    0,1,2,3,4,5     0       SIMPX
    SYN:    1               1       VXFIN
    SYN:    1               2       LK
    SYN:    2               1       ADJX
    SYN:    2,3             0       NX
    SYN:    2,3,4           0       MF
    SYN:    4               1       ADJX
    SYN:    5               1       VXINF
    SYN:    5               2       VC
    
    Each label string contains two kinds of information:
    
    (1) The syntactical category of a constituent that spans over the
    words represented by the list of symbolic links. Thus the words 2 and
    3 belong to the nominal phrase NX, which again is part of the middle
    field MF that is finally part of the Simplex clause SIMPX.
    
    (2) An index indicating whether the node is terminal or branching or
    non-branching. Branching nodes as well as the terminal nodes of the
    LEX tier get the index 0. For non-branching nodes the numbers are
    incremented by 1 for each level. Thus the position of a node in the
    syntax tree is unambiguously defined. 
    
    
    SYN:          _____________________SIMPX_____________
                 /        /              |               \ 
    SYN:        /        /            __MF(0)__           \
               /        /            /         \           \
    SYN:     VF(2)    LK(2)        NX(0)        \         VC(2)    
              |        |         /      \        |          |
    SYN:     NX(1)  VXFIN(1)  ADJX(1)    |     ADJX(1)   VXINF(1)
              |        |        |        |       |          |
    LEX:    PDS(0)  VMFIN(0)  CARD(0)  NN(0)   ADJD(0)   VVINF(0)
    
    symbolic   0        1        2        3       4          5  
    links
    
    For word 1 in the LEX tier the index is 0, because VMFIN is a terminal
    node. For the node VXFIN the index is incremented by 1: like the node
    VMFIN it refers only to word 1 and is therefore non-branching. For node
    LK the index is incremented by 1 again for the same reason. The index of
    node SIMPX is 0, because it has a branch to LK but also several
    branches to other nodes. The edge label positions in the syntax
    tree of the FUN tier can be obtained in a similar way.
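
    The way the list of symbolic links and the index together fix a node's position can be illustrated with a short Python sketch. It uses the LEX and SYN example entries from above. The function name path_to_root and the sorting rule (smaller span first, then smaller index) are assumptions made for this illustration; they hold for the example tree but are not taken from the BAS or NeGra tools.

```python
# (links, index, label) triples copied from the LEX and SYN examples above.
LEX = [((0,), 0, "PDS"), ((1,), 0, "VMFIN"), ((2,), 0, "CARD"),
       ((3,), 0, "NN"), ((4,), 0, "ADJD"), ((5,), 0, "VVINF")]
SYN = [((0,), 1, "NX"), ((0,), 2, "VF"), ((0, 1, 2, 3, 4, 5), 0, "SIMPX"),
       ((1,), 1, "VXFIN"), ((1,), 2, "LK"), ((2,), 1, "ADJX"),
       ((2, 3), 0, "NX"), ((2, 3, 4), 0, "MF"), ((4,), 1, "ADJX"),
       ((5,), 1, "VXINF"), ((5,), 2, "VC")]


def path_to_root(word):
    """List the node labels above `word`, from the terminal up to the root."""
    nodes = [n for n in LEX + SYN if word in n[0]]
    # Nodes spanning fewer words sit lower in the tree; among nodes with the
    # same span, the index orders them bottom-up.
    nodes.sort(key=lambda n: (len(n[0]), n[1]))
    return [label for _, _, label in nodes]


for word in range(6):
    print(word, " -> ".join(path_to_root(word)))
```

    For word 2 this yields the chain CARD, ADJX, NX, MF, SIMPX, which matches the tree diagram above.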
    
    
    SYN labels used in syntax trees of German dialogues:
    
    --       (must always be "--")
    NX      noun chunk
    PX      prepositional phrase
    SIMPX   simplex clause
    VXFIN   finite verb phrase
    MF      middle field (Mittelfeld)
    VC      verb complex (Verbkomplex)
    NF      final field (Nachfeld)
    LK      left sentence bracket (Linke Satzklammer)
    VF      initial field (Vorfeld)
    ADVX    adverbial chunk
    ADJX    adjectival chunk
    P-SIMPX paratactic construction of simplex clauses (Parataktische Verknuepfung zweier SIMPX)
    R-SIMPX relative clause (Relativsatz)
    VXINF   infinitival verb phrase
    DM      discourse marker
    MVC     conjunct consisting of MF and VC (Konjunkt, bestehend aus MF und VC)
    PARORD  field of non-coordinative particles (Feld f. nicht-koord. beiordnende Partikeln) (V2)
    C       complementizer field (Feld f. Komplementierer bei Verb-letzt-Saetzen)
    KOORD   field of coordinative particles (Feld f. koordinierende Partikeln (und, oder, aber usw.))
    LV      topological field for resumptive construction (topologisches Feld fuer Linksversetzungen)
    LKMVC   conjunct consisting of LK, MF and VC (Konjunkt, bestehend aus LK, MF, VC)
    LKM     conjunct consisting of LK, MF (Konjunkt, bestehend aus LK, MF)
    MVCN    conjunct consisting of MF, VC and NF (Konjunkt, bestehend aus MF, VC, NF)
    MN      conjunct consisting of MF and NF (Konjunkt, bestehend aus MF, NF)
    DP      determiner phrase (e.g. "gar keine")
    KONX    complex of conjuncts (Konjunktionskomplex ("und zwar" in VF))
    VLKM    conjunct consisting of VF, LK and MF (Konjunkt, bestehend aus VF, LK, MF)
    VLKMVC  conjunct consisting of VF, LK, MF and VC (Konjunkt, bestehend aus VF, LK, MF, VC)
    LKMVCN  conjunct consisting of LK, MF, VC and NF (Konjunkt, bestehend aus LK, MF, VC, NF)
    LKMN    conjunct consisting of LK, MF and NF (Konjunkt, bestehend aus LK, MF, NF)
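    LKVCN   conjunct consisting of LK, VC and NF (Konjunkt, bestehend aus LK, VC, N)
    VCN     conjunct consisting of VC and NF (Konjunkt, bestehend aus VC und N)
    FKOORD  coordination consisting of conjuncts of fields (komplexe Felderkoordination)
    LKN     conjunct consisting of LK and NF (Konjunkt, bestehend aus LK und N)
    CMVCN   conjunct consisting of C, MF, VC and NF (Konjunkt, bestehend aus C, MF, VC und NF)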
    
    
    SYN labels used in syntax trees of English dialogues:
    
    --       (must always be "--")
    AP      Adjective Phrase
    APS     Adj-headed sm.clause
    ADVP    Adverb Phrase
    ADVPD   Adverb DATE-Phrase
    CMP     Complementizer
    CMP-WH  Complementizer,WH-
    CNJ     Conjunction(single)
    CNJ1    Conjunction(1 of 2)
    CNJ2    Conjunction(2 of 2)
    DG      Degree(non-wh)
    DG-WH   Degree-WH(how...)
    DGP     Degree Phrase
    DT-ART  Det,Article(the,a)
    DT-DM   Det,Demonstrative
    DT-QNT  Det,Quantifier(every)
    DT-R    Det,Rel.clause
    DT-WH   Det,Wh-(which,whose)
    DTP     Det.Phrase
    N       Noun,Common
    -        do not use this
    CNUM    N,Cardinal Number
    ONUM    N,Ordinal Number
    NP      Noun Phrase
    NPS     Noun-headed sm.clause
    NPD     Noun DATE-phrase
    NPT     Noun TIME-phrase
    PR-DM   PR,Demonstrative
    PR-WH   PR,WH-
    PR-R    PR,Relative
    PP      Prepositional Phrase
    PPS     Prep-headed sm.clause
    SUGG    Suggestion("How about Tuesday?")
    S       Sentence(VP w/subject)
    V-G     Verb,gerund
    V-PRP   Verb,present participle
    V-PSS   Verb,passive participle
    VP      Verb Phrase(S if sub Vs sister)
    
    
    Grammatical Functions:
    ----------------------
    
    Definition:
    
    FUN: (list of symbolic links) (label string)
    
    The FUN tier contains the grammatical functions that refer to the
    syntactical and lexical categories listed in the SYN and LEX tier.
    
    
    Example:
    
    FUN:    0               0       HD
    FUN:    0               1       ON
    FUN:    0               2       -
    FUN:    0,1,2,3,4,5     0       --
    FUN:    1               0       HD
    FUN:    1               1       HD
    FUN:    1               2       -
    FUN:    2               0       HD
    FUN:    2               1       -
    FUN:    2,3             0       V-MOD
    FUN:    2,3,4           0       -
    FUN:    3               0       HD
    FUN:    4               0       HD
    FUN:    4               1       MOD
    FUN:    5               0       HD
    FUN:    5               1       OV
    FUN:    5               2       --
    
    The label string contains the grammatical function of the word or
    constituent in the syntax tree (see LEX and SYN tier) that has the
    same list of symbolic links and the same index. Word 3 as part of the
    constituent NX (see SYN tier) has the function HD (head) and NX has
    the function V-MOD (modifier of a verb). 
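
    Since a FUN entry belongs to the LEX or SYN entry with the same list of symbolic links and the same index, the tiers can be joined on that pair. The Python sketch below shows this for four lines taken from the examples above; the function name parse_tree_line and the dictionary layout are assumptions made for illustration only.

```python
def parse_tree_line(line):
    """Split a LEX/SYN/FUN line into (tier, links, index, label)."""
    tier, links, index, label = line.split()
    words = tuple(int(w) for w in links.split(","))
    return tier.rstrip(":"), words, int(index), label


lines = """\
LEX:    3               0       NN
SYN:    2,3             0       NX
FUN:    3               0       HD
FUN:    2,3             0       V-MOD
""".splitlines()

nodes, functions = {}, {}
for tier, links, index, label in map(parse_tree_line, lines):
    target = functions if tier == "FUN" else nodes
    target[(links, index)] = label

# Pair every node with the function stored under the same (links, index) key.
for key, category in nodes.items():
    print(key, category, functions.get(key))
```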
    
    
    FUN labels used in syntax trees of German dialogues:
    
    --       Not bound
    HD       Head
    ON       Nominative object(=subject)
    -        Shalt not be bound
    OD       Dative object
    MOD      Ambiguous modifier
    ON-MOD   Modifier of subjects
    OA-MOD   Modifier of accusative objects
    OD-MOD   Modifier of dative objects
    OPP      Prepositional object (obligatorisches PP-Objekt)
    OV       Verbal object
    ONK      Conjunct of nominative object (Nominativ-Objekt-Konjunkt)
    OAK      Conjunct of accusative object (Akkusativ-Objekt-Konjunkt)
    VPT      Separable verb prefix
    MOD-MOD  Modifier of other Modifier
    APP      Apposition
    -        Not bound
    PRED     Predicate
    OA       Accusative object
    V-MOD    Modifier of a Verb
    V-MODK   Conjunct of verb modifier (Konjunkt des Verb-Modifikators)
    OPP-MOD  Not bound
    PRED-MOD Mod. of predicate
    FOPP     Optional prepositional object
    OS       Sentential object
    OADVP    Adverbial object
    FOPP-MOD Modifier of optional prepositional object
    OADJP    ADJP object
    OADVPMOD Modifier of ADVP object
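    OADJPK   Conjunct of ADJP object modifier (Konjunkt des ADJP-Objekt-Modifikators)
    FOPPK    Conjunct of optional PP object (fakul. PP-Objekt-Konjunkt)
    PREDK    Conjunct of predicate (Praedikativ-Konjunkt)
    MOD-MODK Conjunct of modifier of modifier (Konjunkt des modif. Modifikators)
    MODK     Conjunct of ambiguous modifier (nicht-eind. Modifikator-Konjunkt)
    OPP-MODK Conjunct of obligatory PP object (Konjunkt d. obl. PP-Objekts)
    PREDMODK Conjunct of predicate (Konjunkt d. Praedikativs)
    OPPK     Conjunct of obligatory PP object (obligatorisches PP-Objekt-Konjunkt)
    OADVPK   Conjunct of ADVP object modifier (Konjunkt des ADVP-Obj.-Modif.)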
    
    
    FUN labels used in syntax trees of English dialogues:
    
    --      Not bound
    HD      Head
    COMP    Complement
    SPR     Specifier
    SBJ     Subject
    SBQ     Subject,WH-
    SBR     Subject,REL
    ADJ     Adjunct
    ADJ?    Adjunct?
    FLL     Filler
    FLQ     Filler,WH-
    FLR     Filler,REL
    MRK     Marker
    -       For intentionally empty edge labels
    
    
    The annotations were prepared in the NeGra format by the University of
    Tübingen and have afterwards been converted into the Partitur
    format. It is possible that this process caused minor changes.
    To view the syntax trees the partitur files have to be converted into
    the NeGra format by the perl program "bas2negra.pl" (included in the
    standard BAS software package on each BAS CDROM). 
    The Java program TIGERSearch that has been developed by Wolfgang
    Lezius during the TIGER project at the IMS Stuttgart can be used to
    search and visualize the trees. From autumn 2001 on TIGERSearch can be
    downloaded from the following Website:
    
    http://www.ims.uni-stuttgart.de/projekte/TIGER/
    

    For example:

    SYN:    0       1       DM
    SYN:    1       1       NX
    SYN:    1       2       VF
    SYN:    1,2,3,4,5       0       SIMPX
    SYN:    2       1       VXFIN
    SYN:    2       2       LK
    SYN:    3       1       ADVX
    SYN:    3,4,5   0       MF
    SYN:    4       1       NX
    SYN:    5       1       ADVX
    SYN:    7       1       VXFIN
    SYN:    7       2       LK
    SYN:    7,8,9,10,11     0       SIMPX
    SYN:    8       1       NX
    SYN:    8,9,10,11       0       MF
    SYN:    9,10,11 0       NX
    SYN:    10      1       NX
    SYN:    10,11   0       NX
    SYN:    11      1       NX
    FUN:    0       0       -
    FUN:    0       1       --
    FUN:    1       0       HD
    FUN:    1       1       ON
    FUN:    1       2       -
    FUN:    1,2,3,4,5       0       --
    FUN:    2       0       HD
    FUN:    2       1       HD
    FUN:    2       2       -
    FUN:    3       0       HD
    FUN:    3       1       MOD
    FUN:    3,4,5   0       -
    FUN:    4       0       HD
    FUN:    4       1       OA
    FUN:    5       0       HD
    FUN:    5       1       V-MOD
    FUN:    7       0       HD
    FUN:    7       1       HD
    FUN:    7       2       -
    FUN:    7,8,9,10,11     0       --
    FUN:    8       0       HD
    FUN:    8       1       ON
    FUN:    8,9,10,11       0       -
    LEX:    0       0       PTKANT
    LEX:    1       0       PPER
    LEX:    2       0       VAFIN
    LEX:    3       0       ADV
    LEX:    4       0       NN
    LEX:    5       0       ADV
    LEX:    7       0       VVFIN
    LEX:    8       0       PPER
    LEX:    9       0       ART
    LEX:    10      0       NN
    LEX:    11      0       NE
    
  19. Parts of Speech POS: Class 1

    Definition:

    POS: (symbolic link) (marker string)

    This tier contains an automatically generated lexical tagging of all words of the utterance. The class system is based on the STTS (Stuttgart-Tübingen-TagSet) like the LEX tier (but the LEX tier was annotated manually!). The labeling was done during the German Verbmobil 2 project by the Technical University of Stuttgart.

    A detailed description of the labeling system as well as the used categories can be found here for German (pp. 17 - 19) and English (pp. 48 - 49). Furthermore, some examples for each German category can be found here (only in German).

    For example:

    POS:    0       ITJ
    POS:    1       PPER
    POS:    2       VAFIN
    POS:    3       ADV
    POS:    4       NN
    POS:    5       ADV
    POS:    7       VVFIN
    POS:    8       PPER
    POS:    9       ART
    POS:    10      NN
    POS:    11      NE
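
    Because POS is tagged automatically and LEX manually with the same tag set, the two tiers can be compared word by word. The Python sketch below uses the tags from the POS example above and from the LEX example in the syntactic trees section, on the assumption that both examples describe the same utterance; the function name tag_differences is hypothetical.

```python
# Tags keyed by symbolic link, copied from the LEX and POS examples.
LEX = {0: "PTKANT", 1: "PPER", 2: "VAFIN", 3: "ADV", 4: "NN", 5: "ADV",
       7: "VVFIN", 8: "PPER", 9: "ART", 10: "NN", 11: "NE"}
POS = {0: "ITJ", 1: "PPER", 2: "VAFIN", 3: "ADV", 4: "NN", 5: "ADV",
       7: "VVFIN", 8: "PPER", 9: "ART", 10: "NN", 11: "NE"}


def tag_differences(manual, automatic):
    """Map each word tagged in both tiers to (manual, automatic) if they differ."""
    shared = sorted(manual.keys() & automatic.keys())
    return {w: (manual[w], automatic[w]) for w in shared
            if manual[w] != automatic[w]}


print(tag_differences(LEX, POS))  # → {0: ('PTKANT', 'ITJ')}
```

    In this example only word 0 differs: the manual LEX annotation treats it as an answer particle, the automatic tagger as an interjection.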
    

  20. Lemmata LMA: Class 1

    Definition:

    LMA: (symbolic link) (marker string)

    This tier contains automatically derived lemmas for each word in the BPF. The labeling was done during the German Verbmobil 2 project by the Technical University of Stuttgart.

    For example:

    LMA:    0       nein
    LMA:    1       pper
    LMA:    2       haben
    LMA:    3       hier
    LMA:    4       Unterlage
    LMA:    5       da
    LMA:    7       kennen
    LMA:    8       pper
    LMA:    9       d
    LMA:    10      Hotel
    LMA:    11      Maritim
    
    Please note that all personal pronouns were annotated with 'pper' and all articles were annotated with 'd'.

  21. Phonetic Segmentation IPA IPA: Class 2

    Definition:

    IPA: (begin) (duration) (label string)

    This tier contains a phonetic segmentation and labeling according to IPA.
    The first number denotes the beginning of a segment counted in samples from the beginning of the file, the second number denotes the duration of the segment in samples. The remainder of the line must contain a list of comma-separated IPA numbers (at least one), optionally followed by a list of corresponding SAM-PA symbols.
    IPA chart with IPA numbers
    IPA chart with symbols

    For example:

     IPA:    4856    1228    322     @
     IPA:    10629   564     317
     IPA:    11805   991     319     I
     IPA:    12797   1142    138     C
     IPA:    13940   1534    302     e
     IPA:    15475   895     110     g
     IPA:    16371   777     322     @
     IPA:    17149   758     155     l
     IPA:    17908   1497    305
     IPA:    19406   1204    116     n
     IPA:    20611   589     104     d
     IPA:    21201   1018    322     @
     IPA:    22220   1185    103     t
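
    A line of this tier can be split into its fields as sketched below in Python. The function name parse_ipa is hypothetical. The SAM-PA part is returned as an uninterpreted string (or None when absent), because the example only shows single symbols and the separator used for longer lists is not assumed here.

```python
def parse_ipa(line):
    """Return (begin, duration, ipa_numbers, sampa) for one IPA tier line."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "IPA:":
        raise ValueError(f"not an IPA line: {line!r}")
    begin, duration = int(fields[1]), int(fields[2])
    # At least one IPA number is required; several are comma-separated.
    ipa_numbers = [int(n) for n in fields[3].split(",")]
    sampa = " ".join(fields[4:]) or None
    return begin, duration, ipa_numbers, sampa


print(parse_ipa(" IPA:    4856    1228    322     @"))
print(parse_ipa(" IPA:    10629   564     317"))
```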
     

  22. Segmentation in turns/sentences/chunks/etc TRN: Class 4

    Definition:

    TRN: (begin) (duration) (symbolic link) (label string)

    This tier contains a segmentation of longer recordings into turns, sentences or similar longer events, that contain more than one word.
    The first number denotes the beginning of a segment counted in samples from the beginning of the file, the second number denotes the duration of the segment in samples. The symbolic link contains a list of comma separated word numbers that are contained in the segment. The rest of the line may contain an optional label (e.g. a turn number).

    For example:

    TRN:    132736  144640  0,1,2,3,4,5,6,7 002
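
    A minimal Python sketch for reading a TRN line; the function name parse_trn is an assumption made for this example, and the optional label is returned as an empty string when it is missing.

```python
def parse_trn(line):
    """Return (begin, duration, words, label) for one TRN line."""
    fields = line.split(None, 4)
    if len(fields) < 4 or fields[0] != "TRN:":
        raise ValueError(f"not a TRN line: {line!r}")
    begin, duration = int(fields[1]), int(fields[2])
    # The symbolic link is a comma-separated list of word numbers.
    words = [int(w) for w in fields[3].split(",")]
    label = fields[4].strip() if len(fields) > 4 else ""
    return begin, duration, words, label


print(parse_trn("TRN:    132736  144640  0,1,2,3,4,5,6,7 002"))
```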
    

  23. SmartKom/SmartWeb Transliteration TRS class 1

    Synopsis:

    TRS: (list of symbolic links) (transliteration)

    The tier 'Smartkom Transliteration' contains the transliteration of a whole Man Machine Dialogue recorded in the SmartKom data collection. For more background information about the SmartKom data collection see here.
    A detailed description of the underlying transliteration format can be found here.
    The transliteration is segmented into the units of the KAN tier (see above) by starting a new line after each unit. Exceptions are punctuation marks and pronunciation comments, which are kept together with the preceding unit (purely for better readability).

    Example:

    TRS:    0       <:<#> ja:> [NA] [B2] ,
    TRS:    1       ich
    TRS:    2       h"atte
    TRS:    3       <:<#> gern:> [NA]
    TRS:    4       +/die/+ [B9] <P>
    TRS:    5       die
    TRS:    6       Sehensw"urdigkeiten [PA]
    TRS:    7       von
    TRS:    8       ~Heidelberg <!1 Heidelber'> [NA] [B3 fall] .
    TRS:    9       gibt [NA]
    TRS:    10      es
    TRS:    11      hier
    TRS:    12      vielleicht
    TRS:    13      Cafeterias [PA] [B3 rise] ? <#>
    TRS:    14      was
    TRS:    15      f"ur
    TRS:    16      Hotels [NA]
    TRS:    17      gibt [PA]
    TRS:    18      es [B3 cont] ?
    TRS:    19      @1mhm [NA] [B3 cont] .
    TRS:    20      kannst <!1 kanns'>
    TRS:    21      was
    TRS:    22      andres [PA]
    
    The same tier was also used in the German SmartWeb Project. See TRW tier.
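
    The bracketed items in the example, such as [NA] or [B3 fall], can be pulled out of a TRS line with a regular expression. The Python sketch below is only an illustration: the function name parse_trs is hypothetical, and treating every square-bracketed item as one marker is an assumption; the referenced transliteration description remains authoritative for the meaning of these items.

```python
import re

# Anything enclosed in square brackets, e.g. "[NA]" or "[B3 fall]".
MARKER = re.compile(r"\[([^\]]+)\]")


def parse_trs(line):
    """Return (words, transliteration, bracketed_markers) for one TRS line."""
    key, links, text = line.split(None, 2)
    if key != "TRS:":
        raise ValueError(f"not a TRS line: {line!r}")
    words = tuple(int(w) for w in links.split(","))
    text = text.strip()
    return words, text, MARKER.findall(text)


print(parse_trs("TRS:    8       ~Heidelberg <!1 Heidelber'> [NA] [B3 fall] ."))
print(parse_trs("TRS:    1       ich"))
```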

  24. SmartKom Gesture Labeling GES class 2

    Synopsis:

    GES: (begin) (duration) (label string)

    This tier contains a manual segmentation and annotation of 2D gestures as recorded in the SmartKom data collection. All gestures that occur within the range of the SIVIT camera are labelled. Additionally, emotional gestures that occur elsewhere are labeled. For more background information about the SmartKom data collection see here.
    The first number denotes the beginning of a gestural event in samples from the beginning of the recording (in SmartKom: 16 kHz); the second number the duration in samples.
    The 'label string' consists of 8 columns separated by TAB, optionally followed by a comment string:

    For a detailed description of the labeling system see here; the following is a brief description of the 8 label categories (possible values of labels are quoted in ''):

    Example:

    GES:    1072000 23040   I-Geste I - tipp +      Zeige li Hand           links oben      Treffer 1078400 12160
    GES:    1959680 114560  R-Geste R - emot -      re Hand                         1078400 12160   "Uberlegung/Nachdenken
    GES:    2166400 16000   I-Geste I - tipp +      Zeige li Hand           links oben      rechts  2171520 7680
    GES:    2641280 12800   I-Geste I - tipp +      Zeige re Hand    § Schlo"s       rechts unten    Treffer 2647680 5120
    GES:    3093120 14080   I-Geste I - tipp +      Zeige re Hand           links unten     Treffer 3098240 7040
    GES:    3351680 7040    R-Geste R - UFO re Hand                         3098240 7040
    GES:    4029440 22400   I-Geste I - tipp +      Zeige li Hand           links oben      rechts  4035840 10240
    

  25. SmartKom User State Annotation (holistic) USH class 2

    Synopsis:

    USH: (begin) (duration) (label string)

    This tier type contains information on user-states (interesting emotional and cognitive states) that occurred in a SmartKom recording session. For more background information about the SmartKom data collection see here.

    The whole session is segmented (no gaps). For each segment begin (begin) and duration (duration) are given in samples from the beginning of the recording (SmartKom: 16kHz).

    In the label string (label string) each segment is assigned to one of the labels described below, optionally followed by a TAB-separated rating. For a detailed description of the labeling system see here; the following is a brief description of the 7 label categories (the verbose values of labels are quoted in ''):

    1. neutral 'Neutral'
    2. joy/gratification (being successful) 'Freude/Erfolg'
    3. anger/irritation '"Arger/Mi"serfolg'
    4. helplessness 'Ratlosigkeit'
    5. pondering/reflecting '"Uberlegen/Nachdenken'
    6. surprise '"Uberraschung/Verwunderung'
    7. unidentifiable episodes 'Restklasse'
    The labels are assigned according to the impression of the labeler. Not only the facial expression but also the voice quality and other contextual information are considered. The mere use of words with emotional content, without an emotional expression, is NOT considered an indicator of the respective emotion/user-state.

    The intensity of a user-state is given after the label classes 2-6 by the following rating:

    Example:

    USH:    0       205440  Freude/Erfolg   schwach
    USH:    205440  30720   Neutral
    USH:    236160  37760   Freude/Erfolg   schwach
    USH:    273920  192000  Neutral
    USH:    465920  78720   "Uberlegen/Nachdenken    stark
    USH:    544640  295680  Neutral
    USH:    840320  49920   "Arger/Mi"serfolg schwach
    USH:    890240  42880   Neutral
    USH:    933120  21760   "Uberraschung/Verwunderung       schwach
    USH:    954880  97920   Ratlosigkeit    schwach
    USH:    1052800 542720  Neutral
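
    The statement that the whole session is segmented without gaps can be checked mechanically: each segment must begin where the previous one ends. The following Python sketch does this for the first lines of the example; the function names parse_segment and find_gaps are hypothetical, and the fields are split on any whitespace on the assumption that the label values themselves contain no blanks.

```python
def parse_segment(line):
    """Return (begin, duration, label, rating) for one USH line."""
    fields = line.split()
    rating = fields[4] if len(fields) > 4 else None
    return int(fields[1]), int(fields[2]), fields[3], rating


def find_gaps(segments):
    """Return (expected_begin, actual_begin) pairs where segments do not abut."""
    gaps = []
    for (begin, duration, *_), (next_begin, *_) in zip(segments, segments[1:]):
        if begin + duration != next_begin:
            gaps.append((begin + duration, next_begin))
    return gaps


lines = [
    "USH:    0       205440  Freude/Erfolg   schwach",
    "USH:    205440  30720   Neutral",
    "USH:    236160  37760   Freude/Erfolg   schwach",
    "USH:    273920  192000  Neutral",
]
segments = [parse_segment(line) for line in lines]
print(find_gaps(segments))  # → []
```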
    
    See also tiers USM, USP and OCC.

  26. SmartKom User State Annotation (facial expression) USM class 2

    Synopsis:

    USM: (begin) (duration) (label string)

    This tier type contains information on user-states (interesting emotional and cognitive states) that occurred in a SmartKom recording session. In contrast to the USH tier, only the video signal of the face is available.
    For more background information about the SmartKom data collection see here.

    The whole session is segmented (no gaps). For each segment begin (begin) and duration (duration) are given in samples from the beginning of the recording (SmartKom: 16kHz).

    In the label string (label string) each segment is assigned to one of the labels described below, optionally followed by a TAB-separated rating. For a detailed description of the labeling system see here; the following is a brief description of the 7 label categories (the verbose values of labels are quoted in ''):

    1. neutral 'Neutral'
    2. joy/gratification (being successful) 'Freude/Erfolg'
    3. anger/irritation '"Arger/Mi"serfolg'
    4. helplessness 'Ratlosigkeit'
    5. pondering/reflecting '"Uberlegen/Nachdenken'
    6. surprise '"Uberraschung/Verwunderung'
    7. unidentifiable episodes 'Restklasse'
    The labels are assigned according to the impression of the labeler. ONLY the facial expression but NOT the voice quality or other contextual information is considered. This annotation was performed by a different labeler group than the USH annotation. Therefore this annotation may be used to investigate the influence of speech input on user state judgements.

    The intensity of a user-state is given after the label classes 2-6 by the following rating:

    Example:

    USM:    0       205440  Freude/Erfolg   schwach
    USM:    205440  30720   Neutral
    USM:    236160  37760   Freude/Erfolg   schwach
    USM:    273920  192000  Neutral
    USM:    465920  78720   "Uberlegen/Nachdenken    schwach
    USM:    544640  295680  Neutral
    USM:    840320  49920   "Arger/Mi"serfolg schwach
    USM:    890240  42880   Neutral
    USM:    933120  119680  "Uberlegen/Nachdenken    schwach
    USM:    1052800 542720  Neutral
    USM:    1595520 59520   "Uberlegen/Nachdenken    schwach
    USM:    1655040 157440  Neutral
    USM:    1812480 143360  "Uberlegen/Nachdenken    schwach
    USM:    1955840 58880   "Arger/Mi"serfolg stark
    USM:    2014720 89600   Neutral
    USM:    2104320 559360  "Arger/Mi"Serfolg schwach
    USM:    2663680 263680  Neutral
    USM:    2927360 28800   "Arger/Mi"serfolg schwach
    
    See also tiers USH, USP and OCC.

  27. SmartKom occlusion in the facial video OCC class 2

    Synopsis:

    OCC: (begin) (duration) (label string)

    This tier contains an additional segmentation and labeling of the SmartKom facial video recording. All occlusions of the face or part of the face by the hand, pen or other objects are segmented and classified here. This tier might be very useful for the automatic processing of the facial video signal.

    Begin (begin) and duration (duration) of the occlusion are given in samples counted from the beginning of the recording (SmartKom: 16 kHz).
    The label string contains one of the following 10 classes:

    • 'Hand im Gesicht' : hand in face
    • 'Hand im Gesicht/Mund' : hand in face in the area of the mouth
    • 'Hand im Gesicht/Nase' : hand in face in the area of the nose
    • 'Hand im Gesicht/Augen' : hand in face in the area of the eyes
    • 'Stift im Gesicht' : pen in face
    • 'Stift im Gesicht/Mund' : pen in face in the area of the mouth
    • 'Stift im Gesicht/Nase' : pen in face in the area of the nose
    • 'Stift im Gesicht/Augen' : pen in face in the area of the eyes
    • 'Teilweise nicht im Bild' : face partly not in the range of the recording camera
    • 'Objekt im Gesicht' : other object than hand or pen in the area of the face

    Example:

    OCC:    380800  18560   Teilweise nicht im Bild
    OCC:    458880  58240   Teilweise nicht im Bild
    OCC:    1167360 7680    Teilweise nicht im Bild
    OCC:    1173120 14720   Hand im Gesicht
    OCC:    1201920 11520   Teilweise nicht im Bild
    OCC:    2000000 12160   Hand im Gesicht/Mund
    OCC:    2567040 57600   Teilweise nicht im Bild
    OCC:    2709120 40960   Hand im Gesicht/Mund
    OCC:    2947840 33280   Hand im Gesicht
    OCC:    2955520 9600    Teilweise nicht im Bild
    OCC:    2981120 35840   Teilweise nicht im Bild
    OCC:    3528960 10880   Hand im Gesicht
    OCC:    4001920 10240   Hand im Gesicht
    OCC:    4103680 20480   Teilweise nicht im Bild
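
    As begin and duration are given in samples at 16 kHz, durations can be summed per occlusion class and converted to seconds. The Python sketch below does this for three lines of the example; the function name occlusion_seconds is an assumption made for illustration, and overlapping segments are simply added up without any special treatment.

```python
from collections import defaultdict

SAMPLE_RATE = 16000  # SmartKom sampling rate as stated above


def occlusion_seconds(lines):
    """Sum the durations of all OCC segments per label, in seconds."""
    samples = defaultdict(int)
    for line in lines:
        # The label may contain blanks, so split off only the first 3 fields.
        _, _begin, duration, label = line.split(None, 3)
        samples[label.strip()] += int(duration)
    return {label: n / SAMPLE_RATE for label, n in samples.items()}


lines = [
    "OCC:    380800  18560   Teilweise nicht im Bild",
    "OCC:    1173120 14720   Hand im Gesicht",
    "OCC:    2947840 33280   Hand im Gesicht",
]
print(occlusion_seconds(lines))
```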
    
    See also tiers
    USH, USP and USM.
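
    As a minimal sketch of how OCC lines could be read, the following Python snippet splits one line into its fields and converts the sample counts to seconds. The names Occlusion, parse_occ_line and to_seconds are illustrative and not part of any BAS software; the only fact taken from the description above is the SmartKom sampling rate of 16 kHz.

```python
from typing import NamedTuple

SAMPLE_RATE_HZ = 16_000  # SmartKom sampling rate as stated in the tier description


class Occlusion(NamedTuple):
    begin: int     # first sample of the occlusion
    duration: int  # length of the occlusion in samples
    label: str     # class label, e.g. 'Hand im Gesicht'


def parse_occ_line(line: str) -> Occlusion:
    """Parse one 'OCC:' line; the label string may contain spaces."""
    key, begin, duration, label = line.split(None, 3)
    if key != "OCC:":
        raise ValueError(f"not an OCC line: {line!r}")
    return Occlusion(int(begin), int(duration), label.strip())


def to_seconds(samples: int, rate: int = SAMPLE_RATE_HZ) -> float:
    """Convert a sample count to seconds."""
    return samples / rate


occ = parse_occ_line("OCC:    380800  18560   Teilweise nicht im Bild")
print(to_seconds(occ.begin), to_seconds(occ.duration), occ.label)
```

    Splitting at most three times keeps multi-word labels such as 'Teilweise nicht im Bild' intact.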

  28. SmartKom Meta-Linguistic Features USP class 4

    Synopsis:

    USP: (begin) (duration) (list of symbolic links) (label string)

    This tier contains a segmentation and labeling of the SmartKom audio recording. The meta-linguistic features used in this tier are the feature set for a voice-based user state detection (see tier USH for details about SmartKom user state categories). The USP tier is a word-aligned extract from the original SmartKom TRP annotation files. It contains all information from the TRP files without the need to align TRP to the base TRS tier first. More information regarding the TRP annotation scheme can be found here (only in German). For more background information about the SmartKom data collection see here.

    Begin (begin) and duration (duration) of the event are given in samples counted from the beginning of the recording (SmartKom: 16 kHz). Please note that in some cases it is NOT the event itself that is segmented but the word in which the event takes place. See the special notes to the individual labels below.
    The symbolic links refer to the word in question.
    The label string contains one of the following 9 classes.

    Label codes:
    (If not stated otherwise the segment is the duration of the complete word.)

    Label rules:

    Example:

    USP:    79552   6704    0       EMPHASIS
    USP:    426176  8768    6       STRONG_EMPH
    USP:    426176  8768    6       CLEAR_ART
    USP:    435952  10160   7       CLEAR_ART
    USP:    806560  6592    9       LENGTH_SYLL
    USP:    814624  4832    10      LENGTH_SYLL
    USP:    819776  17184   11      EMPHASIS
    USP:    1356896 6000    13      LENGTH_SYLL
    USP:    1785232 11808   20      LENGTH_SYLL
    USP:    1798064 7808    21      LENGTH_SYLL
    USP:    2449632 7376    23      LENGTH_SYLL
    USP:    2470016 10736   27      LENGTH_SYLL
    USP:    2470016 14800   27;28   PAUSE_WORD
    USP:    2794160 12080   31      LENGTH_SYLL
    USP:    3221632 5440    41      CLEAR_ART
    USP:    3678656 8528    48      LENGTH_SYLL
    USP:    3678656 14144   48;49   PAUSE_WORD
    USP:    3694576 3824    49      EMPHASIS
    USP:    4170960 11344   53      LENGTH_SYLL
    USP:    4186192 4464    54      EMPHASIS
    
    See also tiers USH, OCC and USM.
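
    A USP line can be read in much the same way as the other sample-based tiers. The sketch below is an assumption-laden illustration, not BAS code: the names UspEvent and parse_usp_line are hypothetical, and the ';' separator inside the link list is inferred from the example above (a ',' is accepted as well).

```python
import re
from typing import List, NamedTuple


class UspEvent(NamedTuple):
    begin: int        # first sample of the segment
    duration: int     # length of the segment in samples
    links: List[int]  # indices of the linked words
    label: str        # e.g. 'PAUSE_WORD'


def parse_usp_line(line: str) -> UspEvent:
    """Parse one 'USP:' line into begin, duration, word links and label."""
    key, begin, duration, links, label = line.split(None, 4)
    if key != "USP:":
        raise ValueError(f"not a USP line: {line!r}")
    # Separator assumed from the example: '27;28' links two words.
    link_list = [int(item) for item in re.split(r"[;,]", links)]
    return UspEvent(int(begin), int(duration), link_list, label.strip())


event = parse_usp_line("USP:    2470016 14800   27;28   PAUSE_WORD")
print(event.links, event.label)
```

    Keeping the links as a list makes it straightforward to look up the affected words in a word-level tier such as ORT.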

  29. Translation TLN class 1

    Synopsis:

    TLN: (list of symbolic links) (label string)

    This tier contains a translation of the recorded speech into another language.

    The list of symbolic links marks the area that is covered by the following translation within the recording. Translations may therefore be spread in chunks over more than one TLN line; even overlapping areas are possible, if necessary.
    The label string starts with a marker giving the language pair of the translation in the form '##>%%', where '##' is the international language code of the source language and '%%' is the code of the target language, e.g. 'DE>EN' for German to English. After this marker, separated by a single TAB, follows the orthographic form of the translation without punctuation. Coding of special characters may differ as in the tier ORT (see above).

    Example:

    ORT:    0       okay
    ORT:    1       thank
    ORT:    2       you
    ORT:    3       bye
    TLN:    0,1,2,3 EN>DE    gut danke tschüs
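
    The structure of a TLN line can be illustrated with a short Python sketch. The function name parse_tln_line is hypothetical. The sketch splits on any whitespace rather than strictly on a single TAB, on the assumption that tabs may not survive copying from this page, and it takes the ',' link separator from the example above.

```python
import re
from typing import List, Tuple


def parse_tln_line(line: str) -> Tuple[List[int], str, str, str]:
    """Return (word links, source language, target language, translation)."""
    key, links, pair, text = line.split(None, 3)
    if key != "TLN:":
        raise ValueError(f"not a TLN line: {line!r}")
    # The marker has the form '##>%%', e.g. 'EN>DE'.
    source, target = pair.split(">")
    link_list = [int(item) for item in re.split(r"[;,]", links)]
    return link_list, source, target, text.strip()


links, source, target, text = parse_tln_line(
    "TLN:    0,1,2,3 EN>DE    gut danke tschüs"
)
print(links, source, target, text)
```

    Because one translation may be spread over several TLN lines, a caller would typically collect all parsed lines before reassembling the text.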
    

  30. Prosodic labelling in 'GTobi light' PRM class 3

    Synopsis:

    PRM: (point-in-time) (label string)

    This tier contains a prosodic labelling as used in German speech synthesis projects at IMS, University of Stuttgart, and at BAS, Munich. This simplified version of the German Tobi standard uses either an accent or a boundary marker at each labelled point-in-time. This format, called 'GTobi light', was developed by IMS Stuttgart for use in unit selection speech synthesis techniques. In contrast to the standard GTobi, only an accent tone or a boundary type from a closed inventory may be labelled; free combinations of tone (TON:), accent type (FUN:) and boundary type (BRE:) as in GTobi are not allowed here, although some boundary markers do in fact contain information about the tone structure.

    A detailed description of the label inventory can be found in the documentation of the German BITS synthesis corpora, part B.

    Example:

    PRM:    98160   L*H
    PRM:    108665  -
    PRM:    132414  H*L
    PRM:    158400  %?
    

  31. SmartWeb Transliteration TRW class 1

    Synopsis:

    TRW: (list of symbolic links) (label string)

    This tier contains a transliteration of the German SmartWeb corpus project. It uses a subset of the SmartKom transliteration set (TRS), extended by 2 additional off-talk markers, by a pronunciation coding in SAM-PA and by two additional time markers, which allow the dialogue to be segmented into turn-like chunks.

    The following SmartKom tags are used within TRW:

    Additional tags:

    Example:

    TRW:    0       <ZA 211.619> wurde<POT>
    TRW:    1       #zw"olf<POT>
    TRW:    2       irgendwann<POT>
    TRW:    3       von<POT> <P>
    TRW:    4       <%> . <PP>
    TRW:    5       <"ah>
    TRW:    6       's<POT>
    TRW:    7       wurde<POT>
    TRW:    8       #zw"olf<POT>
    TRW:    9       #drei"sig<POT>
    TRW:    10      von<POT>
    TRW:    11      ~Otto<POT>
    TRW:    12      dem<POT>
    TRW:    13      <%>
    TRW:    14      und<POT>
    TRW:    15      ~Heinrich<Z><POT>
    TRW:    16      irgendjemandem<POT>
    TRW:    17      gegr"undet<POT> .
    TRW:    18      ~Heinrich<POT>
    TRW:    19      der<Z><POT> ,
    TRW:    20      keine<SOT>
    TRW:    21      Ahnung<SOT> ,
    TRW:    22      und<POT>
    TRW:    23      ~Otto<POT> ,
    TRW:    24      was<SOT>
    TRW:    25      wei"s<SOT>
    TRW:    26      ich<SOT> <;ungrammatisch> . <PP>
    TRW:    27      #zw"olf<POT> , <P>
    TRW:    28      ne<OOT> . <ZE 233.342>
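
    The time markers can be used to recover the turn-like chunks mentioned above. The following sketch rests on assumptions that this page does not confirm: that '<ZA t>' marks the beginning and '<ZE t>' the end of a chunk, and that t is a time in seconds, as the example suggests. The name chunk_bounds is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed marker syntax, taken from the example: <ZA 211.619> ... <ZE 233.342>
TIME_MARKER = re.compile(r"<(ZA|ZE) (\d+(?:\.\d+)?)>")


def chunk_bounds(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Collect (start, end) pairs from ZA/ZE markers in TRW lines."""
    chunks: List[Tuple[float, float]] = []
    start = None
    for line in lines:
        for kind, value in TIME_MARKER.findall(line):
            if kind == "ZA":
                start = float(value)
            elif start is not None:
                # A ZE marker closes the chunk opened by the last ZA marker.
                chunks.append((start, float(value)))
                start = None
    return chunks


trw_lines = [
    "TRW:    0       <ZA 211.619> wurde<POT>",
    "TRW:    27      #zw\"olf<POT> , <P>",
    "TRW:    28      ne<OOT> . <ZE 233.342>",
]
print(chunk_bounds(trw_lines))
```

    A ZE marker without a preceding ZA marker is ignored in this sketch; stricter handling may be preferable for corpus validation.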
    

  32. Syllable Segmentation based on MAUS MAS class 5

    Synopsis:

    MAS: (begin sample) (duration sample) (list of symbolic links) (label string)

    This tier contains a syllable segmentation based on the automatic phonemic segmentation by MAUS (see tier MAU). Starting with the transcript in SAM-PA extracted from the MAU tier, we first search for minima of sonority as possible syllable boundaries between syllable nuclei, and then re-adjust these boundaries according to the rule set published by K. Kohler. The resulting syllabification is then mapped back to the MAU segmentation to obtain the start and duration of each listed syllable.

    Example:

    MAS:    53600   1920    0       'smar
    MAS:    55520   10560   0       ta
    MAS:    66080   1680    0       kUs
    MAS:    67760   11120   1       'vEl
    MAS:    78880   960     1       C@
    MAS:    79840   1600    2       'li:
    MAS:    81440   6880    2       plINs
    MAS:    88320   1600    2       'far
    MAS:    89920   1920    2       b@
    MAS:    91840   1760    3       'has
    MAS:    93600   1120    4       'du:
    MAS:    220256  480     5       m
    MAS:    220736  11040   6       'mi:6
    MAS:    231776  2560    7       'maI
    MAS:    234336  2240    7       n@
    MAS:    236576  4160    8       'fra:
    MAS:    240736  2080    8       g@
    MAS:    242816  1600    9       b@
    MAS:    244416  5440    9       'ant
    MAS:    249856  4160    9       'vO6
    MAS:    254016  2400    9       t@n
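
    Since each syllable carries a symbolic link to its word, the syllables of a word can be regrouped and the time span of the word recovered. The sketch below only illustrates this regrouping, not the syllabification itself; the names syllables_by_word and word_span are hypothetical, and it assumes that each MAS line links to exactly one word, as in the example above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Syllable = Tuple[int, int, str]  # (begin sample, duration in samples, SAM-PA label)


def syllables_by_word(lines: List[str]) -> Dict[int, List[Syllable]]:
    """Group MAS syllables by the index of the word they link to."""
    words: Dict[int, List[Syllable]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[0] != "MAS:":
            continue  # skip other tiers and malformed lines
        _, begin, duration, link, label = fields
        words[int(link)].append((int(begin), int(duration), label))
    return dict(words)


def word_span(syllables: List[Syllable]) -> Tuple[int, int]:
    """Return (begin, duration) in samples covering all syllables of a word."""
    begin = syllables[0][0]
    last_begin, last_duration, _ = syllables[-1]
    return begin, last_begin + last_duration - begin


mas_lines = [
    "MAS:    53600   1920    0       'smar",
    "MAS:    55520   10560   0       ta",
    "MAS:    66080   1680    0       kUs",
    "MAS:    67760   11120   1       'vEl",
    "MAS:    78880   960     1       C@",
]
words = syllables_by_word(mas_lines)
print([label for _, _, label in words[0]], word_span(words[0]))
```

    The span computation relies on the syllables of a word being listed in temporal order, which holds for the example.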
    


SAM

The SAM Format was defined in the ESPRIT "SAM" Project No 2589: 'Speech Input and Output Assessment Methodologies and Standardization'. Only very few BAS corpora contain SAM Format files.
On each BAS CDROM you will find scripts (sam2pho, pho2sam) for the conversion of SAM into PhonDat and vice versa.

A description of the SAM format can be found here.


AGS - Annotation Graphs

Bird et al. (LDC) use an abstract and very general data model called 'annotation graphs' to represent all kinds of annotations in the ATLAS project. The BAS Partitur Format (BPF) can be represented as an annotation graph as well.
Since LDC also provides software modules for designing new annotation tools based on this model, they defined an SGML-based format (based on ATLAS Level 0, v1.1b3) to store and exchange such annotation data (AGS).
On each BAS CDROM you will find the script par2ags.pl, which transforms a BAS Partitur Format (BPF) file into an AGS file. A DTD for the AGS format can be found here.
Some BAS corpora are already shipped with both formats, BPF and AGS.


Florian Schiel