Possible Transcript Items

The following table gives a rough overview of possible labels and tags that may appear in an orthographic transcript. Review the tags and decide which of them are useful for your specific needs. The third column shows an example of how each item may be tagged; you may also use XML-style tagging instead.

The tags below assume that a dialogue between two or more speakers is transcribed turn by turn while listening to the recorded signals.

Item | Remarks | Example
Lexical unit | Standardized spelling and character coding. Define what counts as a lexical unit: words, interjections, neologisms? Lexical units are usually not tagged. | station
Spelling | A word or abbreviation spelled out letter by letter. | $U $S $A
Acronyms | Official substitutes for words or phrases, spoken as a single word. | OPEC
Proper names | All names that cannot be translated into another language: people's names, street names, restaurant names, etc. | ~Peter ~Marine+World
Numbers | Numerals, combinations of numbers, and ordinal numbers. All numbers are written as words. | #three #twenty #hundred
Neologism | A word made up by the speaker. | *deliverator
Foreign Words | Words from another language that have not been officially adopted into the main language. | <*IT>saluti
Off-Talk | The speaker is talking to himself or herself rather than to the other dialogue partner(s). | what<OOT> do<OOT> I<OOT>
Read Off-Talk | Off-talk caused by reading aloud. | seven<ROT>
Command Words | Words used to operate a dialogue system. | !KEYComputer
Lengthening | Marks sounds within an item that are lengthened. | so<L>rry
Garbage | Words that are completely or partly incomprehensible. | <%> three%
Truncation | The item is truncated for some reason (technical problems, stuttering, etc.). | so the que= by hel= <*T>
Interruption | Items may be interrupted for several reasons: pauses, breathing, hesitations, etc. | trans_ <A> _lation
Missing signal | Parts of the signal that are missing for technical reasons must be marked in the transcript. | see [*]
Slang, dialect, contractions, assimilations or mispronunciations | May be marked either in the orthographic or in the phonetic transcript. It is important to keep the correct orthographic form to allow lexical mapping. | no <!1 nope> going to <!2 gonna>
Repetition | Stuttered repetition of parts of an item or of complete items. | like +/to/+ to see
False start | The speaker breaks off an utterance and starts a new one. | -/this evening/- tomorrow
Breathing | Clearly audible breathing. | <A>
Filled Pauses | Pauses filled with vocalization, nasalization, a combination of both, or other articulatory noises serving the same purpose. | <uh> <hm> <uhm> <hes>
Empty Pause | Temporary unfilled gaps in the speech; usually not marked at the beginning or end of a recording. | <P>
Articulatory Noise | Noise produced by the speaker's articulatory system that is not a filled pause. | <noise> <cough> <laugh> <smack> ...
Other noise | Noise caused by background events, by touching the microphone, by the recording equipment, etc. | <#> <#microtouch> <#knock> <#hum> ...
Cross talk | Overlapping speech caused by speakers interrupting each other. If the transcript must record who interrupts whom and where, the markup can become rather complicated. | see [*]
Overlay | Noise or cross talk overlaying speech may be marked with a bracket system. Recommendation: tag each overlaid word individually. | here <:<#> you:> are
Prosody | Prosodic events such as emphasis, main and secondary phrase accents, and boundaries may be marked in the transcript. | [PA] [NA] [B3] [B9]
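
To illustrate why the correct orthographic form matters for lexical mapping, the following Python sketch strips the markup from a transcript line and returns the plain words for lexicon lookup. The function `lexical_tokens` is a hypothetical helper, not part of any official tool: its regular expressions are one interpretation of the example column above, and the choices to keep words from repetitions and false starts, to drop truncated and garbage-marked words, and to keep `+` inside multiword proper names are assumptions. Command words, missing signal and cross talk are not handled.

```python
import re

# Hypothetical patterns derived from the example column of the table above.
# Order matters: overlay brackets and pronunciation variants contain
# characters that the generic tag pattern would otherwise mangle.
OVERLAY = re.compile(r"<:|:>")              # "<:" ... ":>" around overlaid words
VARIANT = re.compile(r"<!\d+ [^<>]*>")      # "<!1 nope>": variant dropped, preceding correct form kept
TAG = re.compile(r"<[^<>\s]*>")             # tags without spaces: <A>, <uh>, <#knock>, <OOT>, <L>, <*IT>, <%>
PROSODY = re.compile(r"\[[A-Z]+\d*\]")      # prosodic labels: [PA], [NA], [B3], ...
REPAIR = re.compile(r"[+-]/|/[+-]")         # delimiters of "+/to/+" and "-/this evening/-"
INTERRUPTED = re.compile(r"(\w)_\s+_(\w)")  # "trans_ _lation" -> "translation"
PREFIXES = "~#*$"                           # name, number, neologism and spelling markers


def lexical_tokens(line: str) -> list[str]:
    """Return the orthographic words of a transcript line (assumed conventions)."""
    text = OVERLAY.sub("", line)
    text = VARIANT.sub("", text)
    text = TAG.sub("", text)
    text = PROSODY.sub("", text)
    text = REPAIR.sub("", text)
    text = INTERRUPTED.sub(r"\1\2", text)
    tokens = []
    for token in text.split():
        # Assumption: truncated ("que=") and garbage-marked ("three%") words
        # cannot be mapped to a lexicon entry reliably, so they are skipped.
        if token.endswith("=") or "%" in token:
            continue
        word = token.lstrip(PREFIXES)
        if word:
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    print(lexical_tokens("no <!1 nope> <uh> I want #three tickets to ~Marine+World [PA]"))
    # ['no', 'I', 'want', 'three', 'tickets', 'to', 'Marine+World']
```

Applying the patterns in a fixed order keeps each rule simple; a production tool would more likely tokenize the line and validate every tag against an explicitly defined tag set.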

