Speech of Intoxicated Speakers
Funding, application period
DFG, BAS, BADS, 2009 - 2015
The aim of this 30 + 24 month project is to analyse the speech data from the ALC corpus with regard to distinctive features of alcoholic intoxication. The project includes perception experiments to determine the features used by human listeners, classic phonetic feature analysis, and the study of rhythm features and prosodic contours.
Speech Data: Alcohol Language Corpus
The ALC was recorded at BAS in close cooperation with the Institute of Legal Medicine, LMU Munich. The corpus is available at BAS for research and development. Members of European academic institutions have free access to the ALC via the BAS CLARIN Repository (you need a valid user account at your home institution, which must be part of the AAI).
ALC contains recordings of speakers who are either intoxicated or sober. The type of speech ranges from read single digits to full conversational style. Recordings were made during drinking tests in which speakers drank beer or wine to reach a self-chosen level of alcoholic intoxication. The actual level of intoxication was measured by breath alcohol and blood samples taken immediately before the speech recording. Recordings were performed in two stationary automobiles to ensure a constant acoustic environment across the different recording locations; for each speaker, both the intoxicated and the sober recordings were made in the same car and supervised by the same investigator (dialogue partner). In the intoxicated state 30 items were sampled from each speaker (set A), while in the sober state 60 items were recorded (set NA; set A being a subset of set NA).
Numbers (edition 2.5):
- number of recorded speakers: 162
- number of validated recordings: 15180
- number of phonetic segments: 1456556
- file formats:
- headset Beyerdynamic Opus 54: WAV 44.1 kHz, 16 bit
- mouse micro AKG 400: WAV 44.1 kHz, 16 bit
- meta data: speaker and recording protocol (SpeechDat)
- lexicon: 7-bit ASCII
- legacy Emu database: *.hlb, *.phonetic (selected files, only headset, only one version of each prompt)
- emuR DB
- segmentation: manual segmentation of initial and final silence intervals; automatic phonemic segmentation by MAUS; formats: TextGrid (Praat), BAS Partitur Format (BPF), legacy Emu, emuR annot files
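As an illustration of how the phonemic segmentation could be read from a BPF file, the following minimal Python sketch parses the MAU tier. It assumes the usual MAU line layout (start sample, duration in samples, word index, label, separated by whitespace); the function name `parse_mau_tier` and the sample lines are hypothetical, and the layout should be checked against the BPF documentation before use.

```python
def parse_mau_tier(bpf_text, sample_rate):
    """Return the MAU segments of a BPF file as a list of dicts.

    Assumed line layout: 'MAU: <start> <duration> <word index> <label>',
    with start and duration given in samples.
    """
    segments = []
    for line in bpf_text.splitlines():
        if not line.startswith("MAU:"):
            continue  # skip header lines and other tiers
        _, start, duration, word_index, label = line.split(None, 4)
        segments.append({
            "start_s": int(start) / sample_rate,
            "duration_s": int(duration) / sample_rate,
            "word_index": int(word_index),  # -1 marks segments outside words
            "label": label.strip(),
        })
    return segments


# Hypothetical file content for illustration only (not taken from the ALC).
example = "\n".join([
    "SAM: 44100",
    "MAU: 0 4410 -1 <p:>",
    "MAU: 4410 2205 0 a",
])

for segment in parse_mau_tier(example, sample_rate=44100):
    print(segment)
```

The sampling rate is passed explicitly here; in a real file it would normally be taken from the SAM header entry.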
ALC was the official benchmark corpus of the INTERSPEECH 2011 Speaker State Challenge.
For more information regarding the ALC please contact firstname.lastname@example.org.
Most Prominent Results
Within this project we studied the following acoustic/phonetic features: fundamental frequency, jitter, shimmer, formant frequencies in vowel spectra, and spectra of sibilants.
Fundamental frequency generally rises with intoxication, and jitter and shimmer increase. Vowel spectra are fairly stable, but there is a tendency (partly significant for female speakers) for the vowel space to expand when intoxicated (hyper-articulation); this might be related to the awareness of being recorded or to the decreased speaking rate (speakers have more time to reach vowel targets). Spectral features of the sibilants /s/ and /S/ (= 'esch'), namely spectral peak, slope differences below/above the peak, energy below 500 Hz, and spectral moments, are partly more similar when intoxicated (peak, 1st and 3rd spectral moment), which may be caused by impaired articulation. We also found that voiceless /s/ tends to be more similar to voiced /z/ when intoxicated. On the other hand, we found no evidence for the stereotypical phoneme substitution /s/ -> /S/ in intoxicated speech.
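For readers unfamiliar with the two perturbation measures, the following Python sketch shows one common definition (the "local" variant): the mean absolute difference between consecutive glottal cycles, relative to the mean value, applied to cycle durations for jitter and to cycle peak amplitudes for shimmer. This is an illustrative assumption; the exact extraction method used in the project is not specified here, and the input values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    return 100.0 * mean_abs_diff / (sum(values) / len(values))


# Hypothetical glottal cycle durations (seconds) and peak amplitudes.
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter = local_perturbation(periods)      # cycle-to-cycle variation of duration
shimmer = local_perturbation(amplitudes)  # cycle-to-cycle variation of amplitude
print(f"jitter: {jitter:.2f} %, shimmer: {shimmer:.2f} %")
```

Both measures are zero for a perfectly periodic signal and grow as the voice becomes less regular.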
The linguistic features analysed included speech errors and disfluencies (repetitions, false starts, word and sentence breaks, changes in speaking rate, increased rate of unfilled and filled pauses, unusual prolongations of phones, and filler sounds).
In general, with a few exceptions, the rates of disfluencies rise with intoxication in read and spontaneous speech, which is in line with most earlier studies (Schiel & Heinrich 2015). There is a tendency for a higher percentage of filled pauses in speech under the influence of alcohol compared to speech in the sober condition, especially for read speech. This contradicts earlier studies that reported a decrease in the number of filled pauses. We also found that the number and length of unfilled pauses increase in speech under the influence of alcohol (except for command & control speech); this could indicate planning difficulties when intoxicated, for which the speaker compensates by inserting silent and filled pauses. Repetitions seem to occur less often in spontaneous speech under the influence of alcohol, which contradicts our main hypothesis that disfluency rates would increase with rising intoxication level. At the moment we do not have a conclusive explanation for this observation. The average duration of filled and unfilled pauses rises with intoxication, mainly for spontaneous speech. However, these effects can partly be explained by the reduced speaking rate under the influence of alcohol (see below), with the possible exception of unfilled pause length in read speech, which shows a 19% increase on average.
As is often the case with correlations of measurable observations against speaker states, there is a heterogeneous picture across speakers: while the majority of speakers may increase a certain linguistic/phonetic rate/measure with intoxication, other speakers decrease or do not change the same rate/measure at all (see also the discussion of speaker idiosyncrasies below).
The following prosodic/rhythmic features were studied: speech rate, %V, %C, several rhythm features proposed by Ramus et al. 1999, nPVI of various flavors proposed by Grabe et al. 2004, rhythmicity features (Heinrich 2011), and fundamental frequency and energy contours (Heinrich 2014).
Speaking rate is generally reduced when intoxicated; most classic rhythm and rhythmicity features differ significantly. Energy and f0 contours were analysed with regard to global distance measures, DCT parameters, cepstral moments and fPCA. With the exception of fPCA, most contour features differed significantly with intoxication.
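To make the classic rhythm measures concrete, the Python sketch below computes %V, the standard deviation of interval durations (the Delta measures), and the normalised pairwise variability index (nPVI) from lists of vocalic and consonantal interval durations. The formulas follow the commonly cited definitions; the function names and durations are hypothetical, and the use of the population standard deviation is an assumption rather than the project's documented choice.

```python
from statistics import pstdev


def percent_v(vowel_durs, cons_durs):
    """Share of vocalic interval duration in the total duration, in percent."""
    total = sum(vowel_durs) + sum(cons_durs)
    return 100.0 * sum(vowel_durs) / total


def delta(durs):
    """Standard deviation of interval durations (DeltaV or DeltaC)."""
    return pstdev(durs)


def npvi(durs):
    """Normalised pairwise variability index over successive intervals."""
    if len(durs) < 2:
        raise ValueError("need at least two intervals")
    pairs = list(zip(durs, durs[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs) / len(pairs)


# Hypothetical interval durations in seconds, e.g. derived from a segmentation.
vowels = [0.08, 0.12, 0.06, 0.10]
consonants = [0.07, 0.15, 0.09, 0.11]

print(f"%V     = {percent_v(vowels, consonants):.1f}")
print(f"DeltaC = {delta(consonants):.3f} s")
print(f"nPVI-V = {npvi(vowels):.1f}")
```

Because nPVI normalises each difference by the local mean duration, it is less sensitive to overall speaking rate than the raw Delta measures.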
Human listeners reach a recognition rate of only about 60-65% in a forced-choice perception experiment when presented with short chunks (2-8 s) of speech. Although simply exploiting the raised fundamental frequency in the speech stimuli would yield a (theoretical) detection rate of about 78%, listeners do not use this prominent feature. In cross experiments with compensated f0 in intoxicated speech and with simulated raised f0 in sober speech, listeners seemed to rely on other (linguistic?) features rather than f0, but used f0 as a fallback feature when other features were absent. A possible explanation is that listeners do not trust fundamental frequency as a robust feature for intoxication, since it is also influenced by many other speaker states (such as stress, happiness, or anger) (Baumeister & Schiel 2013, 2015).
With regard to all of the analysed features and perception tests, we always found that speakers vary extremely in the way they express intoxication in their speech. The majority of speakers (60-80%) show clear and detectable signs of intoxication and can be easily spotted by human listeners. Some speakers camouflage their intoxication almost perfectly, some even at considerably high intoxication levels. A small group of speakers show a 'reverse' behavior: their features actually move in the opposite direction when intoxicated, i.e. they appear to be more sober when intoxicated. Whether the latter effect is caused by the recording situation, where speakers are aware of the recording and try to counteract their self-perceived intoxication, or whether this is simply a speaker-dependent idiosyncrasy is still under debate. For practical purposes (i.e. automatic detection of intoxication in speech) it is unlikely that a reliable speaker-independent detection method can be developed (although the results of the INTERSPEECH Speaker State Challenge show that such approaches may yield detection rates up to 80%); it is more promising to work on speaker-dependent/speaker-adaptive classification schemes where a large amount of 'sober' speech data of an individual is available to the system.
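The idea of a speaker-adaptive scheme can be illustrated with a deliberately simple Python sketch: a feature value from a new utterance is compared against the same speaker's own sober baseline and flagged when it deviates strongly. This is not the method used in the project or in the challenge; the function names, the choice of mean f0 as the feature, the threshold of 2 standard deviations and all values are hypothetical.

```python
from statistics import mean, stdev


def deviation_from_baseline(sober_values, test_value):
    """z-score of a test value relative to the speaker's own sober recordings."""
    mu = mean(sober_values)
    sigma = stdev(sober_values)
    if sigma == 0:
        raise ValueError("sober baseline has no variation")
    return (test_value - mu) / sigma


def flag_deviation(sober_values, test_value, threshold=2.0):
    """Flag values far from the baseline in either direction.

    The two-sided test is an assumption made here so that speakers whose
    features move in the 'reverse' direction are also caught.
    """
    return abs(deviation_from_baseline(sober_values, test_value)) > threshold


# Hypothetical mean f0 values (Hz) from one speaker's sober recordings.
sober_f0 = [118.0, 121.0, 119.5, 120.5, 121.0]

print(flag_deviation(sober_f0, 126.0))  # far above this speaker's baseline
print(flag_deviation(sober_f0, 120.5))  # within this speaker's normal range
```

A practical system would combine many features and a trained classifier; the sketch only shows why per-speaker sober data makes individual deviations visible that a global threshold would miss.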
- Schiel F, Heinrich Chr, Barfüßer S, Gilg Th (2008). ALC - Alcohol Language Corpus. In: Proc. of LREC 2008, Marrakech, Morocco, paper 419.
- Schiel F, Heinrich Chr (2009). Laying the Foundation for In-car Alcohol Detection by Speech. In: Proc. of the INTERSPEECH 2009, Brighton, UK, pp. 983-986.
- Schiel F, Heinrich Chr, Neumeyer V (2010). Rhythm and Formant Features for Automatic Alcohol Detection. In: Proc. of the INTERSPEECH 2010, Chiba, Japan, pp. 458-461.
- Schiel F (2011). Perception of Alcoholic Intoxication in Speech. In: Proc. of the Interspeech 2011, Florence, Italy, pp. 3281-3284.
- Heinrich Chr, Schiel F (2011). Estimating Speaking Rate by Means of Rhythmicity Parameters. In: Proc. of the Interspeech 2011, Florence, Italy, pp. 1873-1876.
- Schiel F, Heinrich Chr, Barfüßer S (2011). Alcohol Language Corpus. Language Resources and Evaluation, Volume 46, Issue 3 (2012), Berlin-New York:Springer, DOI: 10.1007/s10579-011-9139-y, pp. 503-521.
- Baumeister B, Heinrich Ch, Schiel F (2012). The Influence of Alcoholic Intoxication on the Fundamental Frequency of Female and Male Speakers. J. Acoust. Soc. Am., Volume 132, Issue 1, pp. 442-451, DOI: 10.1121/1.4726017.
- Schuller B, Steidl St, Batliner A, Schiel F, Krajewski J (2012). The INTERSPEECH 2011 Speaker State Challenge - a review. IEEE Speech and Language Technical Committee Newsletter, Winter 2012 (online).
- Baumeister B, Schiel F (2013). Human Perception of Alcoholic Intoxication in Speech. In: Proc. of the Interspeech 2013, pp. 1419-1423.
- Schuller B, Steidl S, Batliner A, Schiel F, Krajewski J (2014). Introduction to the Special Issue on Broadening the View on Speaker Analysis. Computer, Speech and Language, Volume 28, Issue 2, pp. 343-345.
- Schuller B, Steidl S, Batliner A, Schiel F, Krajewski J, Weninger F, Eyben F (2014). Medium-term speaker states: A review on intoxication, sleepiness and the first challenge. Computer, Speech and Language, Volume 28, Issue 2, pp. 346-374.
- Heinrich Ch, Schiel F (2014). The influence of alcoholic intoxication on the short-time energy function of speech. J. Acoust. Soc. Am., Volume 135, Issue 5, pp. 2942-2951, DOI: 10.1121/1.4859820.
- Schiel F, Heinrich Chr (2015). Disfluencies in the speech of intoxicated speakers. International Journal of Speech, Language and the Law, Volume 22.1, pp. 19-33, ISSN: 1748-8885, DOI: 10.1558/ijsll.v22i1.24767.
- Baumeister B, Schiel F (2015). Fundamental Frequency and Human Perception of Alcoholic Intoxication in Speech. In: Proc. of the International Congress of Phonetic Sciences, Glasgow, United Kingdom, paper 418.
- Barfüßer S, Schiel F (2010). Disfluencies in alcoholized speech. In: IAFPA Annual Conference.
- Baumeister B, Schiel F (2010). On the Effect of Alcoholisation on Fundamental Frequency. In: IAFPA Annual Conference.
- Schiel F, Heinrich Chr, Barfüßer S, Gilg Th (2010). Alcohol Language Corpus - a publicly available large corpus of alcoholized speech. In: IAFPA Annual Conference.
- Schiel F (2011). ALC - Alcohol Language Corpus. Phonetics Department, University of Melbourne, Australia, 2011-04-01.