Face, Speech, and Acoustics  


Abstract

Carol A. Fowler (Haskins Laboratories, New Haven)

Face, speech and acoustics: An overview

Research in speech science has shown a consistent progression in our perspective on speech perception, both from acoustic speech signals alone and from audiovisual signals. Early on, not surprisingly, attention focused on the structure in acoustic signals that provides phonetic information. Indeed, theorists for the most part assumed that nonlinguistic (and even nonsegmental linguistic) information was stripped from the signal to normalize it in preparation for matching against a mental dictionary of abstract word types. These abstract representations specified no particular speaker, no particular speaking rate, no particular affect, and so on. More recent findings suggest a different scenario. Memories for spoken words are particular in many or all of these ways. Moreover, research on "sinewave speech" suggests that the same time-varying acoustic structure can sometimes provide both phonetic and indexical information to listeners.

Research on audiovisual speech perception has followed a similar course. Early research focused on the observation that speech in noise is more intelligible if listeners can see the speaker's articulating face, and on identifying the phonetic information that listener/observers could extract from the face ("visemes"). However, much more than phonetic information is available on the face. As with the acoustic signal, the optical signal carries indexical information, information about affect, speaking rate, and more. Recent research using point-light displays of the face (somewhat analogous to sinewave speech, but in the visual domain) suggests that the same optical structure that provides phonetic information can also provide indexical information. Most likely, it provides more information than that.

In short, in both the acoustic and audiovisual domains, research suggests that listener/viewers extract rich complexes of information from the face and the acoustic signal, and that they retain phonetic and nonphonetic, even nonlinguistic, information in rich event memories. There may be no purely phonetic mode of speech perception.

Research specifically exploring the information available to listeners and viewers during speech, and the relation between facial movements and acoustic signals, finds considerable redundancy. Nearly intelligible speech can be synthesized from information about the activity of the facial muscles or from information about facial movements, and facial movements can be simulated from acoustic information. One next step for research in audiovisual speech perception is to exploit the possibilities for behavioral research that these findings make available. I will attempt to suggest some useful next steps.

 

