Face, Speech, and Acoustics  


Abstract

Gérard Bailly (Institut de la Communication Parlée, Grenoble)

Audiovisual speech synthesis: Speaker-specific control, shape, and appearance models

The virtual speakers developed at ICP are clones of actual human speakers: they mimic the movements of the speech articulators, the geometric deformation of the speech organs, and the final appearance of the original speaker's face. We will present the experimental data recorded and processed to meet this challenge, together with our modelling principles and results. In particular, we will introduce our first steps towards a generic talking face built from our virtual clones. This generic talking face will be augmented with generic internal speech organs such as the jaw, tongue, and velum. We will also discuss assessment issues and present recent evaluation results obtained with the point-light paradigm. Finally, we will discuss the relationship between facial movements, speech, and acoustics in light of our modelling work.

References

Badin, P., P. Borel, et al. (2000). Towards an audiovisual virtual talking head: 3D articulatory modeling of tongue, lips and face based on MRI and video images. Proceedings of the 5th Speech Production Seminar, Kloster Seeon - Germany.

Bailly, G., L. Revéret, et al. (2000). Hearing by eyes thanks to the "labiophone": exchanging speech movements. COST254 workshop: Friendly Exchanging Through The Net, Bordeaux - France.

Revéret, L., G. Bailly, et al. (2000). MOTHER: a new generation of talking heads providing a flexible articulatory control for video-realistic speech animation. International Conference on Speech and Language Processing, Beijing - China.

Bailly, G. (2001). Audiovisual speech synthesis. ETRW on Speech Synthesis, Perthshire - Scotland.

Elisei, F., M. Odisio, et al. (2001). Creating and controlling video-realistic talking heads. Auditory-Visual Speech Processing Workshop, Scheelsminde, Denmark.

Badin, P., G. Bailly, et al. (2002). "Three-dimensional linear articulatory modeling of tongue, lips and face based on MRI and video images." Journal of Phonetics 30(3): 533-553.

Bailly, G. (2002). Audiovisual speech synthesis. From ground truth to models. International Conference on Speech and Language Processing, Boulder - Colorado.

Bailly, G. and P. Badin (2002). Seeing tongue movements from outside. International Conference on Speech and Language Processing, Boulder - Colorado.

Bailly, G., E. Vatikiotis-Bateson, et al. (In preparation). Visible articulatory degrees of freedom of speech movements. Audiovisual speech processing. E. Vatikiotis-Bateson, G. Bailly and P. Perrier. Cambridge, MA, USA, MIT Press.

Bailly, G. (accepted). "Audiovisual speech synthesis." International Journal of Speech Technology.



Last modified: Thu Oct 31 15:16:31 CET 2002