EMMA and perceptual analyses of coarticulation

in fricative-vowel syllables produced by children and adults

William F. Katz



Loorenstrasse 39, Zurich, CH-8053

wkatz@active.ch



To test competing hypotheses concerning young children's syllable-level speech motor organization, anticipatory coarticulation was analyzed in the fricative-vowel productions of young children (ages 5 and 7) and adults. Kinematic data were obtained for talkers' [si su shi shu] productions elicited in a carrier phrase during a naming task. Tongue-tip and tongue-body movements were recorded using electromagnetic midsagittal articulography (EMMA). The results indicated fricative-specific patterns:

For [sV] productions the data suggested that children's lingual positioning shows more extensive anticipatory coarticulation than that of adults, whereas for [SV] productions children's coarticulation patterns were similar to (or less extensive than) those of adults. In a perceptual study, gated portions of fricatives produced by a subset of the talkers were played to adult listeners for whole-syllable identification judgments. The perceptual results yielded relatively low identification values compared with previous studies, perhaps as a consequence of the speech being recorded under articulograph conditions. Nevertheless, significantly better whole-syllable and fricative identification was found for adult productions than for those of children. For vowel identification, no talker-group differences emerged, suggesting that children's lingual coarticulation extends over a temporal range similar to that of adult speech. Taken together, the results indicate that general hypotheses concerning developmental trends in anticipatory coarticulation must take into account additional factors, such as the exact articulator(s) involved and the role of gestural complexity.
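As a minimal sketch of how such an anticipatory effect can be quantified from midsagittal sensor data, the following compares mean mid-fricative tongue-body position across upcoming-vowel contexts. All values are simulated for illustration, and the index is a generic measure, not the analysis used in the study:

```python
import numpy as np

def coarticulation_index(pos_i, pos_u):
    """Anticipatory coarticulation index: absolute difference between the
    mean tongue-body position during the fricative before [i] versus [u].
    pos_i, pos_u: horizontal sensor positions (mm), one per token,
    sampled at the fricative midpoint."""
    return abs(np.mean(pos_i) - np.mean(pos_u))

# Simulated mid-fricative tongue-body x-positions (mm); illustrative
# numbers only, not measured data.
rng = np.random.default_rng(0)
child_i = rng.normal(52.0, 1.5, 10)   # child tokens, [si] context
child_u = rng.normal(44.0, 1.5, 10)   # child tokens, [su] context
adult_i = rng.normal(51.0, 1.0, 10)   # adult tokens, [si] context
adult_u = rng.normal(48.0, 1.0, 10)   # adult tokens, [su] context

child_ci = coarticulation_index(child_i, child_u)
adult_ci = coarticulation_index(adult_i, adult_u)
# A larger index means the fricative is shaped more by the upcoming vowel.
print(f"child index: {child_ci:.2f} mm, adult index: {adult_ci:.2f} mm")
```

With the simulated values above, the child index comes out larger, mirroring the [sV] pattern reported in the abstract.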

==========================================================





Acoustic analysis of children's speech movement disorders



Hedwig Amorosa, Stefan Sitter



Children with expressive language disorder also have problems in speech motor control. Failure to automatize speech movements could contribute to these problems. To assess automatization, one can have subjects execute repetitive movements, measure motion parameters, and calculate their variability; measuring the articulatory movements directly would seem most appropriate.

However, children in general are quite sensitive to manipulations in the mouth area, and children with language problems are in addition quite aware of their deficits and may fail to cooperate. Such procedures can therefore be expected to yield results of doubtful validity. Purely acoustic measurement does not have this shortcoming.

In comparison to normal children, disordered children often show higher variability in simple parameters (time per syllable, VOT, RMS). We will present some previous work on these comparisons (Amorosa 1989; Winner 1990) and can additionally present newer data from an ongoing project.
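One simple way to express the variability of such repetitive-speech parameters is the coefficient of variation over repeated productions; a minimal sketch, with invented measurement values:

```python
import numpy as np

def coefficient_of_variation(values):
    """Variability of a repetitive-speech parameter, expressed as the
    ratio of the standard deviation to the mean (dimensionless)."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

# Hypothetical per-repetition measurements from a syllable-repetition
# task; the numbers are illustrative, not data from the project.
syllable_durations_ms = [182, 175, 190, 178, 185, 240, 170, 198]  # time per syllable
vot_ms = [18, 22, 19, 35, 21, 17, 28, 20]                          # voice onset time

for name, data in [("syllable duration", syllable_durations_ms),
                   ("VOT", vot_ms)]:
    print(f"{name}: CV = {coefficient_of_variation(data):.3f}")
```

A higher coefficient of variation for a disordered child than for controls would then indicate poorer automatization on that parameter.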

The simple measures derived from acoustic data are only indirectly related to speech movements. We currently want to explore whether and how that gap can be bridged. We would therefore be interested to hear about the present state of phonetic models, e.g., the analysis of formants in (often pathological) children's voices, or acoustic analyses in noise, which is often caused by the children themselves; perhaps robust techniques from computer speech-input systems could be adapted.

References:



Amorosa, Hedwig (1989) Die Untersuchung kindlicher Sprechbewegungsstörungen mit Hilfe der akustischen Analyse. Habilitation, LMU München.

Winner, Anna (1990) Akustische Unterschiede im frikativen Sprachsignal von sprachlich unauffälligen und sprachentwicklungsgestörten Kindern als Indikatoren für Sprechbewegungsstörungen. Dissertation, LMU München.

=============================================================



The extent of coarticulation of English liquids: an acoustic and articulatory study

Paula West



Phonetics Laboratory

41 Wellington Square

University of Oxford

Oxford OX1 2JF

E-mail: paula.west@phon.ox.ac.uk



English /l/ and /r/ have secondary articulations, whose coarticulatory effects on neighbouring segments have been claimed to have temporal extent longer than the phonological foot. In prior work I have shown that these patterns of long-domain coarticulation are perceptually available to listeners and are largely manifested in F2 and F3 differences. Whilst claims about the articulatory settings of long-domain patterns have been made, little articulatory study has been undertaken. This study explores the production of long-domain coarticulatory patterns associated with English /l/ and /r/ by investigating the extent and nature of differences in articulation. An EMA recording of a speaker of Southern British English was made. Six minimal word pairs (leap/reap, lip/rip, lap/wrap, lob/rob, lope/rope, lobe/robe) were recorded in the frame sentence 'Have you uttered a X at home?'. Three EMA coils were placed on the tongue, two on the lips (upper and lower lip) and one on the gum below the lower incisor.

Reference coils were placed on the upper incisor and bridge of the nose.

Seven repeats of each utterance were obtained and analysed. The first three formant frequencies and (x,y) co-ordinates of each coil were measured at the midpoint of four vowels: the two vowels adjacent to the liquid, the vowel in the final syllable of 'uttered' and the vowel in 'at' (capitalised vowels in 'Have you uttEREd A liqVcons At home?'). General linear models were then constructed for each vowel and, together with t-tests, were used to check the acoustic and articulatory data for statistically significant differences associated with the difference in liquid. Strong local coarticulatory effects were found in the vowels adjacent to the liquid. Small non-local anticipatory differences were found in F3, upper lip protrusion and tongue height. Non-local perseverative differences proved more elusive, although there was a significant difference in placement of the tongue tip for the /l/ vs. /r/ contexts which seems not to have any acoustic consequences correlated with the l/r distinction. Correlations between the acoustic and articulatory data at all measurement points were explored. Analysis of the EMA movement data shows significant differences in direction of movement of the upper lip and tongue in the vowel two syllables before the liquid. These differences correspond to acoustic differences found in l/r sentences. This experiment confirms that phonological distinctions are not just made locally, and suggests that both the tongue and the lips may play a role in long-distance coarticulation.
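A minimal sketch of the per-vowel test for a liquid-conditioned difference, here as a two-sample t-test on F3 at a vowel midpoint. The formant values and effect size are simulated for illustration (the actual study used general linear models alongside t-tests):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated F3 (Hz) at the midpoint of the vowel in "uttered", two
# syllables before the liquid; seven repeats per liquid context.
# Values are illustrative: /r/ contexts are given lower F3.
f3_before_l = rng.normal(2550, 30, 7)
f3_before_r = rng.normal(2400, 30, 7)

# Two-sample t-test for a non-local anticipatory difference in F3.
t_stat, p_value = stats.ttest_ind(f3_before_l, f3_before_r)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In the real analysis this test would be run per vowel and per measurement (formants, coil coordinates), with the liquid identity as the factor of interest.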

=========================================================



A self-organization model of speech based on categories of

voiced and unvoiced suprasegments

Oleg P. Skljarov



Research Institute of Ear, Throat, Nose and Speech

198013, Bronnitskaja st., 9, St.-Petersburg, Russia

E-mail: skljarov@usa.net



Prosodic models, in particular duration models, play an important role in theoretical and applied aspects of spoken language processing. Such models are used, for example, in language identification and in the diagnosis of speech disorders. Here we present a nonlinear duration model, developed by us to explain results obtained in a comparative study of the large-scale temporal organisation of the speech signal in normal and stuttered speech. The concrete research question of this study concerns evidence that the alternating durations of alternating voiced/unvoiced gesture constellations (i.e., constellations which do and do not compel the vocal folds to vibrate) obey a recurrent logistic law. The contents of the research are presented in the following items.

1. A segmentation procedure, described below, is applied to the speech acoustic wave. A serial number n and a duration Tn are attached to each segment. The resulting sequence {Tn} is not the output of a random-number generator, but is formed according to a recurrent (specifically, logistic) law.
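The logistic recurrence referred to here can be sketched in a few lines. The parameter values below are illustrative only, chosen to contrast a regular and a chaotic duration regime:

```python
import numpy as np

def logistic_durations(t0, r, n):
    """Generate a sequence of n normalized segment durations from the
    logistic recurrence T_{n+1} = r * T_n * (1 - T_n)."""
    seq = [t0]
    for _ in range(n - 1):
        seq.append(r * seq[-1] * (1.0 - seq[-1]))
    return np.array(seq)

# r < 3: durations settle to a fixed point (regular timing);
# r near 4: deterministic chaos (highly variable timing).
regular = logistic_durations(0.3, 2.8, 200)
chaotic = logistic_durations(0.3, 3.9, 200)

# The (T, std) coordinates of the "fork" diagram described below:
# mean duration and the ratio of standard deviation to mean.
for name, seq in [("regular", regular), ("chaotic", chaotic)]:
    print(f"{name}: T = {seq.mean():.3f}, std/T = {seq.std() / seq.mean():.3f}")
```

The contrast in std/T between the two regimes is the kind of separation the (T, std) plane is meant to expose.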

2. Segmentation procedure. The input signals were normalized over the full dynamic range of the AD-DA converter. First, a segmentation threshold was determined for each individual subject from a recording of the test phrase "papa, papa, papa": the amplitude threshold is raised gradually from 0 in rather small steps until, for the first time, exactly six voiced segments arise. We also used a time parameter of about several periods of the fundamental frequency: if no further supra-threshold sample occurred within this interval, the voiced segment was considered complete and an unvoiced segment was taken to begin. This threshold was then used for segmentation of the main signal. Comparison of our segmentation method with hand segmentation and with segmentation by an HMM-ANN method (for both the acoustic and the EGG signal) gave satisfactory agreement (Parlangeau & André-Obrecht).
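The threshold-calibration idea can be sketched as follows, using a synthetic amplitude envelope in place of a real recording of the test phrase; the segment counting and step size are simplifications of the actual procedure:

```python
import numpy as np

def count_voiced_segments(envelope, threshold):
    """Number of runs of consecutive samples above the amplitude threshold."""
    above = envelope > threshold
    # A segment starts wherever `above` switches from False to True.
    return int(above[0]) + int(np.sum(~above[:-1] & above[1:]))

def calibrate_threshold(envelope, target=6, step=0.01):
    """Raise the threshold from 0 in small steps until the test phrase
    yields exactly `target` voiced segments, as in the calibration on
    'papa, papa, papa'."""
    threshold = 0.0
    while threshold < envelope.max():
        if count_voiced_segments(envelope, threshold) == target:
            return threshold
        threshold += step
    raise ValueError("no threshold produces the target segment count")

# Synthetic amplitude envelope: six vowel peaks separated by near-silence,
# standing in for the three "papa" tokens of the test phrase.
t = np.linspace(0, 6 * np.pi, 600)
envelope = np.abs(np.sin(t)) * 0.8 + 0.05  # floor of 0.05 mimics noise

thr = calibrate_threshold(envelope)
print(f"calibrated threshold: {thr:.2f}, "
      f"segments: {count_voiced_segments(envelope, thr)}")
```

The real procedure additionally bridges sub-threshold gaps shorter than a few fundamental periods before closing a voiced segment, which this sketch omits.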

3. Signals, subjects and statistical analysis. We used acoustic signals recorded while patients read a standard text of 120 syllables; at later stages of the research we also used spontaneous speech of the same length. The subjects, both men and women aged 17 to 50, comprised some tens of speakers with normal speech and some hundreds of stutterers of varying severity. For each subject, the average segment duration T (over the joined set of voiced and unvoiced segments) and the relative standard deviation std (the ratio of the root-mean-square deviation to the average) were calculated. When each subject is plotted as a point (T, std), a characteristic "fork" diagram emerges: subjects with normal speech group at the apex of the fork, while stutterers lie along its branches, the further from the apex the more severe the stuttering. The upper branch corresponds to subjects with slow speech, the lower branch to accelerated speech (the clustering of stutterers was established with the Mann-Whitney test, p < 0.01). Typical trajectories of patients during a course of stuttering correction can be traced from the branches towards the apex of the diagram. The trajectories for spontaneous speech run, as a rule, parallel to those for read speech.

4. Theoretical interpretation. The sequence {Tn} is a sequence of order parameters for a sequence of dynamic tasks whose solutions yield the speech signal. Such an interpretation is licensed by the theory of self-organization of complex systems in the case of bifurcation dependences for Tn (Prigogine). The dynamic problem for N (N >> 1) variables then separates into two problems: a one-dimensional problem for Tn (the logistic equation), and a problem for the remaining variables, which takes as input the order parameter Tn obtained from the first. The consecutive solution of such tasks should produce a speech signal (in the terms of Browman and Goldstein, a task-dynamic sequence for a gesture sequence yields the speech signal).

5. Articulatory interpretation. Since we have not developed a plant for solving such task sequences (such development is not the aim of our research), the true task dynamics may be replaced by a model dynamics (which, naturally, does not yield real speech). Consider two reciprocally working gesture constellations, together with a gesture (associated with pause) working reciprocally with the union of the two; these constellations produce, for example, the phonological sequence "papa, papa, papa, ...". The constellation (lip closure + vocal fold abduction) corresponding to the unvoiced plosive is designated by the symbol "p"; the constellation (jaw opening with simultaneous lip opening + vocal fold adduction) corresponding to the vowel is designated by the symbol "a"; an unspecified constellation corresponding to the phonological pause is designated by the comma and space. The phonological sequence "papa, papa, papa, ..." is then also written down symbolically as the gesture sequence "papa, papa, papa, ...". We offer this as an articulatory-gestural illustration. The claim of our work is that the durations of these constellations obey the logistic recurrent dependence; moreover, in real speech the alternating durations of alternating gesture constellations (those which do and do not set the vocal folds vibrating) obey the same law.

6. Experimental evidence. The clustering of stutterers predicted by this model (see paper) is as shown above. The least-squares approximation surface of the measured data in the 3D space of durations (of the n-th, (n+1)-th and (n+2)-th segments), taken in sections by the coordinate planes, shows the characteristic quadratic (logistic) dependence. Proportional timing, defined as the ratio of total voiced to total unvoiced duration, is equal to 3/2 for stutterers approaching the norm, across both different tempi and different subjects (for read speech only). This finding accords both with literature data (Max & Caruso, JSLHR, 40, pp. 1097-1110) and with the theory of deterministic chaos for the discrete logistic transformation (Schuster, Deterministic Chaos, Weinheim, 1984).

=============================================================



A new Electromagnetic Articulography instrument for registration of lip and tongue movements

Hansjörg Horn (1), Thomas Scholl (1), Hermann Ackermann (2),

Ingo Hertrich (2), Rüdiger Berndt (2), Gernot Göz (1)

Departments of Orthodontics (1) and Neurology (2)

University of Tübingen, Germany

E-mail: Hansjoerg.Horn@med.uni-tuebingen.de



Assessment of orofacial movements during speaking and swallowing is of relevance in phonetics, speech pathology, and orthodontics as well as neurology. So far, electromagnetic articulography (EMA) is the most effective method for registering lip and tongue movements. However, the measuring system presently on the market (AG100, Carstens, Lenglern) has some limitations with respect to measurement accuracy, fit of the transmitter helmet, and quality of the recorded acoustic signal. Therefore, a completely new instrument was developed that can be used on adults as well as children.

Like the commercial device, the new system measures distances via electromagnetic induction, evaluating signals transmitted by sending coils mounted on a helmet and received by small receiver coils attached to the articulators of interest. The major changes of the new system as compared to the AG100 are: (1) In addition to the three transmitters, a fourth coil system producing a homogeneous field is used, making it possible to register directly the deviation angle between the orientation of the transmitters and the receiver coils. (2) To avoid undesired interactions between the four magnetic fields, they are activated in a multiplex mode. (3) In order to increase quantization accuracy, the received signals are linearized by an analogue device before they are digitized. (4) The data can be saved to disk continuously, allowing longer continuous recordings. (5) Two lightweight helmets (one for children and one for adults) were constructed using carbon fibre rods. (6) Two-channel acoustic signals can be recorded in 16-bit quality at sampling frequencies up to 44 kHz.
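As a rough illustration of why linearizing before digitization helps (point 3 above): for a dipole-like field, the induced voltage falls off approximately with the cube of the transmitter-receiver distance, so an inverse-cube-root mapping recovers a signal roughly proportional to distance and spreads quantization error evenly over the range. The field model and constant below are idealizations, not the device's actual calibration:

```python
# Simplified (illustrative) model: induced voltage V = k / d**3, so the
# linearizer recovers d = (k / V) ** (1/3).
K = 1.0e6  # hypothetical calibration constant

def induced_voltage(d_mm):
    """Voltage induced in a receiver coil at distance d_mm (idealized)."""
    return K / d_mm ** 3

def linearized_distance(v):
    """Inverse mapping performed (in analogue) before digitization."""
    return (K / v) ** (1.0 / 3.0)

for d in (40.0, 80.0, 120.0):
    v = induced_voltage(d)
    print(f"d = {d:5.1f} mm -> V = {v:10.4f} -> recovered d = "
          f"{linearized_distance(v):5.1f} mm")
```

Without this step, equal voltage quantization steps would correspond to much coarser distance steps far from the transmitter than close to it.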

In a laboratory test, measurement accuracy (about 0.2 mm) was satisfactory within a circle of 80 mm diameter in the mid-sagittal plane around the helmet center. The measured torsion angle has an error of about 3 degrees. Lateral movements up to about 10 mm outside the mid-sagittal plane can, within some limits, be detected and tolerated for a required overall measurement accuracy of about 0.5 mm.

===========================================================



Motor equivalence: Methodological and theoretical considerations

Ingo Hertrich, Hermann Ackermann



Neurology Department, University of Tübingen

Hoppe-Seyler-Str. 3, D-72076 Tuebingen

E-mail: ingo.hertrich@uni-tuebingen.de







It is still unclear at what level of encoding inter-articulator coordination is programmed. Some previous studies reported variable lip and jaw gestures, whereas the resulting compound trajectory was relatively invariant. The underlying principle has been referred to as motor equivalence, indicating that a relevant target configuration may be achieved by various combinations of subsystem movements. The results of studies on motor equivalence, however, do not yet give a consistent picture. Although most authors agree that motor equivalence represents an important principle underlying speech production, it seems difficult to obtain direct experimental evidence for this phenomenon. The current study used a syllable repetition task under controlled speech rate conditions to further elucidate the complementary aspects of lip-jaw and tongue-jaw coordination. The results, first, show large inter-speaker variability with respect to the relative contribution of the jaw to the component gestures. Second, across repetitions, no consistent negative correlations between jaw and lip or jaw and tongue were observed, although in some cases highly significant negative correlations were obtained. Considering the data across speech rate conditions and different vowels, it becomes obvious that, to some extent, the jaw operates independently of the lips and the tongue. However, the differential contribution of the jaw to the compound trajectories varies considerably across speakers.

Besides discussion of motor equivalence, some methodological aspects will be addressed with respect to the coordinate system of the measurements.

=============================================================















MUNICH: WORK IN PROGRESS



Institut für Phonetik und Sprachliche Kommunikation

Ludwig-Maximilians-Universität München

Schellingstr. 3

D-80799 Munich

Germany



email: hoole|kroos|anjag|andi@phonetik.uni-muenchen.de







Tongue-jaw trade-offs and naturally occurring perturbation



Philip Hoole, Christian Kroos & Anja Geumann



Experimentally induced perturbation of speech shows that articulatory resources are flexibly marshalled to achieve phonetically defined goals. Are such key principles of motor control observable in unconstrained speech? Two sources of natural perturbation were considered: coarticulation and loud speech. Alveolar consonants were investigated. The expectation was that V-to-C coarticulation and different loudness levels would cause consonantal jaw position to vary. Would the tongue compensate to keep the vocal tract constriction relatively constant? Would trade-off patterns vary over consonants sharing place of articulation but differing in manner? One preliminary hypothesis, motivated by pilot experiments, was that trade-offs would be most apparent in acoustically sensitive sounds such as sibilants. This was not confirmed: jaw position was so precise for sibilants that no lingual compensation was required. Probably the teeth function here as an active articulator. Nevertheless, clear patterns in the overall magnitude of variability in the alveolar consonants were observed. For both jaw and tongue, variability increased from fricatives via stops to the lateral (and nasal). Within the tongue, these sound-specific effects were more pronounced for the tongue back than the tongue tip. Even such simple facts await a coherent explanation. To this end, discussion will focus on the acoustic properties of these sounds. [Supported by DFG Ti69/31]







Analysis of mandibular activity with EMA



Christian Kroos



Investigation of tongue-jaw trade-offs involves decomposing measured tongue movement into its intrinsic, jaw-independent component. This is a delicate task for two reasons. The first is related to the coupled nature of the articulators. The interesting cases for tongue-jaw trade-offs are those where there is a negative correlation between 'jaw' and 'intrinsic tongue' position. However, this is a kind of part-whole correlation where spuriously high correlations can easily occur, especially if the jaw data is noisy. Secondly, accurate decomposition of measured tongue position into the intrinsic tongue component may depend in turn on appropriate decomposition of the jaw movement into rotational and translational components. We will discuss how the use of multiple jaw sensors (and supporting anatomical information from NMRI) can help tackle these problems.
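The part-whole artifact can be demonstrated with a small simulation: even when jaw and intrinsic tongue positions are generated independently, subtracting a noisy jaw measurement from the measured tongue position yields a spuriously negative correlation. The noise levels below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200  # simulated repetitions

# Ground truth: jaw and intrinsic tongue positions are INDEPENDENT here,
# so any negative correlation found below is purely an artifact.
jaw_true = rng.normal(0.0, 1.0, n)
tongue_intrinsic = rng.normal(0.0, 1.0, n)
tongue_measured = jaw_true + tongue_intrinsic  # what the tongue coil records

# Noisy jaw measurement, as from a jaw sensor with measurement error.
jaw_noisy = jaw_true + rng.normal(0.0, 0.8, n)

# Decomposing with the noisy jaw signal subtracts the sensor noise into
# the 'intrinsic tongue' estimate, producing a spurious negative
# correlation: corr(j + e, i - e) < 0 even though corr(j, i) = 0.
intrinsic_estimate = tongue_measured - jaw_noisy
r_spurious = np.corrcoef(jaw_noisy, intrinsic_estimate)[0, 1]
r_true = np.corrcoef(jaw_true, tongue_intrinsic)[0, 1]
print(f"true jaw/intrinsic correlation:        {r_true:+.3f}")
print(f"correlation after noisy decomposition: {r_spurious:+.3f}")
```

This is why noisy jaw data make apparent trade-offs hard to interpret, and why cleaner jaw estimates (e.g. from multiple sensors) matter.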







Development of a 3-D articulograph



Andreas Zierdt





We report briefly on current progress in a German Research Council project whose aim is to develop a three-dimensional electromagnetic articulography system. In fact, a better designation might be 5-D, since the geometric algorithm uses information from six transmitter coils to determine three spatial coordinates plus two angles of orientation for the sensor coils.









Synchronized articulographic and video recording



Phil Hoole, Christian Kroos



Short demo: an experimental setup for acquiring video data, e.g. of the lips, together with synchronized articulographic data (and additional analog channels, if desired) will be shown.



===================================================================

Ongoing Research at Edinburgh



Alan Wrench



Queen Margaret College

Edinburgh



MOCHA



Speech recognition using MultiCHannel Articulatory data. Following on from the work by Schmidbauer et al. and Papcun et al., we have just started a three-year project to investigate the potential of directly measured articulatory features as input to an automatic speech recognition system. Issues currently being investigated include the measurement of velic movement with EMA and the transformation of the physical space in which EMA measures.



Optopalatography

A new palate-based instrument designed to measure tongue-palate distances is being developed in Edinburgh. As part of this development, we intend to use EMA as a reference to establish the accuracy of the device.



Alveolar to velar coarticulation in fast and careful speech

This PhD research project is concerned with the articulatory details of a connected speech process - the so-called assimilation of a word-final alveolar nasal to a following velar plosive - under conditions of varied speech rate.

http://sls.qmced.ac.uk/conf/97ellis2.htm





Scottish Vowel Length Rule

http://sls.qmced.ac.uk/RESEARCH/PROJECTS/svlr.htm





Study of Coarticulation in /kl/ clusters

We are continuing to investigate the differences in tongue trajectory between /kl/ clusters and singleton /k/.



===================================================================

The use of EMA in second language production

Barbara Kühnert

bjk1001@cus.cam.ac.uk



While phenomena of second language (L2) acquisition and use have come to occupy an important place in the development of our understanding of the human capacity for language, few studies have been pursued in terms of L2 phonetics and phonology. This relative neglect is somewhat surprising, as the small body of work that exists shows that what is learned and how it is learned can shed light on very basic issues of speech perception and production. There is thus clearly a need for fine-grained phonetic research.

This presentation gives a brief outline of a planned research project on the production of German by native speakers of Southern British English. In particular, the production strategies associated with the articulation of German vowels are to be investigated using electromagnetic articulography (EMA) and acoustic analyses. The richness of the German vowel system poses a challenge to English speakers. On the one hand, they have to learn to produce new sounds that do not occur in the repertoire of their native language, i.e. they have to coordinate their speech organs in a manner they have not done previously (as for /y/ in 'Tür'). On the other hand, they also have to learn to modify some of their routine articulatory movements in order to produce similar but non-identical sounds, since German and English differ in how sounds which are nominally considered the 'same' are realised on the phonetic surface (e.g. German /u/ is said to be produced with a tongue constriction formed further back in the vocal tract, and with less variability, than English /u/). The movement data collected with EMA should help to gain some empirical insight into precisely what English speakers do when speaking German and what it is that often makes them sound foreign.

The design of the project, the theoretical issues which can be addressed with the help of EMA data, and some as yet unsolved problems in the experimental set-up and analysis will be discussed.

===================================================================

Compensation strategies for the perturbation of the rounded vowel [u]: an acoustic, articulatory and perceptual study



Christophe SAVARIAUX



Institut de la Communication Parlée

Université Stendhal

BP 25

38040 GRENOBLE Cédex

Tel: (33) 4 76 82 68 69

Fax: (33) 4 76 82 43 35

e-mail :



We carried out an experiment the aim of which was to test the respective roles of the articulatory, acoustic and perceptual levels in the control of vowel production. The experiment involved a lip perturbation impeding the usual articulatory strategy for the production of the French rounded vowel [u]. To study how speakers are able to achieve their speech goal in spite of the perturbation, a 25 mm diameter lip tube was inserted between the lips of 11 speakers while they produced French [u].

Acoustic and X-ray articulatory data were recorded for all speakers in both the normal and the lip-tube condition. Articulatory measurements in the lip-tube condition were made first immediately after insertion of the tube and again after an "adaptation" procedure of 20 trials. A first analysis based on F1/F2 comparison showed that only one speaker was able to compensate acoustically for the perturbation. To refine the analysis of these data, we carried out a perceptual analysis based on two different perceptual tests: an identification test and a rating test. The results have been analysed across the three levels (articulatory, acoustic and perceptual), and a perceptual characterization of French [u] in an F1/F2-F0 plane is given.
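A sketch of the kind of F1/F2 comparison used to decide whether a speaker has compensated; the formant values and the distance criterion below are hypothetical, not those of the study:

```python
import math

def f1f2_distance(formants_a, formants_b):
    """Euclidean distance (Hz) between two (F1, F2) points."""
    return math.dist(formants_a, formants_b)

# Hypothetical formant values (Hz) for one speaker's [u]; a speaker counts
# as compensating if the perturbed vowel returns close to the normal one.
normal = (280, 750)
tube_initial = (350, 1150)      # right after tube insertion
tube_adapted = (300, 820)       # after the 20 adaptation trials

TOLERANCE_HZ = 120.0  # illustrative criterion, not from the study
print("initial shift:", round(f1f2_distance(normal, tube_initial), 1), "Hz")
print("after adaptation:", round(f1f2_distance(normal, tube_adapted), 1), "Hz")
print("compensated:", f1f2_distance(normal, tube_adapted) < TOLERANCE_HZ)
```

A fuller analysis would weight F1 and F2 perceptually (e.g. on a bark scale) and incorporate F0, as in the F1/F2-F0 characterization mentioned above.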



===================================================================