Functional data analysis for phonetic research
Michele Gubian
Functional Data Analysis (FDA) is a set of techniques for performing statistical analysis on sets of curves (or contours). Analysing f0 or formant contours is a common task in phonetic research, where typically a few numerical features are first extracted from the curves (e.g. durations, values in the middle of segmental intervals, slopes, etc.) and ordinary statistics are then applied to those features. This procedure forces one to decide in advance what to retain and what to discard from the complex information encoded in the curves (e.g. is the degree of curvature important?). FDA instead lets the contour data set determine its own parametrisation. This not only eliminates the intermediate step of (manual) feature extraction but also reduces the risk of destroying valuable information encoded in the contour shapes.
In my talk I will introduce one FDA technique, namely Functional Principal Component Analysis (FPCA), which is versatile enough both for exploratory analysis, e.g. when it is not yet clear how many categories/factors will be considered, and for more controlled scenarios. Using case studies, I will illustrate all the steps involved in analysing a set of f0 or formant contours, starting from the raw data coming from Praat. An important element of novelty is the way I propose to deal with time normalisation. I will show how to integrate contour shapes and segmental durations (e.g. syllable boundaries) in FPCA by representing the latter as continuous time distortion curves. This allows one to carry out a joint shape-duration analysis entirely within FPCA, as opposed to carrying out two separate analyses, i.e. one on time-normalised curves and another on durations. Time permitting, I will also show how to 'revert' FPCA and turn it into a re-synthesis tool, a technique of interest for those who manipulate f0 for perceptual experiments.
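To give a rough idea of what FPCA computes and what 'reverting' it could mean, here is a minimal sketch in Python. It is not the procedure presented in the talk: it assumes the contours are already time-normalised and sampled on a common grid, it replaces functional smoothing with a plain SVD of the sampled values, and it leaves out the time distortion curves altogether. The synthetic "f0" data and the names fpca and reconstruct are invented for illustration.

```python
import numpy as np

def fpca(curves, n_components=2):
    """Discretised FPCA sketch.

    curves: array of shape (n_curves, n_samples), all sampled on the
    same normalised time grid (an assumption of this sketch).
    """
    mean = curves.mean(axis=0)
    centred = curves - mean
    # SVD of the centred data; the rows of vt are orthonormal
    # principal component curves sampled on the time grid.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    pcs = vt[:n_components]
    scores = centred @ pcs.T  # one score per curve and component
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return mean, pcs, scores, explained

def reconstruct(mean, pcs, scores):
    """'Revert' FPCA: rebuild contours from the mean, PCs and scores."""
    return mean + scores @ pcs

# Synthetic contours (hypothetical data, in Hz): a baseline plus a peak
# whose height varies and a global slope that varies, with no noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
peak = rng.normal(0.0, 20.0, size=(30, 1))
slope = rng.normal(0.0, 10.0, size=(30, 1))
curves = 120.0 + peak * np.sin(np.pi * t) + slope * (t - 0.5)

mean, pcs, scores, explained = fpca(curves, n_components=2)
approx = reconstruct(mean, pcs, scores)

# Re-synthesis: a new contour one standard deviation along PC1.
new_scores = np.array([[scores[:, 0].std(), 0.0]])
new_contour = reconstruct(mean, pcs, new_scores)[0]
```

Because the synthetic contours vary in only two ways, two components reproduce them up to rounding error; real f0 or formant data would normally leave a residual. Each contour is summarised by its row in scores, and those values can then be passed to ordinary statistical tests in place of hand-picked features.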