Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, access the tidyverse, magrittr, emuR, and wrassp libraries:
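The packages can be attached with library(); the start-up messages shown below come from these calls:

```r
library(tidyverse)
library(magrittr)
library(emuR)
library(wrassp)
```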
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
The following makes use of an extract of read speech from the Kiel corpus. Either download the zip archive, unzip it, and move kielread_emuDB to the folder emu_databases, or execute the following code:
# `targetDir` is assumed to be the path to the emu_databases folder created during setup
a = "https://www.phonetik.uni-muenchen.de/~jmh/"
b = "lehre/Rdf/kielread_emuDB.zip"
down_path = paste(a, b, sep="/")
if (!file.exists(file.path(targetDir, "kielread.zip"))) {
download.file(down_path,
file.path(targetDir, "kielread.zip"))
}
unzip(file.path(targetDir, "kielread.zip"), exdir = targetDir)
The following reads in a data frame that will be used later for Mahalanobis distance calculations:
b = "lehre/Rdf"
down_path = paste(a, b, sep="/")
urldir = paste(down_path, "formantidata.df.txt", sep="/")
mdata.df = read.table(urldir)
The following derives the formant frequencies for seven vowel types in this database using the emuR forest formant tracker. Since these commands were dealt with extensively in the previous module, they are listed here with just a few comments.
# Load the database
kielread_DB = load_emuDB(file.path(targetDir, "kielread_emuDB"))
# identify the wav files
kiel_wav_paths = list.files(file.path(targetDir, "kielread_emuDB"),
pattern = ".*wav$",
recursive = T,
full.names = T)
# identify the speaker
n = nchar(kiel_wav_paths)
speaker = substring(kiel_wav_paths, n-11, n-9)
# a logical vector identifying the male speaker
temp.m = speaker == "K67"
# run the formant tracker over that speaker's `.wav` files
forest(kiel_wav_paths[temp.m], gender = "m")
# the same for the female speaker but with `gender` set to `f`
temp.f = speaker == "K68"
forest(kiel_wav_paths[temp.f], gender = "f")
# add the formants to the database
add_ssffTrackDefinition(kielread_DB,
"FORMANTS", "fm", "fms")
# query 7 vowel types
v.s = query(kielread_DB, "Phonetic = i:|e:|a:|o:|u:|I|E")
# get their formants
v.fm = get_trackdata(kielread_DB, v.s, "FORMANTS")
# time-normalise them to 11 data points
v.fmt = normalize_length(v.fm, N = 11)
# identify the speaker
v.fmt %<>% mutate(speaker = substring(bundle, 1, 3))
The object is to carry out a standard normalisation following Lobanov (1971), i.e. by converting to \(z\) scores which means subtracting the mean and dividing by the standard deviation per speaker. Vowel normalisation should be calculated, as far as possible, on a balanced data set i.e. the \(z\) score transformation should be based on both front and back, high and low vowels in approximately equal proportion. Here the mean and standard deviation are based on all frames of data between the acoustic vowel onset and offset of the peripheral vowels [i:, e:, a:, o:, u:]. There are two speakers to consider and the \(z\) score normalisation will be carried out separately for each of them. The first step is to obtain the by-speaker mean and standard deviation as follows.
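In symbols: if \(\mu_i\) and \(\sigma_i\) are a given speaker's mean and standard deviation of formant \(F_i\) (for \(i = 1, 2\)), computed over all frames of the peripheral vowels, then the Lobanov-normalised value is \(F_i^z = (F_i - \mu_i)/\sigma_i\).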
mean_sd <- v.fmt %>%
# Identify the vowels
filter(labels %in% c("i:", "e:", "a:", "o:", "u:")) %>%
# For each speaker:
group_by(speaker) %>%
# compute mean & sd of F1 and F2
summarise(m1 = mean(T1),
s1 = sd(T1),
m2 = mean(T2),
s2 = sd(T2)) %>%
ungroup
mean_sd
## # A tibble: 2 × 5
## speaker m1 s1 m2 s2
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 364. 147. 1525. 500.
## 2 K68 423. 219. 1846. 714.
The vowel normalisation can be carried out in two simple steps. The first involves making use of the dplyr function left_join() to bind these means and standard deviations into the original data frame v.fmt by speaker. That is, 4 new columns are created in v.fmt (those from mean_sd above) containing the mean and sd values appropriate to that speaker. The next step is to store the \(z\) score normalised formants as the variables F1 and F2.
# over-write v.fmt with `%<>%`
v.fmt %<>%
# 1. Add the mean & sd values from above
left_join(mean_sd, by = "speaker") %>%
# 2. Create two new columns, F1 and F2
# which are computed from the
# raw formant values in columns T1 and T2
mutate(F1 = (T1 - m1) / s1,
F2 = (T2 - m2) / s2)
A plot of the normalised vowels in the normalised F1 \(\times\) F2 space is now given by:
cols1 = c("black", "red", "blue", "orange")
cols2 = c("magenta", "green", "slategray")
cols = c(cols1, cols2)
v.fmt %>%
# select values at temporal midpoint
filter(times_norm == 0.5) %>%
ggplot +
aes(y = F1, x = F2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Standard deviations)") +
scale_y_reverse(name = "F1 (Standard deviations)") +
scale_colour_manual(values = cols)
An alternative methodology proposed by Brand et al. (2021) can be used when the vowel types are imbalanced across the vowel space (e.g. many more front vowels than back vowels). In this case, a grand mean and standard deviation are calculated: the former is the mean of the by-vowel means calculated earlier, and the latter is the standard deviation of those means. The by-vowel means are derived in exactly the same way as before, and the grand mean and standard deviation are then computed from them:
mean_sd <- v.fmt %>%
# subset vowels
filter(labels %in% c("i:", "e:", "a:", "o:", "u:")) %>%
# per speaker and vowel
group_by(speaker, labels) %>%
# compute the by-vowel means of F1 and F2
summarise(m1 = mean(T1),
m2 = mean(T2)) %>%
group_by(speaker) %>%
# calculate grand mean and standard deviation
summarise(gm1 = mean(m1),
gs1 = sd(m1),
gm2 = mean(m2),
gs2 = sd(m2)) %>%
ungroup
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker gm1 gs1 gm2 gs2
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 370. 142. 1409. 527.
## 2 K68 425. 203. 1696. 724.
The revised \(z\) score normalisation is then:
v.fmt %<>%
# 1. Add the grand mean & sd values from above
left_join(mean_sd, by = "speaker") %>%
# 2. Create two new columns, F1 and F2
# which are computed from the
# raw formant values in columns T1 and T2
mutate(F1g = (T1 - gm1) / gs1,
F2g = (T2 - gm2) / gs2)
And the revised plot:
v.fmt %>%
filter(times_norm == 0.5) %>%
ggplot +
aes(y = F1g, x = F2g, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Standard deviations)") +
scale_y_reverse(name = "F1 (Standard deviations)") +
scale_colour_manual(values = cols)
The centre of a speaker’s normalised vowel space is by definition the point [0, 0] in a \(z\) score normalised space. Therefore, some estimate of the degree of centralisation can be given by calculating the Euclidean distance to [0, 0]. This distance is simply the square root of the sum of the squared normalised F1 and F2 values, as follows:
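A sketch of this step, assuming (as the plots below do) that the result is stored in a column named central:

```r
v.fmt %<>%
  # Euclidean distance of each observation from
  # the centre [0, 0] of the normalised space
  mutate(central = sqrt(F1^2 + F2^2))
```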
It looks from the earlier plot that the male speaker’s [I] (speaker K67) is more central than that of the female speaker (K68). Whether this is so can now be seen as follows:
v.fmt %>%
filter(labels == "I") %>%
filter(times_norm == .5) %>%
ggplot +
aes(y = central, x = speaker) +
geom_boxplot()
but there’s no evidence to suggest this is so. On the other hand, a comparison of tense [i:] with lax [I] should show a greater degree of centralisation in [I] for both speakers.
v.fmt %>%
filter(labels %in% c("I", "i:")) %>%
filter(times_norm == .5) %>%
ggplot +
aes(y = central, x = labels) +
facet_wrap(~speaker) +
geom_boxplot()
The task here is to quantify the degree of overlap between vowel pairs by measuring the inter-Euclidean distances between them. The distances are calculated to the centroid (or mean) of the other vowel. Thus, for a vowel category \(V\) with tokens \(v_1\), \(v_2\) … \(v_n\) and another vowel category \(U\) with tokens \(u_1\), \(u_2\) … \(u_n\), the distances are calculated from \(v_1\), \(v_2\) … \(v_n\) to the centroid of \(U\) and from \(u_1\), \(u_2\) … \(u_n\) to the centroid of \(V\). The vowel pair in this example is [E, e:]. The further test is of whether, as the earlier plot in the F1 \(\times\) F2 plane suggests, the inter-Euclidean distance between them is less for the male speaker K67 than for the female speaker K68. The calculations will be made separately by speaker in the speaker-normalised F1 \(\times\) F2 space.
There are three steps to this:
1. Calculate the [E, e:] centroids for each speaker at the temporal midpoint of the vowel.
2. Calculate the distance of all observations in the data frame v.fmt to each of these centroids.
3. Make a new data frame that only contains distances of [E] to the [e:]-centroid and of [e:] to the [E]-centroid.
The calculation by speaker in the normalised F1 \(\times\) F2 space of the [E, e:] centroids at the vowel midpoint can be done as follows:
centroids <- v.fmt %>%
# identify the temporal midpoint
filter(times_norm == .5) %>%
# choose the vowels
filter(labels %in% c("E", "e:")) %>%
# for each speaker and vowel
group_by(speaker, labels) %>%
# compute mean F1 and F2
summarise(mF1 = mean(F1),
mF2 = mean(F2)) %>%
# rearrange in columns
pivot_wider(names_from = labels,
values_from = c(mF1, mF2)) %>%
ungroup
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker mF1_E `mF1_e:` mF2_E `mF2_e:`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 1.07 -0.224 0.243 0.950
## 2 K68 0.680 -0.238 0.432 0.906
Note that the column names involving [e:] are enclosed in backticks so that the colon is not interpreted as a special character. The next step is to join the centroids to the data frame v.fmt using the function left_join(), as described earlier. This is, once again, done by speaker.
The above adds the centroid data as new columns to v.fmt, by speaker. Thus, for example, since the 20th observation of v.fmt contains data for K67, that observation also has the centroids for this speaker:
## # A tibble: 1 × 5
## speaker mF1_E `mF1_e:` mF2_E `mF2_e:`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 1.07 -0.224 0.243 0.950
The first step is to write a simple function to calculate Euclidean distances between data points.
# calculate Euclidean distance between
# the vectors `a` and `b`
euc = function(a, b) {
sqrt(sum((a - b)^2))
}
The euc() function can be tested by calculating the distance from c(0, 0) to c(3, 4). This is the length of the hypotenuse of a 3-4-5 right-angled triangle, i.e. sqrt(3^2 + 4^2), and should give 5: the length of the red line in:
plot(0:5, 0:5, type="n", xlab="x", ylab="y")
lines(c(0, 3), c(0, 4), col="red")
lines(c(0, 0), c(0, 4))
lines(c(0, 3), c(4, 4))
which is given by
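The call itself would be of the form:

```r
euc(c(0, 0), c(3, 4))
```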
## [1] 5
The task is now to apply the function euc() to each observation of v.fmt: more specifically, to calculate for each observation the Euclidean distance from the normalised F1 and F2 to the [E]-centroid and to the [e:]-centroid, as follows:
# calculate distance to [E] centroid
v.fmt %<>%
# for each row i.e. for each observation
rowwise() %>%
# calculate the Euclidean distance to the [E]-centroid...
mutate(dist_E = euc(c(F1, F2), c(mF1_E, mF2_E))) %>%
# ... and to the [e:]-centroid
mutate(dist_e = euc(c(F1, F2), c(`mF1_e:`, `mF2_e:`))) %>%
# always ungroup after rowwise() or group_by()
ungroup
Thus, for the 20th observation, for example, these distances are:
## # A tibble: 1 × 2
## dist_E dist_e
## <dbl> <dbl>
## 1 1.90 0.434
which should be the same as the Euclidean distances from here:
## # A tibble: 1 × 2
## F1 F2
## <dbl> <dbl>
## 1 -0.627 1.11
to here:
## # A tibble: 1 × 2
## mF1_E mF2_E
## <dbl> <dbl>
## 1 1.07 0.243
and to here
## # A tibble: 1 × 2
## `mF1_e:` `mF2_e:`
## <dbl> <dbl>
## 1 -0.224 0.950
as verified by:
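Using the rounded values printed above, the verifying calls would be approximately:

```r
euc(c(-0.627, 1.11), c(1.07, 0.243))
euc(c(-0.627, 1.11), c(-0.224, 0.950))
```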
## [1] 1.902408
## [1] 0.4340553
The task here is to rearrange the data frame so that:
1. it includes only the distances from [E] to the [e:]-centroid and from [e:] to the [E]-centroid at the temporal midpoint;
2. the above is arranged in a single column of distances;
3. there is another column which is a factor with two levels identifying an observation as dist_E (a distance to the [E]-centroid) or as dist_e (a distance to the [e:]-centroid).
The command below first deals with 3. to create a column named vowel with these two levels. All the distances in 2. are stored in a new column called distances, thus:
v.fmtlong = v.fmt %>%
# for the columns that begin with dist_
pivot_longer(cols = starts_with("dist_"),
# 3. make a new column called `vowel`
# with levels `dist_E` or `dist_e`
names_to = "vowel",
# 2. make a new column of the distances
# of observations to the [e:]-centroid and of
# to the [E]-centroid
values_to = "distances")
This command below now deals with 1.:
e.df = v.fmtlong %>%
filter((labels == "E" & vowel == "dist_e") |
(labels == "e:" & vowel == "dist_E")) %>%
filter(times_norm == 0.5)
A plot comparing the Euclidean distances between the speakers can now be made:
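A sketch of one way to make this plot, using the same boxplot pattern as elsewhere in this tutorial (one panel per distance type, one box per speaker):

```r
e.df %>%
  ggplot +
  aes(y = distances, x = speaker) +
  geom_boxplot() +
  facet_wrap(~vowel)
```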
The left panel has the distances of [E] vowels at the temporal midpoint to the [e:] centroid; the right panel those from [e:] vowels at the temporal midpoint to the [E] centroid. It is evident that these distances are greater for K67 (less overlap between [E, e:]) than for K68.
The Mahalanobis distance is the number of ellipse standard deviations (or ellipsoid standard deviations in higher dimensions) of a point from the centre of the distribution. Consider, for example, the position of the point \(X\), which is located roughly at the 95% confidence ellipse:
mdata.df %>%
ggplot +
aes(y = F1, x = F2) +
stat_ellipse(type = "norm") +
annotate("text",
y = -0.9649036, x = 0.5113503,
label = "X", size = 8, col = "red") +
annotate("text",
y = -0.6073229, x = 0.9897430,
label = "O", size = 8, col = "red")
The Euclidean and Mahalanobis distances of \(X\) to the centroid \(O\) are given by:
# the point X
X = c(-0.9649036, 0.5113503)
# centroid (the point O)
mdata.cen = mdata.df %>%
summarise(F1 = mean(F1), F2 = mean(F2)) %>%
as.matrix
# or
mdata.cen = apply(mdata.df, 2, mean)
# covariance matrix
mdata.cov = mdata.df %>% var
# Euclidean distance
euc(X, mdata.cen)
## [1] 0.5972634
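The second value below is the Mahalanobis distance, obtained with base R’s mahalanobis() function (the same call is referred to again further down):

```r
# Mahalanobis distance of X from the centroid O
mahalanobis(X, mdata.cen, mdata.cov)
```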
## [1] 5.994233
As the above shows, the Mahalanobis distance requires the calculation of the covariance matrix for the data used to construct the ellipse. For a two-dimensional space, the covariance matrix consists of the variance of the two parameters in the diagonal and the covariance (shared variance) between the two parameters in the off-diagonal.
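The individual entries can be inspected as follows, producing the values and matrix shown below:

```r
# variances of F1 and F2 (the diagonal)
var(mdata.df$F1)
var(mdata.df$F2)
# covariance between F1 and F2 (the off-diagonal)
cov(mdata.df$F1, mdata.df$F2)
# the full covariance matrix
mdata.cov
```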
## [1] 0.06434925
## [1] 0.107755
## [1] -0.02617008
## F1 F2
## F1 0.06434925 -0.02617008
## F2 -0.02617008 0.10775497
The Mahalanobis distance of any point exactly on the 95% ellipse (which is itself a horizontal slice through a two-dimensional normal distribution) is given by the following for the two-dimensional case:
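This critical value is the 95% quantile of the \(\chi^2\) distribution with 2 degrees of freedom:

```r
qchisq(0.95, df = 2)
```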
## [1] 5.991465
which is very close to mahalanobis(X, mdata.cen, mdata.cov) calculated earlier for the Mahalanobis distance between \(X\) and \(O\).
The earlier command for computing centroids is modified below to include covariance matrices. This requires storing lists in the summarise() function: this is because a covariance matrix is, as its name suggests, a matrix and not a vector:
centroids <- v.fmt %>%
# identify the temporal midpoint
filter(times_norm == .5) %>%
# choose the vowels
filter(labels %in% c("E", "e:")) %>%
# for each speaker and vowel
group_by(speaker, labels) %>%
# compute mean formants and covariance matrices
# and save them as lists in one cell
summarise(meanF = list(c(mean(F1), mean(F2))),
covar = list(cov(cbind(F1, F2)))) %>%
ungroup() %>%
# rearrange in columns
pivot_wider(names_from = labels,
values_from = c(meanF, covar))
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker meanF_E `meanF_e:` covar_E `covar_e:`
## <chr> <list> <list> <list> <list>
## 1 K67 <dbl [2]> <dbl [2]> <dbl [2 × 2]> <dbl [2 × 2]>
## 2 K68 <dbl [2]> <dbl [2]> <dbl [2 × 2]> <dbl [2 × 2]>
The next lines modify the earlier code to calculate the Mahalanobis distances:
v.fmt %<>%
# add columns from centroids to v.fmt
left_join(centroids, by = "speaker") %>%
# for each observation
rowwise() %>%
# compute mahal distances to E and e:
# values from cells that contain lists
# must be unlisted!
mutate(
distmah_E = mahalanobis(
c(F1, F2),
meanF_E %>% unlist(),
covar_E %>% unlist()
),
distmah_e = mahalanobis(
c(F1, F2),
`meanF_e:` %>% unlist(),
`covar_e:` %>% unlist()
)) %>%
ungroup()
The following creates a data frame with columns dtype, specifying whether a particular observation is of a Euclidean or Mahalanobis distance; vowel, to code each observation as a distance to the /E/- or to the /e:/-centroid; and distance, with the Euclidean or Mahalanobis distance value:
v.fmtlong2 =
v.fmt %>%
pivot_longer(cols = c(dist_e, dist_E, distmah_e, distmah_E),
names_to = c("dtype", "vowel"),
values_to = "distance",
names_sep = "_") %>%
mutate(dtype = ifelse(dtype == "dist", "Euclidean", "Mahalanobis"),
vowel = ifelse(vowel == "e", "dist_to_e", "dist_to_E"))
A data frame is now derived from v.fmtlong2 above to include only the distances from [E] to the [e:]-centroid and from [e:] to the [E]-centroid at the temporal midpoint:
e.df2 = v.fmtlong2 %>%
filter((labels == "E" & vowel == "dist_to_e") |
(labels == "e:" & vowel == "dist_to_E")) %>%
filter(times_norm == 0.5)
The following shows the Mahalanobis distance of /E/ to the /e:/-centroid and vice versa, separately by speaker. Because the Mahalanobis distances extend over a large range (from 0 to 250), the plot shows the log Mahalanobis distances, which compresses the vertical scale:
e.df2 %>%
filter(dtype == "Mahalanobis") %>%
ggplot +
aes(y = log(distance), x = speaker) +
geom_boxplot() +
facet_wrap( ~ vowel) +
ylab("Log. Mahalanobis distance")
This is another useful distance metric, used by Stevens et al. (2019) in order to compare the relative position of a category \(V\) to two other anchor categories \(U\) and \(W\). Typically, \(V\) is positioned somewhere between \(U\) and \(W\) in an acoustic space. The task is to quantify the relative proximity of \(V\) to the anchor categories. Points that are positioned exactly at the (centroid of the) anchors have values of ±1; points between the anchors take values between −1 and +1. The function itself is:
orthogonal_projection = function(V, U, W) {
# V, U, W are vectors of the same length
# which must be greater than 1
# U, W are the anchors. Calculate the
# orthogonal projection ratio of V
# to the two anchors
( -2 * ((V-U) %*% (U-W)) / ((U-W) %*% (U-W)) ) - 1
}
where %*% denotes matrix multiplication. The orthogonal projection ratio is ±1 at the anchors. This can be shown as follows:
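For example, with hypothetical anchors U = c(0, 0) and W = c(1, 1) (assumed here for illustration):

```r
U = c(0, 0); W = c(1, 1)
# exactly at the anchor U
orthogonal_projection(U, U, W)
# exactly at the anchor W
orthogonal_projection(W, U, W)
```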
## [,1]
## [1,] -1
## [,1]
## [1,] 1
whereas a point exactly half-way between the anchors has a value of zero:
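For example, with the same hypothetical anchors U = c(0, 0) and W = c(1, 1):

```r
# hypothetical anchors (assumed for illustration)
U = c(0, 0); W = c(1, 1)
# exactly half-way between the anchors
orthogonal_projection((U + W)/2, U, W)
```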
## [,1]
## [1,] 0
Here the task is to calculate the orthogonal projection for [u:] relative to the anchors [i:] and [o:] in the F1 \(\times\) F2 speaker-normalised space at the vowels’ temporal midpoint: this can be used as an index of the degree of [u:]-fronting. The calculation should show that [u:] is relatively much closer to [o:]; there is also some suggestion from the earlier plots in the F1 \(\times\) F2 plane that [u:] may be slightly fronter, i.e. closer to [i:], for the male than for the female speaker. The first step is to calculate the anchors, i.e. the by-speaker centroids of the two anchor vowels [i:, o:]. The same code that was used earlier can be modified for this purpose:
centroids_high <- v.fmt %>%
# identify the temporal midpoint
filter(times_norm == .5) %>%
# choose the anchor vowels
filter(labels %in% c("i:", "o:")) %>%
# for each speaker and vowel
group_by(speaker, labels) %>%
# compute mean F1 and F2
summarise(highcenF1 = mean(F1),
highcenF2 = mean(F2)) %>%
# rearrange in columns
pivot_wider(names_from = labels,
values_from = c(highcenF1, highcenF2)) %>%
ungroup
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker `highcenF1_i:` `highcenF1_o:` `highcenF2_i:` `highcenF2_o:`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 -0.607 -0.0121 0.990 -1.51
## 2 K68 -0.641 -0.174 1.06 -1.42
The next step, as before, is to left join the centroids to the original data frame:
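A sketch of this join, following the earlier pattern:

```r
v.fmt %<>%
  left_join(centroids_high, by = "speaker")
```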
The calculation of the orthogonal projection ratio of each observation in v.fmt to these anchors is now straightforward and uses the same kind of code structure as for calculating Euclidean distances:
v.fmt %<>%
# for each row i.e. for each observation
rowwise() %>%
# calculate orth.proj.ratio...
mutate(orth_high =
orthogonal_projection(
c(F1, F2),
c(`highcenF1_i:`, `highcenF2_i:`),
c(`highcenF1_o:`, `highcenF2_o:`)
)
) %>%
ungroup
Here are boxplots of the distribution of the orthogonal projection ratio for [u:] vowels at the temporal midpoint:
v.fmt %>%
filter(labels == "u:") %>%
filter(times_norm == 0.5) %>%
ggplot +
aes(y = orth_high, x = speaker) +
geom_boxplot()
The boxplots show:
These orthogonal projection ratios are positive which means that [u:] is much closer to [o:] than to [i:], as expected.
There is some marginal evidence that these orthogonal projection ratios are lower for the male speaker K67 than for the female speaker K68 – which suggests the possibility that K67’s [u:] is slightly fronted. (It is possible that this would emerge even more clearly from analyses of F2 alone, rather than F1 and F2 combined.)