Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, access the tidyverse, magrittr, emuR, and wrassp libraries:
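The packages can be attached with library(); the start-up messages shown below come from these calls:

```r
library(tidyverse)
library(magrittr)
library(emuR)
library(wrassp)
```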
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
The following makes use of an extract of read speech from the Kiel corpus. Either download the zip archive, unzip it, and move kielread_emuDB to the folder emu_databases, or execute the following code:
# `targetDir` is assumed to be the path to the emu_databases folder created during setup
a = "https://www.phonetik.uni-muenchen.de/~jmh/"
b = "lehre/Rdf/kielread_emuDB.zip"
down_path = paste(a, b, sep="/")
if (!file.exists(file.path(targetDir, "kielread.zip"))) {
download.file(down_path,
file.path(targetDir, "kielread.zip"))
}
unzip(file.path(targetDir, "kielread.zip"), exdir = targetDir)
The following reads in a data frame that will be used later for Mahalanobis distance calculations:
b = "lehre/Rdf"
down_path = paste(a, b, sep="/")
urldir = paste(down_path, "formantidata.df.txt", sep="/")
mdata.df = read.table(urldir)
The following derives the formant frequencies for seven vowel types in this database using the emuR forest formant tracker. Since these commands were dealt with extensively in the previous module, they are listed here with just a few comments.
# Load the database
kielread_DB = load_emuDB(file.path(targetDir, "kielread_emuDB"))
# identify the wav files
kiel_wav_paths = list.files(file.path(targetDir, "kielread_emuDB"),
pattern = ".*wav$",
recursive = T,
full.names = T)
# identify the speaker
n = nchar(kiel_wav_paths)
speaker = substring(kiel_wav_paths, n-11, n-9)
# a logical vector identifying the male speaker
temp.m = speaker == "K67"
# run the formant tracker over that speaker's `.wav` files
forest(kiel_wav_paths[temp.m], gender = "m")
# the same for the female speaker but with `gender` set to `f`
temp.f = speaker == "K68"
forest(kiel_wav_paths[temp.f], gender = "f")
# add the formants to the database
add_ssffTrackDefinition(kielread_DB,
"FORMANTS", "fm", "fms")
# query 7 vowel types
v.s = query(kielread_DB, "Phonetic = i:|e:|a:|o:|u:|I|E")
# get their formants
v.fm = get_trackdata(kielread_DB, v.s, "FORMANTS")
# time-normalise them to 11 data points
v.fmt = normalize_length(v.fm, N = 11)
# identify the speaker
v.fmt %<>% mutate(speaker = substring(bundle, 1, 3))
The object is to carry out a standard normalisation following Lobanov (1971), i.e. by converting to \(z\) scores which means subtracting the mean and dividing by the standard deviation per speaker. Vowel normalisation should be calculated, as far as possible, on a balanced data set i.e. the \(z\) score transformation should be based on both front and back, high and low vowels in approximately equal proportion. Here the mean and standard deviation are based on all frames of data between the acoustic vowel onset and offset of the peripheral vowels [i:, e:, a:, o:, u:]. There are two speakers to consider and the \(z\) score normalisation will be carried out separately for each of them. The first step is to obtain the by-speaker mean and standard deviation as follows.
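In symbols: if \(\mu_i\) and \(\sigma_i\) are a given speaker's mean and standard deviation of formant \(F_i\) (for \(i = 1, 2\)), computed over all frames of the peripheral vowels, then the Lobanov-normalised value is \(F_i^z = (F_i - \mu_i)/\sigma_i\).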
mean_sd <- v.fmt %>%
# Identify the vowels
filter(labels %in% c("i:", "e:", "a:", "o:", "u:")) %>%
# For each speaker:
group_by(speaker) %>%
# compute mean & sd of F1 and F2
summarise(m1 = mean(T1),
s1 = sd(T1),
m2 = mean(T2),
s2 = sd(T2)) %>%
ungroup
mean_sd
## # A tibble: 2 × 5
## speaker m1 s1 m2 s2
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 364. 147. 1525. 500.
## 2 K68 423. 219. 1846. 714.
The vowel normalisation can be carried out in two simple steps. The first involves making use of the dplyr function left_join() to bind these means and standard deviations into the original data frame v.fmt by speaker. That is, 4 new columns are created in v.fmt (those from mean_sd above) containing the mean and sd values appropriate to that speaker. The next step is to store the \(z\) score normalised formants as the variables F1 and F2.
# over-write v.fmt with `%<>%`
v.fmt %<>%
# 1. Add the mean & sd values from above
left_join(mean_sd, by = "speaker") %>%
# 2. Create two new columns, F1 and F2
# which are computed from the
# raw formant values in columns T1 and T2
mutate(F1 = (T1 - m1) / s1,
F2 = (T2 - m2) / s2)
A plot of the normalised vowels in the normalised F1 \(\times\) F2 space is now given by:
cols1 = c("black", "red", "blue", "orange")
cols2 = c("magenta", "green", "slategray")
cols = c(cols1, cols2)
v.fmt %>%
# select values at temporal midpoint
filter(times_norm == 0.5) %>%
ggplot +
aes(y = F1, x = F2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Standard deviations)") +
scale_y_reverse(name = "F1 (Standard deviations)") +
scale_colour_manual(values = cols)
An alternative methodology proposed by Brand et al. (2021) can be used when the vowel types are imbalanced across the vowel space (e.g. many more front vowels than back vowels). In this case, a grand mean and standard deviation are calculated: the former is the mean of the by-vowel means calculated earlier, and the latter is the standard deviation of those means. The by-vowel means are derived in exactly the same way as before, and the grand mean and standard deviation are then computed from them:
mean_sd <- v.fmt %>%
# subset vowels
filter(labels %in% c("i:", "e:", "a:", "o:", "u:")) %>%
# per speaker and vowel
group_by(speaker, labels) %>%
# compute the by-vowel means of F1 and F2
summarise(m1 = mean(T1),
m2 = mean(T2)) %>%
group_by(speaker) %>%
# calculate grand mean and standard deviation
summarise(gm1 = mean(m1),
gs1 = sd(m1),
gm2 = mean(m2),
gs2 = sd(m2)) %>%
ungroup
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker gm1 gs1 gm2 gs2
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 370. 142. 1409. 527.
## 2 K68 425. 203. 1696. 724.
The revised \(z\) score normalisation is then:
v.fmt %<>%
# 1. Add the grand mean & sd values from above
left_join(mean_sd, by = "speaker") %>%
# 2. Create two new columns, F1 and F2
# which are computed from the
# raw formant values in columns T1 and T2
mutate(F1g = (T1 - gm1) / gs1,
F2g = (T2 - gm2) / gs2)
And the revised plot:
v.fmt %>%
filter(times_norm == 0.5) %>%
ggplot +
aes(y = F1g, x = F2g, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Standard deviations)") +
scale_y_reverse(name = "F1 (Standard deviations)") +
scale_colour_manual(values = cols)
The centre of a speaker’s normalised vowel space is by definition the point [0, 0] in a \(z\) score normalised space. Therefore, some estimate of the degree of centralisation can be given by calculating the Euclidean distance to [0, 0]. This distance is simply the square root of the sum of the squared normalised F1 and F2 values, as follows:
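A sketch of this step, assuming (as the plots below do) that the result is stored in a column named central:

```r
v.fmt %<>%
  # Euclidean distance of each observation from
  # the centre [0, 0] of the normalised space
  mutate(central = sqrt(F1^2 + F2^2))
```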
It looks from the earlier plot that the male speaker’s [I] (speaker K67) is more central than that of the female speaker (K68). Whether this is so can now be seen as follows:
v.fmt %>%
filter(labels == "I") %>%
filter(times_norm == .5) %>%
ggplot +
aes(y = central, x = speaker) +
geom_boxplot()
but there’s no evidence to suggest this is so. On the other hand, a comparison of tense [i:] with lax [I] should show a greater degree of centralisation in [I] for both speakers.
v.fmt %>%
filter(labels %in% c("I", "i:")) %>%
filter(times_norm == .5) %>%
ggplot +
aes(y = central, x = labels) +
facet_wrap(~speaker) +
geom_boxplot()
The task here is to quantify the degree of overlap between vowel pairs by measuring the inter-Euclidean distances between them. The distances are calculated to the centroid (or mean) of the other vowel. Thus, for a vowel category \(V\) with tokens \(v_1\), \(v_2\) … \(v_n\) and another vowel category \(U\) with tokens \(u_1\), \(u_2\) … \(u_n\), the distances are calculated from \(v_1\), \(v_2\) … \(v_n\) to the centroid of \(U\) and from \(u_1\), \(u_2\) … \(u_n\) to the centroid of \(V\). The vowel pair in this example is [E, e:]. The further test is of whether, as the earlier plot in the F1 \(\times\) F2 plane suggests, the inter-Euclidean distance between them is less for the male speaker K67 than for the female speaker K68. The calculations will be made separately by speaker in the speaker-normalised F1 \(\times\) F2 space.
There are three steps to this:
1. Calculate the [E, e:] centroids for each speaker at the temporal midpoint of the vowel.
2. Calculate the distance of all observations in the data frame v.fmt to each of these centroids.
3. Make a new data frame that only contains distances of [E] to the [e:]-centroid and of [e:] to the [E]-centroid.
The calculation by speaker in the normalised F1 \(\times\) F2 space of the [E, e:] centroids at the vowel midpoint can be done as follows:
centroids <- v.fmt %>%
# identify the temporal midpoint
filter(times_norm == .5) %>%
# choose the vowels
filter(labels %in% c("E", "e:")) %>%
# for each speaker and vowel
group_by(speaker, labels) %>%
# compute mean F1 and F2
summarise(mF1 = mean(F1),
mF2 = mean(F2)) %>%
# rearrange in columns
pivot_wider(names_from = labels,
values_from = c(mF1, mF2)) %>%
ungroup
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker mF1_E `mF1_e:` mF2_E `mF2_e:`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 1.07 -0.224 0.243 0.950
## 2 K68 0.680 -0.238 0.432 0.906
Note that the column names involving [e:] are enclosed in backticks so that the colon is not interpreted as a special character. The next step is to join the centroids to the data frame v.fmt using the function left_join(), as described earlier. This is, once again, done by speaker.
The above adds the centroid data as new columns to v.fmt, by speaker. Thus, for example, since the 20th observation of v.fmt contains data for K67, that observation also has the centroids for this speaker:
## # A tibble: 1 × 5
## speaker mF1_E `mF1_e:` mF2_E `mF2_e:`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 1.07 -0.224 0.243 0.950
The first step is to write a simple function to calculate Euclidean distances between data points.
# calculate Euclidean distance between
# the vectors `a` and `b`
euc = function(a, b) {
sqrt(sum((a - b)^2))
}
The euc() function can be tested by calculating the distance from c(0, 0) to c(3, 4). This is the length of the hypotenuse of a 3-4-5 right-angled triangle, i.e. sqrt(3^2 + 4^2), and should give 5: the length of the red line in:
plot(0:5, 0:5, type="n", xlab="x", ylab="y")
lines(c(0, 3), c(0, 4), col="red")
lines(c(0, 0), c(0, 4))
lines(c(0, 3), c(4, 4))
which is given by
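The call itself would be of the form:

```r
euc(c(0, 0), c(3, 4))
```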
## [1] 5
The task is now to apply the function euc() to each observation of v.fmt: more specifically, to calculate for each observation the Euclidean distance from the normalised F1 and F2 to the [E]-centroid and to the [e:]-centroid, as follows:
# calculate distance to [E] centroid
v.fmt %<>%
# for each row i.e. for each observation
rowwise() %>%
# calculate the Euclidean distance to the [E]-centroid...
mutate(dist_E = euc(c(F1, F2), c(mF1_E, mF2_E))) %>%
# ... and to the [e:]-centroid
mutate(dist_e = euc(c(F1, F2), c(`mF1_e:`, `mF2_e:`))) %>%
# always ungroup after rowwise() or group_by()
ungroup
Thus, for the 20th observation, for example, these distances are:
## # A tibble: 1 × 2
## dist_E dist_e
## <dbl> <dbl>
## 1 1.90 0.434
which should be the same as the Euclidean distances from here:
## # A tibble: 1 × 2
## F1 F2
## <dbl> <dbl>
## 1 -0.627 1.11
to here:
## # A tibble: 1 × 2
## mF1_E mF2_E
## <dbl> <dbl>
## 1 1.07 0.243
and to here
## # A tibble: 1 × 2
## `mF1_e:` `mF2_e:`
## <dbl> <dbl>
## 1 -0.224 0.950
as verified by:
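Using the rounded values printed above, the verifying calls would be approximately:

```r
euc(c(-0.627, 1.11), c(1.07, 0.243))
euc(c(-0.627, 1.11), c(-0.224, 0.950))
```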
## [1] 1.902408
## [1] 0.4340553
The task here is to rearrange the data frame so that:
1. it includes only the distances from [E] to the [e:]-centroid and from [e:] to the [E]-centroid at the temporal midpoint;
2. the above is arranged in a single column of distances;
3. there is another column which is a factor with two levels identifying an observation as dist_E (a distance to the [E]-centroid) or as dist_e (a distance to the [e:]-centroid).
The command below first deals with 3. to create a column named vowel with these two levels. All the distances in 2. are stored in a new column called distances, thus:
v.fmtlong = v.fmt %>%
# for the columns that begin with dist_
pivot_longer(cols = starts_with("dist_"),
# 3. make a new column called `vowel`
# with levels `dist_E` or `dist_e`
names_to = "vowel",
# 2. make a new column of the distances
# of observations to the [e:]-centroid and of
# to the [E]-centroid
values_to = "distances")
This command below now deals with 1.:
e.df = v.fmtlong %>%
filter((labels == "E" & vowel == "dist_e") |
(labels == "e:" & vowel == "dist_E")) %>%
filter(times_norm == 0.5)
A plot comparing the Euclidean distances between the speakers can now be made:
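A sketch of one way to make this plot, using the same boxplot pattern as elsewhere in this tutorial (one panel per distance type, one box per speaker):

```r
e.df %>%
  ggplot +
  aes(y = distances, x = speaker) +
  geom_boxplot() +
  facet_wrap(~vowel)
```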
The left panel has the distances of [E] vowels at the temporal midpoint to the [e:] centroid; the right panel those from [e:] vowels at the temporal midpoint to the [E] centroid. It is evident that these distances are greater for K67 (less overlap between [E, e:]) than for K68.
The Mahalanobis distance is the number of ellipse standard deviations (or ellipsoid standard deviations in higher dimensions) of a point from the centre of the distribution. Consider, for example, the position of the point \(X\), which is located roughly at the 95% confidence ellipse:
mdata.df %>%
ggplot +
aes(y = F1, x = F2) +
stat_ellipse(type = "norm") +
annotate("text",
y = -0.9649036, x = 0.5113503,
label = "X", size = 8, col = "red") +
annotate("text",
y = -0.6073229, x = 0.9897430,
label = "O", size = 8, col = "red")
The Euclidean and Mahalanobis distances of \(X\) to the centroid \(O\) are given by:
# the point X
X = c(-0.9649036, 0.5113503)
# centroid (the point O)
mdata.cen = mdata.df %>%
summarise(F1 = mean(F1), F2 = mean(F2)) %>%
as.matrix
# or
mdata.cen = apply(mdata.df, 2, mean)
# covariance matrix
mdata.cov = mdata.df %>% var
# Euclidean distance
euc(X, mdata.cen)
## [1] 0.5972634
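The second value below is the Mahalanobis distance, obtained with base R’s mahalanobis() function (the same call is referred to again further down):

```r
# Mahalanobis distance of X from the centroid O
mahalanobis(X, mdata.cen, mdata.cov)
```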
## [1] 5.994233
As the above shows, the Mahalanobis distance requires the calculation of the covariance matrix for the data used to construct the ellipse. For a two-dimensional space, the covariance matrix consists of the variance of the two parameters in the diagonal and the covariance (shared variance) between the two parameters in the off-diagonal.
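The individual entries can be inspected as follows, producing the values and matrix shown below:

```r
# variances of F1 and F2 (the diagonal)
var(mdata.df$F1)
var(mdata.df$F2)
# covariance between F1 and F2 (the off-diagonal)
cov(mdata.df$F1, mdata.df$F2)
# the full covariance matrix
mdata.cov
```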
## [1] 0.06434925
## [1] 0.107755
## [1] -0.02617008
## F1 F2
## F1 0.06434925 -0.02617008
## F2 -0.02617008 0.10775497
The Mahalanobis distance of any point exactly on the 95% ellipse (which is itself a horizontal slice through a two-dimensional normal distribution) is given by the following for the two-dimensional case:
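This critical value is the 95% quantile of the \(\chi^2\) distribution with 2 degrees of freedom:

```r
qchisq(0.95, df = 2)
```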
## [1] 5.991465
which is very close to mahalanobis(X, mdata.cen, mdata.cov) calculated earlier for the Mahalanobis distance between \(X\) and \(O\).
The earlier command for computing centroids is modified below to include covariance matrices. This requires storing lists in the summarise() function: this is because a covariance matrix is, as its name suggests, a matrix and not a vector:
centroids <- v.fmt %>%
# identify the temporal midpoint
filter(times_norm == .5) %>%
# choose the vowels
filter(labels %in% c("E", "e:")) %>%
# for each speaker and vowel
group_by(speaker, labels) %>%
# compute mean formants and covariance matrices
# and save them as lists in one cell
summarise(meanF = list(c(mean(F1), mean(F2))),
covar = list(cov(cbind(F1, F2)))) %>%
ungroup() %>%
# rearrange in columns
pivot_wider(names_from = labels,
values_from = c(meanF, covar))
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker meanF_E `meanF_e:` covar_E `covar_e:`
## <chr> <list> <list> <list> <list>
## 1 K67 <dbl [2]> <dbl [2]> <dbl [2 × 2]> <dbl [2 × 2]>
## 2 K68 <dbl [2]> <dbl [2]> <dbl [2 × 2]> <dbl [2 × 2]>
The next lines modify the earlier code to calculate the Mahalanobis distances:
v.fmt %<>%
# add columns from centroids to v.fmt
left_join(centroids, by = "speaker") %>%
# for each observation
rowwise() %>%
# compute mahal distances to E and e:
# values from cells that contain lists
# must be unlisted!
mutate(
distmah_E = mahalanobis(
c(F1, F2),
meanF_E %>% unlist(),
covar_E %>% unlist()
),
distmah_e = mahalanobis(
c(F1, F2),
`meanF_e:` %>% unlist(),
`covar_e:` %>% unlist()
)) %>%
ungroup()
The following creates a data frame with columns dtype, specifying whether a particular observation is of a Euclidean or Mahalanobis distance; vowel, to code each observation as a distance to the /E/- or to the /e:/-centroid; and distance, with the Euclidean or Mahalanobis distance value:
v.fmtlong2 =
v.fmt %>%
pivot_longer(cols = c(dist_e, dist_E, distmah_e, distmah_E),
names_to = c("dtype", "vowel"),
values_to = "distance",
names_sep = "_") %>%
mutate(dtype = ifelse(dtype == "dist", "Euclidean", "Mahalanobis"),
vowel = ifelse(vowel == "e", "dist_to_e", "dist_to_E"))
A data frame is now derived from v.fmtlong2 above to include only the distances from [E] to the [e:]-centroid and from [e:] to the [E]-centroid at the temporal midpoint:
e.df2 = v.fmtlong2 %>%
filter((labels == "E" & vowel == "dist_to_e") |
(labels == "e:" & vowel == "dist_to_E")) %>%
filter(times_norm == 0.5)
The following shows the Mahalanobis distance of /E/ to the /e:/-centroid and vice versa, separately by speaker. Because the Mahalanobis distances extend over a large range (from 0 to 250), the plot shows the log Mahalanobis distances, which compresses the vertical scale:
e.df2 %>%
filter(dtype == "Mahalanobis") %>%
ggplot +
aes(y = log(distance), x = speaker) +
geom_boxplot() +
facet_wrap( ~ vowel) +
ylab("Log. Mahalanobis distance")
This is another useful distance metric, used by Stevens et al. (2019) in order to compare the relative position of a category \(V\) to two other anchor categories \(U\) and \(W\). Typically, \(V\) is positioned somewhere between \(U\) and \(W\) in an acoustic space. The task is to quantify the relative proximity of \(V\) to the anchor categories. Points that are positioned exactly at the (centroid of the) anchors have values of ±1; points between the anchors take values between −1 and +1. The function itself is:
orthogonal_projection = function(V, U, W) {
# V, U, W are vectors of the same length
# which must be greater than 1
# U, W are the anchors. Calculate the
# orthogonal projection ratio of V
# to the two anchors
( -2 * ((V-U) %*% (U-W)) / ((U-W) %*% (U-W)) ) - 1
}
where %*% denotes matrix multiplication. The orthogonal projection ratio is ±1 at the anchors. This can be shown as follows:
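For example, with hypothetical anchors U = c(0, 0) and W = c(1, 1) (assumed here for illustration):

```r
U = c(0, 0); W = c(1, 1)
# exactly at the anchor U
orthogonal_projection(U, U, W)
# exactly at the anchor W
orthogonal_projection(W, U, W)
```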
## [,1]
## [1,] -1
## [,1]
## [1,] 1
whereas a point exactly half-way between the anchors has a value of zero:
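For example, with the same hypothetical anchors U = c(0, 0) and W = c(1, 1):

```r
# hypothetical anchors (assumed for illustration)
U = c(0, 0); W = c(1, 1)
# exactly half-way between the anchors
orthogonal_projection((U + W)/2, U, W)
```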
## [,1]
## [1,] 0
Here the task is to calculate the orthogonal projection for [u:] relative to the anchors [i:] and [o:] in the F1 \(\times\) F2 speaker-normalised space at the vowels’ temporal midpoint: this can be used as an index of the degree of [u:]-fronting. The calculation should show that [u:] is relatively much closer to [o:]; there is also some suggestion from the earlier plots in the F1 \(\times\) F2 plane that [u:] may be slightly fronter, i.e. closer to [i:], for the male than for the female speaker. The first step is to calculate the anchors, i.e. the by-speaker centroids of the two anchor vowels [i:, o:]. The same code that was used earlier can be modified for this purpose:
centroids_high <- v.fmt %>%
# identify the temporal midpoint
filter(times_norm == .5) %>%
# choose the anchor vowels
filter(labels %in% c("i:", "o:")) %>%
# for each speaker and vowel
group_by(speaker, labels) %>%
# compute mean F1 and F2
summarise(highcenF1 = mean(F1),
highcenF2 = mean(F2)) %>%
# rearrange in columns
pivot_wider(names_from = labels,
values_from = c(highcenF1, highcenF2)) %>%
ungroup
## `summarise()` has grouped output by 'speaker'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 5
## speaker `highcenF1_i:` `highcenF1_o:` `highcenF2_i:` `highcenF2_o:`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K67 -0.607 -0.0121 0.990 -1.51
## 2 K68 -0.641 -0.174 1.06 -1.42
The next step, as before, is to left join the centroids to the original data frame:
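A sketch of this join, following the earlier pattern:

```r
v.fmt %<>%
  left_join(centroids_high, by = "speaker")
```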
The calculation of the orthogonal projection ratio of each observation in v.fmt to these anchors is now straightforward and uses the same kind of code structure as for calculating Euclidean distances:
v.fmt %<>%
# for each row i.e. for each observation
rowwise() %>%
# calculate orth.proj.ratio...
mutate(orth_high =
orthogonal_projection(
c(F1, F2),
c(`highcenF1_i:`, `highcenF2_i:`),
c(`highcenF1_o:`, `highcenF2_o:`)
)
) %>%
ungroup
Here are boxplots of the distribution of the orthogonal projection ratio for [u:] vowels at the temporal midpoint:
v.fmt %>%
filter(labels == "u:") %>%
filter(times_norm == 0.5) %>%
ggplot +
aes(y = orth_high, x = speaker) +
geom_boxplot()
The boxplots show:
These orthogonal projection ratios are positive which means that [u:] is much closer to [o:] than to [i:], as expected.
There is some marginal evidence that these orthogonal projection ratios are lower for the male speaker K67 than for the female speaker K68 – which suggests the possibility that K67’s [u:] is slightly fronted. (It is possible that this would emerge even more clearly from analyses of F2 alone, rather than F1 and F2 combined.)