Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files on this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, load the tidyverse, magrittr, emuR, tools, and wrassp libraries:
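For example, they can be loaded as follows:
library(tidyverse)   # data wrangling and plotting (dplyr, ggplot2, ...)
library(magrittr)    # extra pipe operators such as %<>%
library(emuR)        # the EMU speech database system
library(tools)       # file utilities, e.g. file_path_sans_ext()
library(wrassp)      # signal processing routines, e.g. forest()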
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
The following makes use of an extract of read speech from the Kiel corpus. Either download the zip archive, unzip the archive and move kielread_emuDB to the folder emu_databases, or execute the following code:
a = "https://www.phonetik.uni-muenchen.de/~jmh/"
b = "lehre/Rdf/kielread_emuDB.zip"
path = paste(a, b, sep="/")
download.file(path,
file.path(targetDir, "kielread.zip"))
unzip(file.path(targetDir, "kielread.zip"), exdir = targetDir)
Check that it is there:
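For example (a sketch, assuming targetDir is the course directory created during setup):
# check that the unzipped database directory now exists
file.exists(file.path(targetDir, "kielread_emuDB"))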
## [1] TRUE
This should now be TRUE.
The module makes use of the Praat formant tracker. For this purpose, some code will be needed from Albin, A. (2014). PraatR: An architecture for controlling the phonetics software “Praat” with the R programming language. Journal of the Acoustical Society of America, 135(4), 2198. You may need to install these packages first:
Install the required software for the Praat formant tracker:
Load the required Praat library and software:
library(PraatR)
url_ips = "https://www.phonetik.uni-muenchen.de/~jmh/lehre/Rdf"
source(file.path(url_ips, "Praatformant.R"))
source(file.path(url_ips, "Praat_times_correct.R"))
You must also make sure you have the latest version of the Praat software installed. If you are a Windows user, you have to create a copy of Praat.exe in the folder of the PraatR package, i.e. in Dokumente/R/win-library/4.1/PraatR. You can check that this is the case with:
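One possible check (a sketch; it assumes the copy is named Praat.exe and sits at the top level of the installed PraatR package folder):
# locate the installed PraatR package and look for a copy of Praat.exe
file.exists(file.path(find.package("PraatR"), "Praat.exe"))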
If you are a user of Linux or Mac, the code below should work without errors.
First load and inspect the database:
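For example (a sketch, assuming targetDir as above):
# load the database and print a summary of its configuration
kielread_DB = load_emuDB(file.path(targetDir, "kielread_emuDB"))
summary(kielread_DB)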
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## ── Level definitions ──
##
## ── Link definitions ──
##
As the above shows, there are currently no calculated signal files and there are four tiers with various attributes:
## name type nrOfAttrDefs attrDefNames
## 1 Word ITEM 2 Word; Func;
## 2 Syllable ITEM 1 Syllable;
## 3 Kanonic ITEM 2 Kanonic; SinfoKan;
## 4 Phonetic SEGMENT 4 Phonetic; Autoseg; SinfoPhon; LexAccent;
The segment tier Phonetic has a phonetic segmentation:
and the database consists of the same 100 sentences that have been read by a male (K67) and a female (K68) speaker of standard German. The objective now is to calculate formant frequencies for these speakers. The simplest way to do this is to make use of the add_ssffTrackDefinition() function with onTheFlyFunctionName = "forest", as detailed in the previous module here. This will run the formant tracker with the same set of parameters over all the data. However, the problem here is that there is a male and a female speaker whose formant data should be calculated separately using the argument gender = of the forest() function. Therefore, the simple and quick option using add_ssffTrackDefinition() is not appropriate here. Instead, the .wav files need to be identified separately for each speaker, and the forest function then needs to be applied to these directly, in the same way that was used for calculating pitch data in an earlier module.
The first step is to make a vector of the .wav files in the database. These are here:
kiel_wav_paths = list.files(file.path(targetDir, "kielread_emuDB"),
pattern = ".*wav$",
recursive = T,
full.names = T)
For these data, the speaker ID is encoded in the filename, as shown by inspecting any one of them:
## [1] "./emu_databases/kielread_emuDB/0000_ses/K67MR001_bndl/K67MR001.wav"
The simplest way to get at the speaker ID relative to these wav paths is to count the number of characters in each element of kiel_wav_paths:
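For example (a sketch):
# number of characters in each full .wav path
nchars = nchar(kiel_wav_paths)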
and then to use these counts to extract from each path the characters 11 through 9 counting back from the last character:
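A sketch, assuming nchars from above:
# the speaker ID spans characters (nchars - 11) to (nchars - 9),
# e.g. "K67" in "K67MR001.wav"
speaker = substring(kiel_wav_paths, nchars - 11, nchars - 9)
table(speaker)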
## speaker
## K67 K68
## 100 100
which identifies the two speakers K67 and K68.
The next step is to run the formant tracker separately over the data from these two speakers and set the argument gender = accordingly.
# a logical vector identifying the male speaker
temp.m = speaker == "K67"
# run the formant tracker over that speaker's `.wav` files
forest(kiel_wav_paths[temp.m], gender = "m")
# the same for the female speaker but with `gender` set to `f`
temp.f = speaker == "K68"
forest(kiel_wav_paths[temp.f], gender = "f")
Notice how leaving the argument outputDirectory = NULL in the above function puts the corresponding formant file into the appropriate bundle. For example, the directory of the first bundle:
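For example (a sketch, using the paths in kiel_wav_paths):
# list the files in the first bundle's directory
list.files(dirname(kiel_wav_paths[1]))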
## [1] "K67MR001_annot.json" "K67MR001.fms" "K67MR001.wav"
now contains the signal file K67MR001.fms with the formant data for that utterance (i.e. with the same basename K67MR001).
The formant files now need to be loaded into the kielread_DB database using the function add_ssffTrackDefinition() as described in an earlier module here. The three arguments to this function are:
the name of the signal. If manual formant correction is to be carried out, then the only choice is to name the formant data FORMANTS, as described in further detail in ‘Frequency-aligned formant contours spectrogram overlay’ here
the column name. The possible choices are:
## [1] "fm" "bw"
In this case fm for formant centre frequencies.
the file extension of the formant files created by forest, which is:
## [1] "fms"
Putting this together, the required command to add the formant signals to kielread_DB in such a way that they can be subsequently corrected is:
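A sketch of this command, following the same pattern as the track-definition calls at the end of this module:
# register the forest formant files (.fms, column "fm") under the name FORMANTS
add_ssffTrackDefinition(kielread_DB,
                        name = "FORMANTS",
                        columnName = "fm",
                        fileExtension = "fms")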
There is now the issue of how to display formants overlaid on the spectrogram. The procedure is very similar to the one described for superimposing f0 in an earlier module here, and requires editing the database's DBconfig.json file, which is located here:
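For example (a sketch, assuming targetDir as above):
# list the contents of the database directory to locate the DBconfig file
list.files(file.path(targetDir, "kielread_emuDB"))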
## [1] "0000_ses" "kielread_DBconfig.json"
## [3] "kielread_emuDBcache.sqlite"
The procedure is:
Optionally make a backup copy of kielread_DBconfig.json in case anything goes wrong.
Open kielread_DBconfig.json with a plain text editor.
Search for "assign".
Carefully replace
"assign": [],
with
"assign": [{ "signalCanvasName": "SPEC", "ssffTrackName": "FORMANTS" }],
Save and close kielread_DBconfig.json.
It should now be possible to correct the formants manually. Start by opening the database:
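For example:
# open the database in the EMU-webApp for manual correction
serve(kielread_DB)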
and zoom in, for demonstration purposes, to the onset of the first utterance. Pressing any of 1, 2, 3, 4 on the keyboard turns the corresponding formant yellow, which indicates that it can be hand-corrected. Hold down the shift key and sweep the mouse to the left or to the right (without clicking) to change the corresponding formant, as in the figure below; and save the changes, if appropriate.
The task here is to inspect some peripheral vowels in the formant plane with decreasing F1 on the y-axis (which is proportional to phonetic height) and decreasing F2 on the x-axis (which is proportional to phonetic backness). The vowels to be analysed are the German tense peripheral vowels [i:, e:, a:, o:, u:] from the Phonetic tier. The formant data will be extracted at these vowels' temporal midpoint, as discussed in the preceding module here.
There are various ways to do this. The one chosen here is to extract the formant data from the acoustic onset to the acoustic offset, then time-normalise to 11 data points, and then extract the formant data at the proportional time point 0.5. The commands, including making the segment list, are as follows:
# make the segment list
v.s = query(kielread_DB, "Phonetic = i:|e:|a:|o:|u:")
# there are 281 of these
dim(v.s)
# with this distribution
table(v.s$labels)
# get the formants
v.fm = get_trackdata(kielread_DB, v.s, "FORMANTS")
# time-normalise them to 11 data points
v.fmt = normalize_length(v.fm, N = 11)
At this point, it is a good idea to identify the speaker, especially since the male and female speakers' data should be separately plotted. As discussed earlier, the speaker ID is given by the first three characters of the utterance name and hence by the first three characters of v.fmt$bundle.
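For example (a sketch):
# inspect the first few bundle names
head(v.fmt$bundle)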
## [1] "K67MR002" "K67MR002" "K67MR002" "K67MR002" "K67MR002" "K67MR002"
The following makes a new column speaker in the trackdata object with this information:
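A sketch of this step (the same command is used again later in this module):
# add the speaker ID as a new column and tabulate it
v.fmt %<>% mutate(speaker = substring(v.fmt$bundle, 1, 3))
table(v.fmt$speaker)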
##
## K67 K68
## 1551 1540
The vowel data at the temporal midpoint can now be given by:
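For example (the same command is used again later in this module):
# keep only the frame at the temporal midpoint
v.fmt5 = v.fmt %>% filter(times_norm == 0.5)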
The number of observations in v.fmt5 should be the same as in the segment list from which these data were originally derived:
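For example (a sketch):
# one midpoint frame per segment
nrow(v.fmt5) == nrow(v.s)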
## [1] TRUE
The commands for the formant plane plot colour-coded by vowel type and displayed separately for the two speakers are:
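A sketch of such a plot (axes not yet reversed, default colours):
v.fmt5 %>%
  ggplot +
  aes(y = T1, x = T2, col = labels) +
  geom_point() +
  facet_wrap(~speaker)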
The following reverses the axes so that the plot is proportional to vowel height and backness and chooses more distinctive colours:
cols = c("black", "red", "blue", "orange", "magenta")
v.fmt5 %>%
ggplot +
aes(y = T1, x = T2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Hz)") +
scale_y_reverse(name = "F1 (Hz)") +
scale_colour_manual(values = cols)
It is clear from the figure that there are some formant tracking errors for the female speaker K68, in particular [a:] tokens with an implausibly high or low F1 and [i:, e:] tokens with an implausibly low F2.
The segments corresponding to these can be identified as follows:
outlier.df = v.fmt5 %>%
filter(speaker == "K68") %>%
filter((T1 > 1400 & labels == "a:") |
(T1 < 400 & labels == "a:") |
(T2 < 1500 & labels %in% c("i:", "e:")))
nrow(outlier.df)
## [1] 5
These five outliers can be passed to the serve() function using the seglist argument as follows:
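A sketch of this call, assuming the row indices in outlier.df$sl_rowIdx point back into the segment list v.s:
# open the EMU-webApp at just the outlier segments
serve(kielread_DB, seglist = v.s[unique(outlier.df$sl_rowIdx), ])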
After correcting these errors, the formant extraction and time-normalisation will have to be carried out again:
# extract formant data again
v.fm = get_trackdata(kielread_DB, v.s, "FORMANTS")
# time-normalise to 11 data points
v.fmt = normalize_length(v.fm, N = 11)
# speaker info
v.fmt %<>% mutate(speaker = substring(v.fmt$bundle, 1, 3))
# extract at midpoint
v.fmt5 = v.fmt %>% filter(times_norm == 0.5)
# plot - this time with ellipses
v.fmt5 %>%
ggplot +
aes(y = T1, x = T2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Hz)") +
scale_y_reverse(name = "F1 (Hz)") +
scale_colour_manual(values = cols) +
stat_ellipse()
Using the Praat formant tracker is more fiddly because it is not automated in the same way as the other signal processing routines of the wrassp package. Nevertheless, using it is recommended since experience at the IPS in recent years has been that it tends to make fewer errors than the forest formant tracker in wrassp.
Calculating the Praat formants requires first identifying the full paths of the .wav files, thus:
kiel_wav_paths2 <- list.files(kielread_DB$basePath,
pattern = glob2rx("*.wav"),
full.names = TRUE,
recursive = TRUE)
These paths must not contain spaces! This is because PraatToFormants2AsspDataObj(), which is used later, will otherwise throw an error. If the path to the R project directory contains spaces, you may have to use the following workaround:
Move kielread_emuDB from emu_databases to a directory which has no spaces in its name or path, e.g. ~/Downloads.
Load the database from there with load_emuDB(), e.g. kielread_DB <- load_emuDB(file.path("~/Downloads", "kielread_emuDB")).
Recompute kiel_wav_paths2 and then run the code in the following 5 code snippets.
Move the database back into emu_databases to replace the old database.
Reload the database from emu_databases with kielread_DB <- load_emuDB(file.path(targetDir, "kielread_emuDB")).
The derivation of the speaker information from these paths is exactly the same as before (and therefore need not be repeated if the above commands have already been run):
## speaker
## K67 K68
## 100 100
The Praat formant tracker will now be run separately on the data for the male speaker K67 and for the female speaker K68.
The command for doing so is PraatToFormants2AsspDataObj(). Its arguments parameter can be manipulated; the default, c(0, 5, 5000, 0.025, 50), has the following meanings:
Time step (0): the time between the centres of consecutive frames. The default is 0 which uses a time step equal to 25% of the window length (see below).
The maximum number of formants to be extracted (5): the default here is 5 which according to the Praat manual “is the only way in which this procedure will give you results comparable with how people tend to interpret formants for vowels, i.e. in terms of vowel height (F1) and vowel place (F2)”.
The formant ceiling (5000): the upper limit of the formant search range in Hz. For male speakers the recommendation (from the Praat manual) is 5000 Hz and for female speakers 5500 Hz.
Window length (0.025): the effective duration of the analysis window within which formants are calculated, which gives a frequency resolution of 51.9 Hz.
Pre-emphasis (50): enhancement of the spectrum by 6 dB/octave above 50 Hz to give a flatter spectrum for vowels, i.e. to counteract the usual 6 dB/octave roll-off in vowel spectra.
From the above, it becomes clear that arguments should be set with a formant ceiling of 5000 Hz (the default, hence no specification is needed) for the male speaker K67 and with a ceiling of 5500 Hz for the female speaker K68. First, the Praat formants are computed for the male speaker. Notice that this requires a for-loop to be run over each of the utterances of this speaker, as detailed in the comments below. The effect is to store the Praat formants with the extension .praatFms in the corresponding bundle. (This takes a minute or so.)
# a logical vector identifying the male speaker
temp.m = speaker == "K67"
# run the formant tracker over that
# speaker's `.wav` files
for (fp in kiel_wav_paths2[temp.m]) {
# use function for current wav file
ado = PraatToFormants2AsspDataObj(fp)
# create new path (stores files in same bundle)
newPath = paste0(file_path_sans_ext(fp), ".praatFms")
# print(paste0(fp, ' -> ', newPath))
# write asspDataObject to SSFF file
write.AsspDataObj(ado, file = newPath)
}
The same is done for the female speaker, but with the arguments modified so that the maximum frequency is set to 5500 Hz (again allow 1-2 minutes for this to run).
# a logical vector identifying the female speaker
temp.f = speaker == "K68"
# run the formant tracker over that
# speaker's `.wav` files
for (fp in kiel_wav_paths2[temp.f]) {
# use function for current wav file
ado = PraatToFormants2AsspDataObj(
fp,
arguments = list(0.0, 5, 5500, 0.025, 50)
)
# create new path (stores files in same bundle)
newPath = paste0(file_path_sans_ext(fp), ".praatFms")
# print(paste0(fp, ' -> ', newPath))
# write asspDataObject to SSFF file
write.AsspDataObj(ado, file = newPath)
}
There are some further fiddly alignment issues that may need to be adjusted. This can be done with the function below, which makes any necessary adjustments for all .praatFms files it finds in the database.
Finally, the Praat formant data can be added to the database in the usual way. This is done in this case by naming the signal Praat, as shown below:
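A sketch of this command, mirroring the track-definition calls at the end of this module:
# register the Praat formant files (.praatFms, column "fm") under the name Praat
add_ssffTrackDefinition(kielread_DB,
                        name = "Praat",
                        columnName = "fm",
                        fileExtension = "praatFms")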
So now there should be two sets of formant tracks in this database:
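For example (a sketch):
# list the SSFF track definitions now registered in the database
list_ssffTrackDefinitions(kielread_DB)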
The Praat formant data can now be extracted in the same way as before. The second command also time-normalises these data to 11 data points.
v.praatfm = get_trackdata(kielread_DB, v.s, "Praat")
# time-normalise them to 11 data points
v.praatfmt = normalize_length(v.praatfm, N = 11)
The number of observations in the formant data extracted with forest and with the Praat formant tracker should be the same:
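For example (a sketch):
# both trackdata objects should have the same number of rows
nrow(v.fmt) == nrow(v.praatfmt)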
Consequently, the Praat formants can be added as columns to the time-normalised data frame obtained earlier using the forest formant data. Here only the first two Praat formant frequencies are added and are given the column names P1 and P2:
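A sketch of this step, assuming the rows of v.fmt and v.praatfmt are aligned (both are derived from the same segment list and time-normalised to 11 points):
# add the first two Praat formants as columns P1 and P2
v.fmt %<>% mutate(P1 = v.praatfmt$T1, P2 = v.praatfmt$T2)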
These data are once again extracted at the temporal midpoint:
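For example, as before:
# keep only the frame at the temporal midpoint
v.fmt5 = v.fmt %>% filter(times_norm == 0.5)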
And here, finally, is a plot in the formant plane using the Praat formant data:
# plot Praat formants
v.fmt5 %>%
ggplot +
aes(y = P1, x = P2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Hz)") +
scale_y_reverse(name = "F1 (Hz)") +
scale_colour_manual(values = cols)
from which it is evident that there are fewer formant tracking errors than in the data extracted earlier with the forest function.
A direct comparison of forest and Praat formant tracks can be made by segment, in this example for the 10th segment and F2 (the Praat F2 is in red, the forest F2 in black):
v.fmt %>%
filter(sl_rowIdx == 10) %>%
ggplot +
aes(x = times_norm) +
geom_line(aes(y = P2), colour = "red") +
geom_line(aes(y = T2)) +
xlab("Proportional time") +
ylab("F2 (Hz)")
The default display is still set for displaying and correcting the formant frequencies derived from forest. In order to change this to display and correct the Praat formants, the track definitions of both first need to be removed:
remove_ssffTrackDefinition(kielread_DB,
name = "FORMANTS")
remove_ssffTrackDefinition(kielread_DB,
name = "Praat")
and then added back in, but with different names. To display and correct the Praat formants, the name of the track must be set to FORMANTS; the user can then choose any name for the formant tracks derived with forest, thus:
add_ssffTrackDefinition(kielread_DB,
name = "FORMANTS",
columnName = "fm",
fileExtension = "praatFms")
add_ssffTrackDefinition(kielread_DB,
"forest", "fm", "fms")
The display is now set up to show and to edit the Praat formants: