Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files on this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, load the tidyverse, magrittr, emuR, tools, and wrassp libraries:
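For example, they can be loaded as follows:
library(tidyverse)   # data wrangling and plotting (dplyr, ggplot2, ...)
library(magrittr)    # extra pipe operators such as %<>%
library(emuR)        # the EMU speech database system
library(tools)       # file utilities, e.g. file_path_sans_ext()
library(wrassp)      # signal processing routines, e.g. forest()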
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
The following makes use of an extract of read speech from the Kiel corpus. Either download the zip archive, unzip the archive and move kielread_emuDB to the folder emu_databases, or execute the following code:
a = "https://www.phonetik.uni-muenchen.de/~jmh/"
b = "lehre/Rdf/kielread_emuDB.zip"
path = paste(a, b, sep="/")
download.file(path,
file.path(targetDir, "kielread.zip"))
unzip(file.path(targetDir, "kielread.zip"), exdir = targetDir)
Check that it is there:
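For example (a sketch, assuming targetDir is the course directory created during setup):
# check that the unzipped database directory now exists
file.exists(file.path(targetDir, "kielread_emuDB"))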
## [1] TRUE
This should now be TRUE.
The module makes use of the Praat formant tracker. For this purpose, some code will be needed from Albin, A. (2014). PraatR: An architecture for controlling the phonetics software “Praat” with the R programming language. Journal of the Acoustical Society of America, 135(4), 2198. You may need to install these packages first:
Install the required software for the Praat formant tracker:
Load the required Praat library and software:
library(PraatR)
url_ips = "https://www.phonetik.uni-muenchen.de/~jmh/lehre/Rdf"
source(file.path(url_ips, "Praatformant.R"))
source(file.path(url_ips, "Praat_times_correct.R"))
You must also make sure you have the latest version of the Praat software installed. If you are a Windows user, you have to create a copy of Praat.exe in the folder of the PraatR package, i.e. in Dokumente/R/win-library/4.1/PraatR. You can check that this is the case with:
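One possible check (a sketch; it assumes the copy is named Praat.exe and sits at the top level of the installed PraatR package folder):
# locate the installed PraatR package and look for a copy of Praat.exe
file.exists(file.path(find.package("PraatR"), "Praat.exe"))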
If you are a user of Linux or Mac, the code below should work without errors.
First load and inspect the database:
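For example (a sketch, assuming targetDir as above):
# load the database and print a summary of its configuration
kielread_DB = load_emuDB(file.path(targetDir, "kielread_emuDB"))
summary(kielread_DB)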
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## ── Level definitions ──
##
## ── Link definitions ──
##
As the above shows, there are currently no calculated signal files and there are four tiers with various attributes:
## name type nrOfAttrDefs attrDefNames
## 1 Word ITEM 2 Word; Func;
## 2 Syllable ITEM 1 Syllable;
## 3 Kanonic ITEM 2 Kanonic; SinfoKan;
## 4 Phonetic SEGMENT 4 Phonetic; Autoseg; SinfoPhon; LexAccent;
The segment tier Phonetic has a phonetic segmentation:
and the database consists of the same 100 sentences that have been read by a male (K67) and a female (K68) speaker of standard German. The objective now is to calculate formant frequencies for these speakers. The simplest way to do this is to make use of the add_ssffTrackDefinition() function with onTheFlyFunctionName = "forest", as detailed in the previous module here. This will run the formant tracker with the same set of parameters over all the data. However, the problem here is that there is a male and a female speaker whose formant data should be calculated separately using the argument gender = of the forest() function. Therefore, the simple and quick option using add_ssffTrackDefinition() is not appropriate here. Instead, the .wav files need to be identified separately for each speaker, and the forest function then needs to be applied to these directly, in the same way that was used for calculating pitch data in an earlier module.
The first step is to make a vector of the .wav files in the database. These are here:
kiel_wav_paths = list.files(file.path(targetDir, "kielread_emuDB"),
pattern = ".*wav$",
recursive = T,
full.names = T)
For these data, the speaker ID is encoded in the filename, as shown by inspecting any one of them:
## [1] "./emu_databases/kielread_emuDB/0000_ses/K67MR001_bndl/K67MR001.wav"
The simplest way to get at the speaker ID relative to these wav paths is to count the number of characters in each element of kiel_wav_paths:
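For example (a sketch):
# number of characters in each full .wav path
nchars = nchar(kiel_wav_paths)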
and then to use these counts to extract from each path the characters 11 through 9 counting back from the last character:
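A sketch, assuming nchars from above:
# the speaker ID spans characters (nchars - 11) to (nchars - 9),
# e.g. "K67" in "K67MR001.wav"
speaker = substring(kiel_wav_paths, nchars - 11, nchars - 9)
table(speaker)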
## speaker
## K67 K68
## 100 100
which identifies the two speakers K67 and K68.
The next step is to run the formant tracker separately over the data from these two speakers and set the argument gender = accordingly.
# a logical vector identifying the male speaker
temp.m = speaker == "K67"
# run the formant tracker over that speaker's `.wav` files
forest(kiel_wav_paths[temp.m], gender = "m")
# the same for the female speaker but with `gender` set to `f`
temp.f = speaker == "K68"
forest(kiel_wav_paths[temp.f], gender = "f")
Notice how leaving the argument outputDirectory = NULL in the above function puts the corresponding formant file into the appropriate bundle. For example, the directory of the first bundle:
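For example (a sketch, using the paths in kiel_wav_paths):
# list the files in the first bundle's directory
list.files(dirname(kiel_wav_paths[1]))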
## [1] "K67MR001_annot.json" "K67MR001.fms" "K67MR001.wav"
now contains the signal file K67MR001.fms with the formant data for that utterance (i.e. with the same basename K67MR001).
The formant files now need to be loaded into the kielread_DB database using the function add_ssffTrackDefinition() as described in an earlier module here. The three arguments to this function are:
the name of the signal. If manual formant correction is to be carried out, then the only choice is to name the formant data FORMANTS, as described in further detail in ‘Frequency-aligned formant contours spectrogram overlay’ here
the column name. The possible choices are:
## [1] "fm" "bw"
In this case fm for formant centre frequencies.
the file extension of the formant files created by forest, which is:
## [1] "fms"
Putting this together, the required command to add the formant signals to kielread_DB in such a way that they can be subsequently corrected is:
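A sketch of this command, following the same pattern as the track-definition calls at the end of this module:
# register the forest formant files (.fms, column "fm") under the name FORMANTS
add_ssffTrackDefinition(kielread_DB,
                        name = "FORMANTS",
                        columnName = "fm",
                        fileExtension = "fms")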
There is now the issue of how to display formants overlaid on the spectrogram. The procedure is very similar to the one described for superimposing f0 in an earlier module here, and requires editing the database's DBconfig.json file, which is located here:
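For example (a sketch, assuming targetDir as above):
# list the contents of the database directory to locate the DBconfig file
list.files(file.path(targetDir, "kielread_emuDB"))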
## [1] "0000_ses" "kielread_DBconfig.json"
## [3] "kielread_emuDBcache.sqlite"
The procedure is:
Optionally make a backup copy of kielread_DBconfig.json in case anything goes wrong.
Open kielread_DBconfig.json with a plain text editor.
Search for "assign".
Carefully replace
"assign": [],
with
"assign": [{ "signalCanvasName": "SPEC", "ssffTrackName": "FORMANTS" }],
Save and close kielread_DBconfig.json.
It should now be possible to correct the formants manually. Start by opening the database:
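For example:
# open the database in the EMU-webApp for manual correction
serve(kielread_DB)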
and zoom in, for demonstration purposes, to the onset of the first utterance. Pressing any of 1, 2, 3, 4 on the keyboard turns the corresponding formant yellow, which indicates that it can be hand-corrected. Hold down the shift key and sweep the mouse to the left or to the right (without clicking) to change the corresponding formant, as in the figure below; and save the changes, if appropriate.
The task here is to inspect some peripheral vowels in the formant plane with decreasing F1 on the y-axis (which is proportional to phonetic height) and decreasing F2 on the x-axis (which is proportional to phonetic backness). The vowels to be analysed are the German tense peripheral vowels [i:, e:, a:, o:, u:] from the Phonetic tier. The formant data will be extracted at these vowels' temporal midpoint, as discussed in the preceding module here.
There are various ways to do this. The one chosen here is to extract the formant data from the acoustic onset to the acoustic offset, then time-normalise to 11 data points, and then extract the formant data at the proportional time point 0.5. The commands, including making the segment list, are as follows:
# make the segment list
v.s = query(kielread_DB, "Phonetic = i:|e:|a:|o:|u:")
# there are 281 of these
dim(v.s)
# with this distribution
table(v.s$labels)
# get the formants
v.fm = get_trackdata(kielread_DB, v.s, "FORMANTS")
# time-normalise them to 11 data points
v.fmt = normalize_length(v.fm, N = 11)
At this point, it is a good idea to identify the speaker, especially since the male and female speakers' data should be separately plotted. As discussed earlier, the speaker ID is given by the first three characters of the utterance name and hence by the first three characters of v.fmt$bundle.
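For example (a sketch):
# inspect the first few bundle names
head(v.fmt$bundle)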
## [1] "K67MR002" "K67MR002" "K67MR002" "K67MR002" "K67MR002" "K67MR002"
The following makes a new column speaker in the trackdata object with this information:
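A sketch of this step (the same command is used again later in this module):
# add the speaker ID as a new column and tabulate it
v.fmt %<>% mutate(speaker = substring(v.fmt$bundle, 1, 3))
table(v.fmt$speaker)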
##
## K67 K68
## 1551 1540
The vowel data at the temporal midpoint can now be given by:
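For example (the same command is used again later in this module):
# keep only the frame at the temporal midpoint
v.fmt5 = v.fmt %>% filter(times_norm == 0.5)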
The number of observations in v.fmt5 should be the same as in the segment list from which these data were originally derived:
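For example (a sketch):
# one midpoint frame per segment
nrow(v.fmt5) == nrow(v.s)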
## [1] TRUE
The commands for the formant plane plot colour-coded by vowel type and displayed separately for the two speakers are:
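A sketch of such a plot (axes not yet reversed, default colours):
v.fmt5 %>%
  ggplot +
  aes(y = T1, x = T2, col = labels) +
  geom_point() +
  facet_wrap(~speaker)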
The following reverses the axes so that the plot is proportional to vowel height and backness and chooses more distinctive colours:
cols = c("black", "red", "blue", "orange", "magenta")
v.fmt5 %>%
ggplot +
aes(y = T1, x = T2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Hz)") +
scale_y_reverse(name = "F1 (Hz)") +
scale_colour_manual(values = cols)
It is clear from the figure that there are some formant tracking errors for the female speaker K68, in particular [a:] tokens with an implausibly high or low F1 and [i:, e:] tokens with an implausibly low F2.
The segments corresponding to these can be identified as follows:
outlier.df = v.fmt5 %>%
filter(speaker == "K68") %>%
filter((T1 > 1400 & labels == "a:") |
(T1 < 400 & labels == "a:") |
(T2 < 1500 & labels %in% c("i:", "e:")))
nrow(outlier.df)
## [1] 5
These five outliers can be passed to the serve() function using the seglist argument as follows:
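A sketch of this call, assuming the row indices in outlier.df$sl_rowIdx point back into the segment list v.s:
# open the EMU-webApp at just the outlier segments
serve(kielread_DB, seglist = v.s[unique(outlier.df$sl_rowIdx), ])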
After correcting these errors, the formant extraction and time-normalisation will have to be carried out again:
# extract formant data again
v.fm = get_trackdata(kielread_DB, v.s, "FORMANTS")
# time-normalise to 11 data points
v.fmt = normalize_length(v.fm, N = 11)
# speaker info
v.fmt %<>% mutate(speaker = substring(v.fmt$bundle, 1, 3))
# extract at midpoint
v.fmt5 = v.fmt %>% filter(times_norm == 0.5)
# plot - this time with ellipses
v.fmt5 %>%
ggplot +
aes(y = T1, x = T2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Hz)") +
scale_y_reverse(name = "F1 (Hz)") +
scale_colour_manual(values = cols) +
stat_ellipse()
Using the Praat formant tracker is more fiddly because it is not automated in the same way as the other signal processing routines of the wrassp package. Nevertheless, using it is recommended since experience at the IPS in recent years has been that it tends to make fewer errors than the forest formant tracker in wrassp.
Calculating the Praat formants requires first identifying the full paths of the .wav files, thus:
kiel_wav_paths2 <- list.files(kielread_DB$basePath,
pattern = glob2rx("*.wav"),
full.names = TRUE,
recursive = TRUE)
These paths must not contain spaces! This is because PraatToFormants2AsspDataObj(), which is used later, will otherwise throw an error. If the path to the R project directory contains spaces, you may have to use the following workaround:
Move kielread_emuDB from emu_databases to a directory which has no spaces in its name or path, e.g. ~/Downloads.
Load the database from there with load_emuDB(), e.g. kielread_DB <- load_emuDB(file.path("~/Downloads", "kielread_emuDB")).
Recompute kiel_wav_paths2 and then run the code in the following 5 code snippets.
Move the database back into emu_databases to replace the old database.
Reload the database from emu_databases with kielread_DB <- load_emuDB(file.path(targetDir, "kielread_emuDB")).
The derivation of the speaker information from these paths is exactly the same as before (and therefore need not be repeated if the above commands have already been run):
## speaker
## K67 K68
## 100 100
The Praat formant tracker will now be run separately on the data for the male speaker K67 and for the female speaker K68.
The command for doing so is PraatToFormants2AsspDataObj(). Its arguments parameter can be manipulated; the default, c(0, 5, 5000, 0.025, 50), has the following meanings:
Time step (0): the time between the centres of consecutive frames. The default is 0 which uses a time step equal to 25% of the window length (see below).
The maximum number of formants to be extracted (5): the default here is 5 which according to the Praat manual “is the only way in which this procedure will give you results comparable with how people tend to interpret formants for vowels, i.e. in terms of vowel height (F1) and vowel place (F2)”.
The formant ceiling (5000): the upper limit of the formant search range in Hz. For male speakers the recommendation (from the Praat manual) is 5000 Hz and for female speakers 5500 Hz.
Window length (0.025): the effective duration of the analysis window within which formants are calculated, which gives a frequency resolution of 51.9 Hz.
Pre-emphasis (50): enhancement of the spectrum by 6 dB/octave above 50 Hz to give a flatter spectrum for vowels, i.e. to counteract the usual 6 dB/octave roll-off in vowel spectra.
From the above, it becomes clear that arguments should be set with a formant ceiling of 5000 Hz (the default, hence no specification is needed) for the male speaker K67 and with a ceiling of 5500 Hz for the female speaker K68. First, the Praat formants are computed for the male speaker. Notice that this requires a for-loop to be run over each of the utterances of this speaker, as detailed in the comments below. The effect is to store the Praat formants with the extension .praatFms in the corresponding bundle. (This takes a minute or so.)
# a logical vector identifying the male speaker
temp.m = speaker == "K67"
# run the formant tracker over that
# speaker's `.wav` files
for (fp in kiel_wav_paths2[temp.m]) {
# use function for current wav file
ado = PraatToFormants2AsspDataObj(fp)
# create new path (stores files in same bundle)
newPath = paste0(file_path_sans_ext(fp), ".praatFms")
# print(paste0(fp, ' -> ', newPath))
# write asspDataObject to SSFF file
write.AsspDataObj(ado, file = newPath)
}
The same is done for the female speaker, but with the arguments modified so that the maximum frequency is set to 5500 Hz (again allow 1-2 minutes for this to run).
# a logical vector identifying the female speaker
temp.f = speaker == "K68"
# run the formant tracker over that
# speaker's `.wav` files
for (fp in kiel_wav_paths2[temp.f]) {
# use function for current wav file
ado = PraatToFormants2AsspDataObj(
fp,
arguments = list(0.0, 5, 5500, 0.025, 50)
)
# create new path (stores files in same bundle)
newPath = paste0(file_path_sans_ext(fp), ".praatFms")
# print(paste0(fp, ' -> ', newPath))
# write asspDataObject to SSFF file
write.AsspDataObj(ado, file = newPath)
}
There are some further fiddly alignment issues that may need to be adjusted. This can be done with the function below, which makes any necessary adjustments for all .praatFms files it finds in the database.
Finally, the Praat formant data can be added to the database in the usual way. This is done in this case by naming the signal Praat, as shown below:
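A sketch of this command, mirroring the track-definition calls at the end of this module:
# register the Praat formant files (.praatFms, column "fm") under the name Praat
add_ssffTrackDefinition(kielread_DB,
                        name = "Praat",
                        columnName = "fm",
                        fileExtension = "praatFms")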
So now there should be two sets of formant tracks in this database:
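For example (a sketch):
# list the SSFF track definitions now registered in the database
list_ssffTrackDefinitions(kielread_DB)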
The Praat formant data can now be extracted in the same way as before. The second command also time-normalises these data to 11 data points.
v.praatfm = get_trackdata(kielread_DB, v.s, "Praat")
# time-normalise them to 11 data points
v.praatfmt = normalize_length(v.praatfm, N = 11)
The number of observations in the formant data extracted with forest and with the Praat formant tracker should be the same:
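For example (a sketch):
# both trackdata objects should have the same number of rows
nrow(v.fmt) == nrow(v.praatfmt)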
Consequently, the Praat formants can be added as columns to the time-normalised data frame obtained earlier using the forest formant data. Here only the first two Praat formant frequencies are added and are given the column names P1 and P2:
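A sketch of this step, assuming the rows of v.fmt and v.praatfmt are aligned (both are derived from the same segment list and time-normalised to 11 points):
# add the first two Praat formants as columns P1 and P2
v.fmt %<>% mutate(P1 = v.praatfmt$T1, P2 = v.praatfmt$T2)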
These data are once again extracted at the temporal midpoint:
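For example, as before:
# keep only the frame at the temporal midpoint
v.fmt5 = v.fmt %>% filter(times_norm == 0.5)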
And here, finally, is a plot in the formant plane using the Praat formant data:
# plot Praat formants
v.fmt5 %>%
ggplot +
aes(y = P1, x = P2, col = labels) +
geom_point() +
facet_wrap(~speaker) +
scale_x_reverse(name = "F2 (Hz)") +
scale_y_reverse(name = "F1 (Hz)") +
scale_colour_manual(values = cols)
from which it is evident that there are fewer formant tracking errors than in the data extracted earlier with the forest function.
A direct comparison of forest and Praat formant tracks can be made by segment, in this example for the 10th segment and F2 (the Praat F2 is in red, the forest F2 in black):
v.fmt %>%
filter(sl_rowIdx == 10) %>%
ggplot +
aes(x = times_norm) +
geom_line(aes(y = P2), colour = "red") +
geom_line(aes(y = T2)) +
xlab("Proportional time") +
ylab("F2 (Hz)")
The default display is still set for displaying and correcting the formant frequencies derived from forest. In order to change this to display and correct the Praat formants, the track definitions of both first need to be removed:
remove_ssffTrackDefinition(kielread_DB,
name = "FORMANTS")
remove_ssffTrackDefinition(kielread_DB,
name = "Praat")
and then added back in, but with different names. To display and correct the Praat formants, the name of the track must be set to FORMANTS; the user can then choose any name for the formant tracks derived with forest, thus:
add_ssffTrackDefinition(kielread_DB,
name = "FORMANTS",
columnName = "fm",
fileExtension = "praatFms")
add_ssffTrackDefinition(kielread_DB,
"forest", "fm", "fms")
The display is now set up to show and to edit the Praat formants: