We will use the EMU-SDMS demo database that we created last week:

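# load packages
library(emuR)
library(ggplot2)
library(dplyr)
# create path to demo database
# (mypath is the directory in which the demo data was created last week)
path2ae = file.path(mypath, "emuR_demoData", "ae_emuDB")
# load database
ae = load_emuDB(path2ae, verbose = FALSE)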

In the last chapter, we were interested in querying the rather complex hierarchy that can be seen under “Link definitions”. Today, we are interested in derived signals, i.e. signals derived from the signal that every utterance is associated with (this is usually an audio recording, but we could also have EMA data, electropalatographic data, and so on, possibly without any audio data at all). In an EMU-SDMS database, derived data is stored in the so-called SSFF file format; SSFF stands for Simple Signal File Format. Under “SSFF track definitions”, we can see two definitions, one for so-called dft data and one for fm data. To see these definitions, we could also have typed:

list_ssffTrackDefinitions(ae)

Note that these data are signals derived from the audio data: the dft tracks contain spectral analyses, and the fm tracks contain calculated formants (and their bandwidths). Keep in mind that every signal except the audio data is stored in this format, so e.g. the EMA tracks in the demo database ema or the electropalatographic data in the demo database epgdorsal are also stored in the SSFF format. See e.g. the aforementioned demo databases at https://ips-lmu.github.io/EMU-webApp/ (open demo – ema/epgdorsal).

As the SSFF track definitions show, these tracks are saved in separate files, identified by their fileExtension, e.g. ‘dft’ or ‘fms’. If we open one of these files, we will notice that we cannot read the data directly (because it is binary data), but we can read the plain-text header. In the case of the fms files, the header states the Record_Freq and the column definitions.

This means that every fms file contains four columns with formant values (recorded every 5 ms, as can be derived from the Record_Freq information) alongside four columns that contain the bandwidths of the first four formants. We can read these data into R with the following commands. First of all, we have to run a query, e.g. for the vowel /i:/, and then use the command get_trackdata():

# query loaded "ae" emuDB for all "i:" segments of the "Phonetic" level
sl = query(emuDBhandle = ae, 
           query = "Phonetic == i:")

# get the corresponding formant trackdata
ae_i_fm = get_trackdata(emuDBhandle = ae, 
                   seglist = sl, 
                   ssffTrackName = "fm")
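
As a side note on the 5 ms frame spacing mentioned above: it follows directly from the record frequency stated in the header. The following base R sketch parses a few hypothetical header lines (they are made up for illustration and not copied from an actual fms file; the value 200 is an assumption chosen to match the 5 ms spacing) and derives the frame interval and the number of values per frame.

```r
# Hypothetical SSFF-style header lines (illustrative only, not read from disk)
header_lines <- c("Record_Freq 200.0",
                  "Column fm SHORT 4",
                  "Column bw SHORT 4")

# extract the record frequency (frames per second)
rf_line <- grep("^Record_Freq", header_lines, value = TRUE)
record_freq <- as.numeric(strsplit(rf_line, " +")[[1]][2])

# frame interval in milliseconds
frame_interval_ms <- 1000 / record_freq

# number of values per frame: sum over the "Column" definitions
col_lines <- grep("^Column", header_lines, value = TRUE)
n_values <- sum(as.numeric(sapply(strsplit(col_lines, " +"), `[`, 4)))

print(frame_interval_ms)
print(n_values)
```

With a record frequency of 200 frames per second, consecutive frames are 1000 / 200 = 5 ms apart, and each frame holds four formant values plus four bandwidth values.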

If there aren’t any pre-calculated derived signals, we will have to calculate them. We can do this on-the-fly (within the get_trackdata() function), or we can pre-calculate derived signals (with the function add_ssffTrackDefinition()). In both cases, we will actually use the package wrassp, which was installed when we installed the package emuR. The functions of wrassp can be used independently of emuR, but it is much easier to use the integration of wrassp functions into the functions of the package emuR.

The R package wrassp

This passage gives an overview and introduction to the wrassp package. The wrassp package is a wrapper for R around Michel Scheffers’ libassp (Advanced Speech Signal Processor). The libassp library and therefore the wrassp package provide functionality for handling speech signals in most common audio formats and for performing signal analyses common in the phonetic and speech sciences. As such, wrassp fills a gap in the R package landscape as, to our knowledge, no previous packages provided this specialized functionality. The currently available signal processing functions provided by wrassp are:

Command Meaning
acfana() Analysis of short-term autocorrelation function
afdiff() Computes the first difference of the signal
affilter() Filters the audio signal (e.g., low-pass and high-pass)
cepstrum() Short-term cepstral analysis
cssSpectrum() Cepstral smoothed version of dftSpectrum()
dftSpectrum() Short-term DFT spectral analysis
forest() Formant estimation
ksvF0() F0 analysis of the signal
lpsSpectrum() Linear predictive smoothed version of dftSpectrum()
mhsF0() Pitch analysis of the speech signal using Michel Scheffers’ Modified Harmonic Sieve algorithm
rfcana() Linear prediction analysis
rmsana() Analysis of short-term Root Mean Square amplitude
zcrana() Analysis of the averages of the short-term positive and negative zero-crossing rates

As mentioned earlier, you could use these functions independently of the emuR package; however, we recommend using them via the onTheFlyFunctionName parameter of the emuR functions get_trackdata() and add_ssffTrackDefinition().
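
For completeness, here is a minimal sketch of the stand-alone use mentioned above. It assumes that wrassp is installed and relies on the example recordings shipped in the package's extdata directory; the object names wav_files and fm_obj are made up for this illustration.

```r
# run only if wrassp is available
if (requireNamespace("wrassp", quietly = TRUE)) {
  # example recordings shipped with wrassp
  wav_files <- list.files(system.file("extdata", package = "wrassp"),
                          pattern = "\\.wav$", full.names = TRUE)

  # formant estimation on the first file; toFile = FALSE returns the
  # result as an R object instead of writing an .fms file to disk
  fm_obj <- wrassp::forest(wav_files[1], toFile = FALSE, verbose = FALSE)

  # names of the tracks contained in the result
  print(names(fm_obj))
  # dimensions of the formant track: frames x formants
  print(dim(fm_obj$fm))
}
```

Within an emuDB, the same analysis is normally requested by name through onTheFlyFunctionName, so that emuR takes care of locating the audio files and cutting out the queried segments.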

Extracting pre-defined tracks

To access data that are stored in files, the user has to define tracks for a database that point to sequences of samples that match a user-specified file extension. The user-defined name of such a track can then be used to reference the track in the signal data extraction process. Internally, emuR uses wrassp to read the appropriate files from disk, extract the sample sequences that match the result of a query and return values to the user for further inspection and evaluation.

# list currently available tracks
list_ssffTrackDefinitions(ae)

In ae, there are three tracks available that can be read by get_trackdata(). We could, for example, extract the formant values of all /ai/ segments:

# query all "ai" phonetic segments
ai_segs = query(ae, "Phonetic == ai")
# get "fm" track data for these segments
# Note that verbose is set to FALSE
# only to avoid a progress bar
# being printed in this document.
ai_td_fm = get_trackdata(emuDBhandle = ae,
                         seglist = ai_segs,
                         ssffTrackName = "fm",
                         verbose = FALSE,
                         resultType = "tibble")
# show summary of ai_td_fm
summary(ai_td_fm)

So, we needed an emuDBhandle, a seglist, and the correct ssffTrackName to read the formant values from the fms files. Being able to access data that is stored in files is important for two main reasons.

Firstly, it is possible to generate files using external programs such as VoiceSauce (Shue et al., 2011), which can export its calculated output to the general purpose SSFF file format. This file mechanism is also used to access data produced by EMA, EPG or any other form of signal data recordings. Secondly, it is possible to track, save and access manipulated data such as formant values that have been manually corrected. It is also worth noting that the get_trackdata() function has a predefined track which is always available without it having to be defined. The name of this track is MEDIAFILE_SAMPLES, which references the actual samples of the audio files of the database. The next example shows how this predefined track can be used to access the audio samples belonging to the segments in ai_segs.

# get media file samples
ai_td_mfs = get_trackdata(ae,
        seglist = ai_segs,
        ssffTrackName = "MEDIAFILE_SAMPLES",
        verbose = FALSE,
        resultType = "tibble")
# plot ai_td_mfs$T1
ggplot(ai_td_mfs) + aes(y = T1, x = times_rel) + geom_line() + facet_wrap(~sl_rowIdx)

Adding new tracks

The signal processing routines provided by the wrassp package can be used to produce SSFF files containing various derived signal data (e.g., formants, fundamental frequency, etc.). The following example shows how the function add_ssffTrackDefinition() can be used to add a new track to the ae emuDB. Using the onTheFlyFunctionName parameter, the add_ssffTrackDefinition() function automatically executes the wrassp signal processing function ksvF0 (onTheFlyFunctionName= "ksvF0") and stores the results in SSFF files in the bundle directories.

# add new track and calculate
# .f0 files on-the-fly using wrassp::ksvF0()
add_ssffTrackDefinition(ae,
                        name = "F0",
                        onTheFlyFunctionName = "ksvF0",
                        verbose = FALSE)
# show newly added track
list_ssffTrackDefinitions(ae)
# show newly added files
list_files(ae, fileExtension = "f0")
# extract newly added trackdata
ai_td = get_trackdata(ae,
                      seglist = ai_segs,
                      ssffTrackName = "F0",
                      verbose = FALSE,
                      resultType = "tibble")

In the command add_ssffTrackDefinition(), we could also have passed parameter values on to the wrassp function ksvF0.
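
One way to look up the available parameters and their defaults is to inspect the function's formal arguments. The following is a small sketch that assumes wrassp is installed; the variable name ksv_defaults is hypothetical.

```r
# run only if wrassp is available
if (requireNamespace("wrassp", quietly = TRUE)) {
  # formal arguments of ksvF0() together with their default values
  ksv_defaults <- formals(wrassp::ksvF0)
  # names of the parameters that ksvF0() accepts
  print(names(ksv_defaults))
  # default value of the gender parameter
  print(ksv_defaults$gender)
}
```

The help page (?wrassp::ksvF0) gives the same information together with a description of each parameter.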

We can see that the parameter gender is set to “u” (undefined) by default. As we know that the ae database consists of male speech only, we could set this to “m” (male) in onTheFlyParams:

add_ssffTrackDefinition(ae,
                        name = "F0",
                        onTheFlyFunctionName = "ksvF0",
                        onTheFlyParams = list(gender = "m"),
                        verbose = FALSE)

Pre-calculated derived signals can be displayed and corrected in the EMU-webApp. Pre-calculated formants may be overlaid on the spectrogram. Learn more about this in chapter 08.

One disadvantage of this method should not be concealed: it is, as yet, not possible to pre-calculate speaker-group-specific (e.g. gender-specific) data by setting different parameters for different speaker groups. However, implementation of this feature is in progress.

Calculating tracks on-the-fly

The user is able to select one of the signal processing routines provided by wrassp and pass it on to the signal data extraction function. The signal data extraction function can then apply this wrassp function to each audio file as part of the signal data extraction process. This means that the user can quickly manipulate function parameters and evaluate the result without having to store to disk the files that would usually be generated by the various parameter experiments. In many cases this new functionality eliminates the need for defining a track definition for the entire database for temporary data analysis purposes. The following example shows how the onTheFlyFunctionName parameter of the get_trackdata() function is used:

ai_td_pit = get_trackdata(ae,
                          seglist = ai_segs,
                          onTheFlyFunctionName = "mhsF0",
                          verbose = FALSE,
                          resultType = "tibble")
# show head of ai_td
## # A tibble: 183 x 21
##    sl_rowIdx labels start   end utts  db_uuid session bundle start_item_id
##        <int> <chr>  <dbl> <dbl> <chr> <chr>   <chr>   <chr>          <int>
##  1         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  2         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  3         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  4         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  5         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  6         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  7         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  8         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
##  9         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
## 10         1 ai      863. 1016. 0000… 0fc618… 0000    msajc…           161
## # ... with 173 more rows, and 12 more variables: end_item_id <int>,
## #   level <chr>, start_item_seq_idx <int>, end_item_seq_idx <int>,
## #   type <chr>, sample_start <int>, sample_end <int>, sample_rate <int>,
## #   times_orig <dbl>, times_rel <dbl>, times_norm <dbl>, T1 <dbl>

Reconsidering last week’s example:

# query A and V (front and back open vowels),
# i: and u: (front and back closed vowels), and
# E and o: (front and back mid vowels)
ae_vowels = query(emuDBhandle = ae, query = "[Phonetic == V|A|i:|u:|o:|E]")
# get the formants:
ae_formants = get_trackdata(ae, seglist = ae_vowels, ssffTrackName = "fm", resultType = "tibble")

Please never forget to set the resultType parameter of the get_trackdata() function to “tibble”!

The emuRtrackdata object of type tibble is an amalgamation of both a segment list and trackdata. The first column, sl_rowIdx, of the ae_formants object indicates the row index of the segment list the current row belongs to; the times_rel, times_orig and times_norm columns represent the relative time, the original time, and the normalized time (ranging from 0 to 1) of the samples contained in the current row; and T1 (to Tn in n-dimensional trackdata) contains the actual signal sample values.
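
To make the relationship between the three time columns more tangible, here is a small base R sketch for a single segment. The sample times are invented, and the sketch assumes that relative time is measured from the first sample of the segment and that normalized time is relative time divided by its maximum; emuR's internal computation may differ in detail.

```r
# hypothetical sample times (ms) of one segment, spaced 5 ms apart
times_orig <- seq(from = 865, by = 5, length.out = 7)

# relative time: starts at 0 for every segment
times_rel <- times_orig - times_orig[1]

# normalized time: runs from 0 to 1 regardless of segment duration
times_norm <- times_rel / max(times_rel)

print(data.frame(times_orig, times_rel, times_norm))
```

Relative time makes segments comparable at their onset, whereas normalized time additionally removes differences in duration.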

ae_formants$T1 # =first formant frequency
ae_formants$T2 # =second formant frequency
ae_formants$labels # =phonetic symbols

We can now plot the tracked formants, e.g. F2, as a function of time (of relative time, i.e. $times_rel). We can use the plotting functions of the package ggplot2:

ggplot(ae_formants) +
  aes(x=times_rel,y=T2,col=labels) +
  geom_line()

Oops, that didn’t work. Add group=sl_rowIdx to the aes part in order to ensure individual trajectories for each vowel token (otherwise, ggplot tries to group only by vowel type):

ggplot(ae_formants) +
  aes(x=times_rel,y=T2,col=labels,group=sl_rowIdx) +
  geom_line()

Now, we have plotted each token with its original duration. For comparison, we could instead have plotted the trajectories against normalized time:

ggplot(ae_formants) +
  aes(x=times_norm,y=T2,col=labels,group=sl_rowIdx) +
  geom_line()

In order to average across the tokens of each vowel type, we first have to length-normalize the trajectories, so that every token has the same number of samples:

ae_formants_norm = normalize_length(ae_formants)
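
Conceptually, length normalization resamples every token at the same number of equally spaced points in normalized time. The following base R sketch illustrates the idea for one token, using linear interpolation via approx(); the trajectory values are invented, and this is only an illustration of the principle, not emuR's actual implementation.

```r
# hypothetical F2 trajectory (Hz) of one token with 9 frames, 5 ms apart
times_rel <- seq(0, 40, by = 5)
f2 <- c(1500, 1540, 1600, 1680, 1750, 1800, 1830, 1850, 1860)

# map the token onto normalized time (0..1) ...
times_norm <- times_rel / max(times_rel)

# ... and resample it at 21 equally spaced points by linear interpolation
n_points <- 21
resampled <- approx(x = times_norm, y = f2, n = n_points)

token_norm <- data.frame(times_norm = resampled$x, T2 = resampled$y)
print(nrow(token_norm))
print(head(token_norm, 3))
```

Because every token ends up with the same grid of normalized time points, values from different tokens can be averaged point by point, and the temporal midpoint is available as the grid point 0.5.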

We now have 21 samples per vowel token (times_norm runs from 0 to 1 in steps of 0.05).

We can now average per vowel type and per sample:

ae_formants_norm_average = ae_formants_norm %>% 
  group_by(labels,times_norm) %>%
  summarise(F2 = mean(T2))

ggplot(ae_formants_norm_average) +
  aes(x=times_norm,y=F2,col=labels) +
  geom_line()

Vowel spaces and ellipses

Vowel spaces are a very useful and common type of plot in the phonetic sciences, often with the distributions of vowel tokens shown by ellipses. In order to produce such a plot, we need the formant values at the vowels’ temporal midpoints:

#get the formants at the vowels' temporal midpoints from ae_formants_norm:
ae_midpoints = ae_formants_norm %>% filter(times_norm==0.5)

#plot the vowel space:
ggplot(ae_midpoints) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text()

As vowel height is correlated with the first formant frequency (here: T1) and vowel backness with F2 (here: T2), we chose to plot T2 on the x-axis and T1 on the y-axis. However, the plot does not yet resemble a vowel space as we know it from the textbooks. This is because the correlations mentioned above are negative correlations, so both the x-axis and the y-axis should be reversed. We can do so in ggplot2 in a very simple way.

#plot the vowel space:
ggplot(ae_midpoints) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text() +
  scale_y_reverse() + scale_x_reverse()

Excellent! We can now also get rid of the legend, and replace the axis labels T1 and T2 with “F1 (Hz)” and “F2 (Hz)”.

#plot the vowel space:
ggplot(ae_midpoints) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")

In order to add an ellipse, we simply have to type in + stat_ellipse():

#plot the vowel space:
ggplot(ae_midpoints) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text() +
  stat_ellipse() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 3 rows containing missing values (geom_path).

Oops: we get the message “Too few points to calculate an ellipse” three times, and therefore we don’t see any ellipses for /o:/, /A/, and /V/. We had better concentrate on /E/, /i:/, and /u:/:

ae_midpoints_Eiu = ae_midpoints %>% filter(labels%in%c("E","i:","u:"))
#plot the vowel space:
ggplot(ae_midpoints_Eiu) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text() +
  stat_ellipse() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")

We usually do not want to see all tokens (as their distribution is shown by the ellipse), but rather the ellipse together with one label per ellipse. This label is usually plotted at the ellipse’s midpoint, usually called the ellipse’s centroid. The centroid is commonly taken to be the arithmetic mean of the first and second formant frequencies (strictly speaking, stat_ellipse() by default assumes a multivariate t-distribution, whose centre can deviate slightly from the arithmetic mean; with type = "norm" the two coincide). We can pre-calculate these means, once again with functions from the package dplyr:

ae_centroid = ae_midpoints_Eiu %>%
  group_by(labels) %>%
  summarise(T1 = mean(T1), T2 = mean(T2))
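
The same kind of centroid table can be cross-checked without dplyr. The base R sketch below uses invented formant values (not taken from ae) and computes the per-vowel means with aggregate(); the column names mirror those used above, while the object names midpoints and centroids are hypothetical.

```r
# invented midpoint values (Hz) for two vowel types; for illustration only
midpoints <- data.frame(
  labels = c("i:", "i:", "i:", "u:", "u:", "u:"),
  T1 = c(280, 300, 320, 310, 330, 350),
  T2 = c(2200, 2300, 2400, 900, 1000, 1100)
)

# per-vowel arithmetic means of F1 (T1) and F2 (T2)
centroids <- aggregate(cbind(T1, T2) ~ labels, data = midpoints, FUN = mean)
print(centroids)
```

The result has one row per vowel type, which is exactly the shape needed to place one label per ellipse.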

We then simply plot the ellipses without any text:

#plot the vowel space, ellipses, but no text:
ggplot(ae_midpoints_Eiu) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  stat_ellipse() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")

… and then simply add again the geom_text() part, but this time with a reference to the newly created data.frame, i.e. geom_text(data = ae_centroid)

#plot the vowel space with ellipses and one label per centroid:
ggplot(ae_midpoints_Eiu) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  stat_ellipse() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none") +
  geom_text(data = ae_centroid)

However, be careful with ellipses when only a few vowel tokens are plotted. As we can see, the ellipse of /u:/ extends too far to the left, and seems to be even more ‘front’ than /i:/. This is, however, clearly not the case (see the plots showing the individual tokens of /u:/); it is a result of the low number of /u:/ observations and of their distribution, which varies considerably along the x-axis.


Given a database with pre-calculated formants, you need to do the following to get a plot of the vowel space with ellipses, each labelled with the vowel label at the ellipse’s centroid:


ae = load_emuDB(path2ae, verbose = FALSE)
Eiu = query(ae,"[Phonetic == E|i:|u:]")
Eiu_fm = get_trackdata(emuDBhandle = ae,
                         seglist = Eiu,
                         ssffTrackName = "fm",
                         verbose = FALSE,
                         resultType = "tibble")
Eiu_fm_norm = normalize_length(Eiu_fm)
Eiu.05 = Eiu_fm_norm %>% filter(times_norm==0.5)
Eiu.05.centroids = Eiu.05 %>%
                    group_by(labels) %>%
                    summarise(T1 = mean(T1), T2 = mean(T2))

ggplot(Eiu.05) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  stat_ellipse() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none") +
  geom_text(data = Eiu.05.centroids)

