1 Preliminaries
2 Getting signal data into EmuR
- 2.1 Signal files that already exist in a database
- 2.2 Signal files not already in the database
  - 2.2.1 Storing new signal files permanently
  - 2.2.2 Storing new signal files temporarily
3 The structure of a trackdata object
4 Obtaining values at a single point in time or between time points
5 Some common plots applied to trackdata objects including linear time normalization

1 Preliminaries

Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.

For this and subsequent tutorials, access the tidyverse, magrittr, emuR, and wrassp libraries:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(magrittr)

## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

library(emuR)

## 
## Attaching package: 'emuR'
## 
## The following object is masked from 'package:base':
## 
##     norm

library(wrassp)

The following makes use of the demonstration database emuDB that was also used here.

Store and access the demo database as also described here and thus:

create_emuRdemoData(dir = tempdir())
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
ae = load_emuDB(path.ae)

## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp79FYCS/emuR_demoData/ae_emuDB... (7 bundles found)

summary(ae)

##

## ── Summary of emuDB ────────────────────────────────────────────────────────────

## Name:     ae 
## UUID:     0fc618dc-8980-414d-8c7a-144a649ce199 
## Directory:    /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/Rtmp79FYCS/emuR_demoData/ae_emuDB 
## Session count: 1 
## Bundle count: 7 
## Annotation item count:  736 
## Label count:  844 
## Link count:  785

##

## ── Database configuration ──────────────────────────────────────────────────────

##

## ── SSFF track definitions ──

##

##  name columnName fileExtension
##  dft  dft        dft          
##  fm   fm         fms

## ── Level definitions ──

##  name         type    nrOfAttrDefs attrDefNames       
##  Utterance    ITEM    1            Utterance;         
##  Intonational ITEM    1            Intonational;      
##  Intermediate ITEM    1            Intermediate;      
##  Word         ITEM    3            Word; Accent; Text;
##  Syllable     ITEM    1            Syllable;          
##  Phoneme      ITEM    1            Phoneme;           
##  Phonetic     SEGMENT 1            Phonetic;          
##  Tone         EVENT   1            Tone;              
##  Foot         ITEM    1            Foot;

## ── Link definitions ──

##  type         superlevelName sublevelName
##  ONE_TO_MANY  Utterance      Intonational
##  ONE_TO_MANY  Intonational   Intermediate
##  ONE_TO_MANY  Intermediate   Word        
##  ONE_TO_MANY  Word           Syllable    
##  ONE_TO_MANY  Syllable       Phoneme     
##  MANY_TO_MANY Phoneme        Phonetic    
##  ONE_TO_MANY  Syllable       Tone        
##  ONE_TO_MANY  Intonational   Foot        
##  ONE_TO_MANY  Foot           Syllable

2 Getting signal data into EmuR

The procedure in all cases is first to make a segment or event list using the query() function that was discussed here and then to use the function get_trackdata() to obtain the signal data for that segment or event list. There are three cases to consider, depending on whether the signals to be read into R with get_trackdata() already exist in the database and, if they do not, on whether they are to be stored permanently or only temporarily.

2.1 Signal files that already exist in a database

As discussed in this earlier module, the signal files that exist and that are accessible in emuR are shown as follows:

list_ssffTrackDefinitions(ae)

##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms

The meaning of the three columns was also explained in an earlier module.

The first of these, dft, contains spectral data, and the second, fm, contains data of the first four formant frequencies. The following commands make a trackdata object of the first four formant frequencies between the start time and end time of all [i:] segments in the database:

# segment list of all [i:] segments
i.s = query(ae, "Phonetic = i:")
# trackdata object of the first four formant frequencies
i.fm = get_trackdata(ae, i.s, "fm")
# or
i.fm = i.s %>% get_trackdata(ae, ., "fm")

The audio waveform can also be read into R in a similar way by giving "MEDIAFILE_SAMPLES" as the track name. The following imports the waveforms of the [i:] segments into R:

s.wav = get_trackdata(ae, i.s, "MEDIAFILE_SAMPLES")

Making a trackdata object for an event list works in exactly the same way:

# get all H* tones
hstar.e = query(ae, "Tone = H*")
# Formant data
hstar.fm = get_trackdata(ae, hstar.e, "fm")
# or
hstar.fm = hstar.e %>% 
  get_trackdata(ae, ., "fm")

The number of observations in hstar.fm should be exactly the same as the number of H* events:

nrow(hstar.e) == nrow(hstar.fm)

## [1] TRUE

This is necessarily so because as explained here, an event list contains annotations defined by a single point in time. For this reason, each event can only be associated with one signal value (or one set of signal values if multiparametric as in the case here of formants F1-F4). By contrast there are many more observations in a trackdata object derived from a segment list:

nrow(i.s)

## [1] 6

nrow(i.fm)

## [1] 98

This is because the trackdata object contains data at regular intervals between each segment’s start and end time. And so since segments have a certain duration, there will in almost all cases be more trackdata observations than there are segments.

2.2 Signal files not already in the database

2.2.1 Storing new signal files permanently

This has already been demonstrated when calculating pitch data in earlier modules. New signals can be added with the wrassp package. The wrassp package is a wrapper for R around Michel Scheffers’ libassp (Advanced Speech Signal Processor). The currently available signal processing functions provided by wrassp are:

Command	Meaning
`acfana()`	Analysis of short-term autocorrelation function
`afdiff()`	Computes the first difference of the signal
`affilter()`	Filters the audio signal (e.g., low-pass and high-pass)
`cepstrum()`	Short-term cepstral analysis
`cssSpectrum()`	Cepstral smoothed version of `dftSpectrum()`
`dftSpectrum()`	Short-term DFT spectral analysis
`forest()`	Formant estimation
`ksvF0()`	f0 analysis of the signal
`lpsSpectrum()`	Linear predictive smoothed version of `dftSpectrum()`
`mhsF0()`	Pitch analysis of the speech signal using Michel Scheffers’ Modified Harmonic Sieve algorithm
`rfcana()`	Linear prediction analysis
`rmsana()`	Analysis of short-term Root Mean Square amplitude
`zcrana()`	Analysis of the averages of the short-term positive and negative zero-crossing rates

The fastest way to add new signals to the database is with the function add_ssffTrackDefinition() together with its argument onTheFlyFunctionName, which names the wrassp function used to calculate the signal. Two new signals are added in the example below. One is RMS energy for estimating a signal’s intensity. The other is the zero-crossing rate (ZCR), which is the number of times per second that the waveform crosses the time axis, expressed in Hz. ZCR typically follows the frequency where most energy is concentrated (and indeed the first spectral moment): it is therefore typically higher for fricatives than for sonorants and higher for [s] than for [ʃ]. These signals are added with the default parameters in the following example:

add_ssffTrackDefinition(ae,
                        "energy",
                        onTheFlyFunctionName = "rmsana")
add_ssffTrackDefinition(ae,
                        "zero_cross",
                        onTheFlyFunctionName = "zcrana")
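To make the two measures more concrete, the following base R sketch computes an RMS amplitude and a zero-crossing rate over a whole synthetic signal. It is an illustration only: the sampling rate, the sine waves, and the helper functions rms() and zcr_hz() are assumptions made for this example, and wrassp's rmsana() and zcrana() work frame by frame with their own windowing and scaling, so their output will not match these numbers exactly.

```r
# Illustration only: whole-signal versions of the two measures on synthetic
# sine waves. This is NOT how wrassp's rmsana()/zcrana() are implemented.
sr <- 16000             # sampling rate in Hz (assumed for the example)
t <- (0:(sr - 1)) / sr  # sample times for one second of signal

# root mean square amplitude of a signal
rms <- function(x) sqrt(mean(x^2))

# zero-crossing rate in Hz: the average of the number of upward and
# downward zero crossings, divided by the signal duration in seconds
zcr_hz <- function(x, sr) {
  s <- ifelse(x >= 0, 1, -1)  # sign of each sample (zero counted as positive)
  d <- diff(s)                # non-zero wherever the sign changes
  up <- sum(d > 0)            # negative-to-positive crossings
  down <- sum(d < 0)          # positive-to-negative crossings
  mean(c(up, down)) / (length(x) / sr)
}

# a low-frequency and a high-frequency sine wave
# (the phase offset avoids samples that fall exactly on zero)
low <- sin(2 * pi * 200 * t + 0.3)
high <- sin(2 * pi * 4000 * t + 0.3)

zcr_low <- zcr_hz(low, sr)    # close to 200
zcr_high <- zcr_hz(high, sr)  # close to 4000
rms_low <- rms(low)           # close to 1/sqrt(2) for a unit-amplitude sine
```

For a pure sine wave the ZCR defined in this way is close to the sine's frequency, which is the sense in which ZCR follows the frequency where most energy is concentrated.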

The following shows that these signals have been added to the ae database:

list_ssffTrackDefinitions(ae)

##         name columnName fileExtension
## 1        dft        dft           dft
## 2         fm         fm           fms
## 3     energy        rms           rms
## 4 zero_cross        zcr           zcr

The corresponding signal data for these newly added signals can now be obtained in exactly the same way as before:

# zero-crossing frequency for the segment list 
i.zcr = get_trackdata(ae, i.s, "zero_cross")
# or
i.zcr = i.s %>% 
  get_trackdata(ae, ., "zero_cross")

The physical location of these signal files is within the corresponding bundles. Recall that all the files associated with any utterance are stored in the same bundle. The bundles for ae are here:

list_bundles(ae)

## # A tibble: 7 × 2
##   session name    
##   <chr>   <chr>   
## 1 0000    msajc003
## 2 0000    msajc010
## 3 0000    msajc012
## 4 0000    msajc015
## 5 0000    msajc022
## 6 0000    msajc023
## 7 0000    msajc057

The signal files for the first of these utterances are shown here:

list.files(file.path(path.ae, "0000_ses", "msajc003_bndl"))

## [1] "msajc003_annot.json" "msajc003.dft"        "msajc003.fms"       
## [4] "msajc003.rms"        "msajc003.wav"        "msajc003.zcr"

and they are located at:

file.path(path.ae, "0000_ses", "msajc003_bndl")

## [1] "/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp79FYCS/emuR_demoData/ae_emuDB/0000_ses/msajc003_bndl"

Any of the wrassp functions can be run with different parameter settings. The parameter settings can be seen using the formals() function with the wrassp signal processing routine as a single argument. Thus, to see the parameters associated with mhsF0, one of the functions for calculating the fundamental frequency:

formals(mhsF0)

## $listOfFiles
## NULL
## 
## $optLogFilePath
## NULL
## 
## $beginTime
## [1] 0
## 
## $centerTime
## [1] FALSE
## 
## $endTime
## [1] 0
## 
## $windowShift
## [1] 5
## 
## $gender
## [1] "u"
## 
## $maxF
## [1] 600
## 
## $minF
## [1] 50
## 
## $minAmp
## [1] 50
## 
## $minAC1
## [1] 0.25
## 
## $minRMS
## [1] 18
## 
## $maxZCR
## [1] 3000
## 
## $minProb
## [1] 0.52
## 
## $plainSpectrum
## [1] FALSE
## 
## $toFile
## [1] TRUE
## 
## $explicitExt
## NULL
## 
## $outputDirectory
## NULL
## 
## $forceToLog
## useWrasspLogger
## 
## $verbose
## [1] TRUE

One of the parameters, gender, can be left at the default u or set to m (for male speakers) or f (for female speakers). To add pitch data to the ae database with the parameter set to male:

add_ssffTrackDefinition(ae,
                        "pitch",
                        onTheFlyFunctionName = "mhsF0",
                        onTheFlyParams = list(gender = "m"))

2.2.2 Storing new signal files temporarily

The commands from the preceding sections have been used to store signals permanently in the database. However, it is possible to obtain signal data without storing it as part of the database for a segment or event list using get_trackdata() with the argument onTheFlyFunctionName. Thus even though no pitch data has been calculated and stored using the function ksvF0() (as list_ssffTrackDefinitions(ae) will show), it can still be obtained e.g. for the earlier segment or event lists as follows:

i.ksv = get_trackdata(ae, 
                      i.s, 
                      onTheFlyFunctionName = "ksvF0")
hstar.ksv = get_trackdata(ae, 
                          hstar.e, 
                          onTheFlyFunctionName = "ksvF0")
# or
hstar.ksv = hstar.e %>%
  get_trackdata(ae, ., onTheFlyFunctionName = "ksvF0")

The parameters can once again be specified with the additional argument onTheFlyParams. Thus to repeat the above but with gender set to m:

i.ksv = get_trackdata(ae, 
                      i.s, 
                      onTheFlyFunctionName = "ksvF0",
                      onTheFlyParams = list(gender = "m"))

3 The structure of a trackdata object

A trackdata object is of the type tibble with the columns shown below. Most of these columns (marked same) carry the same information as in the segment or event list from which the trackdata object was derived. Those that are different are sl_rowIdx, times_orig, times_rel, times_norm, and the signal columns T1 … Tn:

sl_rowIdx: a numerical vector for identifying the signals belonging to the nth row of the segment (or event) list.
labels: annotations or sequenced annotations of segments concatenated by -> (same)
start: onset time in milliseconds (same)
end: offset time in milliseconds (same)
db_uuid: UUID of emuDB (= a unique identifier) (same)
session: session name (same)
bundle: bundle name (= utterance name) (same)
start_item_id: item ID of first element of sequence (same)
end_item_id: item ID of last element of sequence (same)
level: name of the tier that has been searched (same)
attribute: name of attribute that has been searched (same)
start_item_seq_idx: sequence index of start item (same)
end_item_seq_idx: sequence index of end item (same)
type: type of “segment” row: ITEM: symbolic item, EVENT: event item, SEGMENT: segment (same)
sample_start: start sample position (same)
sample_end: end sample position (same)
sample_rate: sample rate (same)
times_orig: the times at which the successive frames (per segment) of trackdata occur
times_rel: as times_orig but with the start time of the first frame (per segment) set to zero
times_norm: normalised time such that start time (per segment) is zero and the end time is 1
T1: the signal values. If there are multiple signals, then T2, T3… Tn (thus T1:T4 when extracting the first four formant frequencies)

The column names that are different from those of the segment/event list are:

sl_rowIdx which allows identification of the signals that belong to segment number n in the segment list. Thus, for the above example, the part of the trackdata object corresponding to i.s[3,] i.e. the third segment of the segment list i.s is:

i.fm %>% filter(sl_rowIdx == 3)

## # A tibble: 24 × 24
##    sl_rowIdx labels start   end db_uuid session bundle start_item_id end_item_id
##        <int> <chr>  <dbl> <dbl> <chr>   <chr>   <chr>          <int>       <int>
##  1         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  2         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  3         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  4         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  5         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  6         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  7         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  8         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
##  9         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
## 10         3 i:     2569. 2692. 0fc618… 0000    msajc…           186         186
## # ℹ 14 more rows
## # ℹ 15 more variables: level <chr>, attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>, times_orig <dbl>, times_rel <dbl>, times_norm <dbl>,
## #   T1 <int>, T2 <int>, T3 <int>, T4 <int>

The number of segments in (i) the segment list and in (ii) the trackdata object derived from (i) is always the same. This can be verified by:

nrow(i.s) == length(unique(i.fm$sl_rowIdx))

## [1] TRUE

# is the number of rows in the
# segment list equal to
i.s %>% nrow() == 
  # the unique segment identifiers
  # of the corresponding trackdata object?
  i.fm %>%
  select(sl_rowIdx) %>%
  n_distinct()

## [1] TRUE

The formant data (of all four formants) for the 3rd segment is therefore given by:

f_3 = i.fm %>% 
  filter(sl_rowIdx == 3) %>% 
  select(T1:T4)
f_3

## # A tibble: 24 × 4
##       T1    T2    T3    T4
##    <int> <int> <int> <int>
##  1   339  1307  2312  3685
##  2   341  1367  2288  3656
##  3   347  1425  2299  3655
##  4   344  1454  2293  3676
##  5   339  1515  2295  3679
##  6   364  1555  2300  3665
##  7   349  1673  2320  3618
##  8   350  1754  2337  3602
##  9   334  1773  2346  3601
## 10   311  1825  2383  3508
## # ℹ 14 more rows

For this third segment, there are 24 frames of data:

nrow(f_3)

## [1] 24

A frame of data is the signal (or signals) that occur at a particular point of time within the segment. The frames of data extend at equal intervals (known as the frame rate – see below) between the start time and end time of a segment. Thus, the 24 frames of data for this third segment of i.s extend at equal intervals between:

i.s %>% 
  slice(3) %>% 
  pull(start)

## [1] 2569.225

# or equivalently
i.s$start[3]

## [1] 2569.225

and

i.s %>% 
  slice(3) %>% 
  pull(end)

## [1] 2692.325

# or equivalently
i.s$end[3]

## [1] 2692.325

The actual times at which these formants for the third segment occur are given by:

times_orig_3 = i.fm %>% 
  filter(sl_rowIdx == 3) %>% 
  pull(times_orig)
times_orig_3

##  [1] 2572.5 2577.5 2582.5 2587.5 2592.5 2597.5 2602.5 2607.5 2612.5 2617.5
## [11] 2622.5 2627.5 2632.5 2637.5 2642.5 2647.5 2652.5 2657.5 2662.5 2667.5
## [21] 2672.5 2677.5 2682.5 2687.5

Notice how the time of the first frame

times_orig_3[1]

## [1] 2572.5

is a fraction greater than the left boundary time of the 3rd segment (given by i.s$start[3] above); and the time of the last frame:

tail(times_orig_3, 1)

## [1] 2687.5

is a fraction less than the right boundary time of the 3rd segment (given by i.s$end[3] above). The frame rate is the interval between frames, which for these data is 5 ms. This is shown by the difference between successive times of the frames of data:

diff(times_orig_3)

##  [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
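The frame times shown above can be reproduced in base R under one assumption that is inferred from the output rather than taken from the emuR documentation: that the frames of the fm track are centred at 2.5, 7.5, 12.5, … ms from the start of the signal file, and that the frames returned for a segment are those falling between its start and end times. A minimal sketch under that assumption (the upper limit of 3000 ms is arbitrary):

```r
# boundaries of the third [i:] segment in ms (values from the output above)
seg_start <- 2569.225
seg_end <- 2692.325

# interval between frames in ms, as shown by diff(times_orig_3)
shift <- 5

# ASSUMPTION: frame centres lie at shift/2, shift/2 + shift, ...
frame_times <- seq(shift / 2, 3000, by = shift)

# keep the frames that fall inside the segment
in_seg <- frame_times[frame_times >= seg_start & frame_times <= seg_end]

length(in_seg)  # number of frames inside the segment
range(in_seg)   # times of the first and last frame
```

Under this assumption the segment contains 24 frames from 2572.5 ms to 2687.5 ms, which agrees with times_orig_3 and explains why the first and last frame times lie just inside the segment boundaries.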

times_rel resets the original times such that the first frame has a start time of zero. Thus for the 3rd segment times_rel is

times_rel_3 = i.fm %>% 
  filter(sl_rowIdx == 3) %>% 
  pull(times_rel)
times_rel_3

##  [1]   0   5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90
## [20]  95 100 105 110 115

which is the same as the original times with the time of the first frame of data subtracted from them:

times_orig_3 - times_orig_3[1]

##  [1]   0   5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90
## [20]  95 100 105 110 115

and which also shows that the frame rate is 5 ms. The frame rate is incidentally the same for all observations of the trackdata object (i.e. for all segments from which trackdata is obtained).

times_norm is a form of linear time normalisation: it resets the times in times_rel so that the time of the first frame of data remains at zero but the time of the last frame of data is 1. The normalised times are then at equal intervals between 0 and 1. For this third segment, they are:

times_norm_3 = i.fm %>% 
  filter(sl_rowIdx == 3) %>% 
  pull(times_norm)
times_norm_3

##  [1] 0.00000000 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130
##  [7] 0.26086957 0.30434783 0.34782609 0.39130435 0.43478261 0.47826087
## [13] 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217 0.73913043
## [19] 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000

The normalised times are times_rel divided by the total duration of all frames of data (i.e. by the difference in time between the first and last frame of data). The total duration of the frames of data is just the last value of times_rel. So for this third segment, the duration of the 24 frames of data is 115 ms:

tail(times_rel_3, 1)

## [1] 115

Thus the normalised times for the third segment are also given by:

times_rel_3/115

##  [1] 0.00000000 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130
##  [7] 0.26086957 0.30434783 0.34782609 0.39130435 0.43478261 0.47826087
## [13] 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217 0.73913043
## [19] 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000

which is the same as times_norm_3 obtained above. Time-normalised values can be helpful when comparing the shape of trajectories independently of whether these shapes are of different duration (e.g. comparing the rise and fall of F2 for vowels of different duration).
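The same two calculations can be carried out for several segments at once. The following base R sketch uses a made-up data frame td with two segments of different duration (the values are invented for illustration and are not taken from ae) and derives times_rel and times_norm separately per segment with ave(). It mirrors the arithmetic described above; it is not the code that emuR itself uses.

```r
# toy trackdata-like data frame: two segments of different duration
# (hypothetical values, for illustration only)
td <- data.frame(
  sl_rowIdx = c(1, 1, 1, 2, 2, 2, 2, 2),
  times_orig = c(102.5, 107.5, 112.5, 502.5, 507.5, 512.5, 517.5, 522.5)
)

# relative time: subtract each segment's first frame time
td$times_rel <- ave(td$times_orig, td$sl_rowIdx,
                    FUN = function(t) t - t[1])

# normalised time: divide by each segment's last relative time
td$times_norm <- ave(td$times_rel, td$sl_rowIdx,
                     FUN = function(t) t / t[length(t)])

td
```

Both segments now run from 0 to 1 in times_norm even though the first has three frames and the second five. Note that this sketch does not handle a segment with only one frame, for which the division would be 0/0.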

4 Obtaining values at a single point in time or between time points

The issue here is how to obtain signal data for each segment at a particular proportion of the segment’s duration – for example, at the segment’s temporal midpoint. One way is to use the cut argument to the function get_trackdata(). For example, the following obtains the formant values for the segment list i.s at the temporal midpoint:

i.fm5 = get_trackdata(ae, i.s, "fm", cut = .5)

In this case, there is one observation per segment in the trackdata object i.fm5 (the formant data at the temporal midpoint). For this reason, the number of observations (rows) in i.fm5 is the same as the number of rows in the segment list i.s from which the trackdata object was derived:

nrow(i.s) == nrow(i.fm5)

## [1] TRUE

Another way is to first create a trackdata object for all time points, then round the normalised times to values 0, 0.1, … 1, then aggregate at each of these normalised time values, and then finally extract the formant data at e.g. time point 0.5. The following makes use of dplyr commands in order to identify F1 and F2 at (aggregated) normalised time point 0.5 in i.fm.

# make a data-frame i.fm5b
i.fm5b = i.fm %>%
  # create a column times_norm2 that is times_norm
  # rounded to one decimal place
  mutate(times_norm2 = round(times_norm,1)) %>%
  # for each unique element in times_norm2
  # and in sl_rowIdx
  group_by(times_norm2, sl_rowIdx) %>%
  # calculate the F1-mean and F2-mean
  summarise(T1 = mean(T1), T2 = mean(T2)) %>%
  # extract these T1 and T2 values at 
  # aggregated normalised time point 0.5
  filter(times_norm2 == .5) %>%
  # it's good practice to ungroup after using group_by()
  ungroup()

## `summarise()` has grouped output by 'times_norm2'. You can override using the
## `.groups` argument.

i.fm5b

## # A tibble: 6 × 4
##   times_norm2 sl_rowIdx    T1    T2
##         <dbl>     <int> <dbl> <dbl>
## 1         0.5         1   287 1727 
## 2         0.5         2   298 2273 
## 3         0.5         3   325 1902.
## 4         0.5         4   245 2265 
## 5         0.5         5   320 1910.
## 6         0.5         6   320 1835

As the above data-frame shows, there are 6 rows (one row per segment in i.s) with F1 and F2 data (columns T1 and T2) at the temporal midpoint of each segment. (Another way of obtaining frames of data at a particular time point is to extract them at time point 0.5 after applying the function normalize_length() as explained in the next section).
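Rounding the normalised times can assign more than one frame to the same rounded time point, in which case the value reported is the mean over those frames. The following base R sketch shows which frames of a 24-frame segment (the number of frames in the third segment of i.s) are rounded to 0.5; the variable names are chosen for this example only.

```r
# normalised times of a segment with 24 equally spaced frames
n_frames <- 24
times_norm <- (0:(n_frames - 1)) / (n_frames - 1)

# indices of the frames whose normalised time rounds to 0.5
idx <- which(round(times_norm, 1) == 0.5)

idx              # frame numbers rounded to 0.5
times_norm[idx]  # their unrounded normalised times
```

For 24 frames, the 12th and 13th frames (normalised times of roughly 0.478 and 0.522) are both rounded to 0.5. This is consistent with the non-integer T2 mean shown for the third segment in i.fm5b, and it means that values obtained this way need not be identical to those from cut = .5.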

Extracting data between two (proportional) time points can be straightforwardly accomplished, once the trackdata object has been derived. E.g. to create a trackdata object of the middle third of the segment duration:

i.fm_middle = i.fm %>%
  filter(times_norm >= 1/3 & times_norm <= 2/3)
range(i.fm_middle$times_norm)

## [1] 0.3478261 0.6521739

nrow(i.fm_middle)

## [1] 32

As the above shows, i.fm_middle has around 1/3 of the observations of i.fm (98 observations) and consists of observations with normalised times greater than 0.34 and less than 0.66.

5 Some common plots applied to trackdata objects including linear time normalization

Now that formant data and different types of time axes have been obtained, it should be possible to plot the formant(s) as a function of time for this third segment. Thus for F2 as a function of relative time for this third segment:

plot(times_rel_3, f_3$T2)

But a much better option is to make use of ggplot2 for the same purpose applied to the trackdata object.

# choose data from the 3rd segment
i.fm %>% 
  filter(sl_rowIdx == 3) %>%
  # send to ggplot
  ggplot() +
  # plot T2 (F2) on the y-axis and
  # times_rel on the x-axis
  aes(y = T2, x = times_rel) +
  # plot points - use geom_line() for a line
  geom_point() +
  # add some axis-titles
  xlab("Time (ms)") +
  ylab("F2 (Hz)")

To make an F2 plot as a function of normalised time for all the segments in the segment list i.s requires grouping by segment identifier using the group argument to aes():

i.fm %>%
  ggplot() +
  aes(y = T2, x = times_norm, group = sl_rowIdx) +
  geom_line() +
  xlab("Proportional time") +
  ylab("F2 (Hz)")

Colour coding can be used to distinguish between different annotation types. The following for example plots F2 for all [ei, ai] diphthongs in the database.

# Make a segment list of all [ei] and [ai] diphthongs
dip.fm = query(ae, "Phonetic = ei | ai") %>%
  # get the formants
  get_trackdata(ae, ., "fm")

# Plot F2 vs. normalised time
dip.fm %>%
  ggplot() +
  aes(y = T2, 
      x = times_norm, 
      col = labels, 
      group = sl_rowIdx) +
  geom_line() +
  xlab("Proportional time") +
  ylab("F2 (Hz)")

Notwithstanding the obvious formant tracking error in one of the [ei] segments, a common type of plot is one in which an aggregate is made per annotation type. This will, however, require there to be an equal number of normalised time points per segment. The function normalize_length() can be applied for this purpose. The following makes a new trackdata object such that each segment has 11 equally spaced normalised time values between 0 and 1, i.e. at time values 0, 0.1, 0.2, …, 1:

dip.fm.n = normalize_length(dip.fm, N = 11)

At the core of the emuR normalize_length() function is the R function approx(). An equivalent result can be given with:

N = 11
dip.fm.n2 = dip.fm %>%
  # for each segment
  group_by(sl_rowIdx)  %>%
  reframe(
    # normalize the F1 values
    T1 = approx(times_rel, T1, n = N)$y,
    # normalise the F2 values
    T2 = approx(times_rel, T2, n = N)$y,
    # give the times for each segment
    times = seq(0, 1, length.out = N)) %>%
  ungroup()

Verify that this is the same with e.g.

dip.fm.n2 %>% pull(T2) -
  dip.fm.n %>% pull(T2)

##  [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
##  [6]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [11]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [16]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [21]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [26]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [31] -2.273737e-13  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [36]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [41]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [46]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [51]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [56]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [61]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  1.136868e-13
## [66]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [71]  0.000000e+00  0.000000e+00 -2.273737e-12  0.000000e+00  0.000000e+00
## [76]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [81]  0.000000e+00  0.000000e+00  0.000000e+00 -2.273737e-13  0.000000e+00
## [86]  0.000000e+00 -2.273737e-13  0.000000e+00  0.000000e+00  0.000000e+00
## [91]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [96]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00

Note that normalize_length() preserves all the columns of the original trackdata object, whereas the above code only returns (in this case) the normalized F1 and F2 values, the corresponding normalized times, and the segment identifier (sl_rowIdx).
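What approx() does in this context can be seen on a single made-up contour. The sketch below linearly interpolates five hypothetical F2 values, sampled every 5 ms, onto N = 11 equally spaced time points; the numbers are invented for illustration and are not taken from dip.fm.

```r
# a hypothetical F2 contour with 5 frames at 5 ms intervals
times_rel <- c(0, 5, 10, 15, 20)
f2 <- c(1500, 1600, 1800, 1700, 1650)

# linear interpolation onto N equally spaced points
# between the first and the last frame time
N <- 11
res <- approx(times_rel, f2, n = N)

res$x  # the interpolation times: 0, 2, 4, ..., 20
res$y  # the interpolated F2 values at those times
```

The first and last values of the contour are preserved and the values in between lie on the straight lines that join neighbouring frames. Every segment therefore ends up with the same number of points regardless of its duration.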

The following verifies that there are an equal number of such time points for each of the 9 segments:

dip.fm.n %>% 
  select(sl_rowIdx, times_norm) %>% 
  table()

##          times_norm
## sl_rowIdx 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
##         1 1   1   1   1   1   1   1   1   1   1 1
##         2 1   1   1   1   1   1   1   1   1   1 1
##         3 1   1   1   1   1   1   1   1   1   1 1
##         4 1   1   1   1   1   1   1   1   1   1 1
##         5 1   1   1   1   1   1   1   1   1   1 1
##         6 1   1   1   1   1   1   1   1   1   1 1
##         7 1   1   1   1   1   1   1   1   1   1 1
##         8 1   1   1   1   1   1   1   1   1   1 1
##         9 1   1   1   1   1   1   1   1   1   1 1

So now an F2 aggregate as a function of normalised time can be calculated for each of the two annotation types and plotted:

dip.fm.n %>%
  # for each label and for each 
  # normalised time point
  group_by(labels, times_norm) %>%
  # calculate the F2 mean
  summarise(T2 = mean(T2)) %>%
  ungroup() %>%
  # plot
  ggplot() + 
  # F2 vs normalised time
  aes(y = T2, 
      x = times_norm, 
      # colour code by annotation
      col = labels, 
      # and grouped by annotation
      group = labels) +
  geom_line() + 
  xlab("Proportional time") +
  ylab("F2 (Hz)")

## `summarise()` has grouped output by 'labels'. You can override using the
## `.groups` argument.
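The aggregation step itself can be tried without the database. The base R sketch below averages F2 per label and per normalised time point over a small invented data frame (four segments, three time points each; all values are hypothetical). It uses aggregate() in place of the group_by() and summarise() calls above.

```r
# toy length-normalised data: two [ai] and two [ei] segments,
# each with the same three normalised time points (hypothetical values)
toy <- data.frame(
  labels = rep(c("ai", "ei"), each = 6),
  sl_rowIdx = rep(1:4, each = 3),
  times_norm = rep(c(0, 0.5, 1), times = 4),
  T2 = c(1200, 1500, 1900, 1300, 1600, 2000,
         1800, 2000, 2200, 1900, 2100, 2300)
)

# mean F2 for each combination of label and normalised time point
agg <- aggregate(T2 ~ labels + times_norm, data = toy, FUN = mean)

# sort by label and then by time for readability
agg <- agg[order(agg$labels, agg$times_norm), ]
agg
```

Because every segment has values at exactly the same normalised time points, each mean is taken over one value per segment, which is why the length normalisation has to come before the aggregation.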

An introduction to signal processing in EmuR

Jonathan Harrington

WiSe 2023