--- title: "Αn introduction to signal processing in EmuR" author: "Jonathan Harrington" date: "WiSe 2122" output: bookdown::html_document2: number_sections: TRUE toc: true theme: flatly highlight: pygments --- # Preliminaries Follow the setup instructions given [here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/basic_r/_book/index.html), i.e. download R and RStudio, create a directory on your computer where you will store files on this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages. For this and subsequent tutorials, access the `tidyverse`,`magrittr`, `emuR`, and `wrassp` libraries: ```{r} library(tidyverse) library(magrittr) library(emuR) library(wrassp) ``` The following makes use of the demonstration database `emuDB` that was also used [here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/00_emu_R.html#accessing-an-existing-emu-speech-database). Store and access the demo database as also described [here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/00_emu_R.html) and thus: ```{r} create_emuRdemoData(dir = tempdir()) path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB") ae = load_emuDB(path.ae) summary(ae) ``` # Getting signal data into EmuR The procedure in all cases is first to make a segment or event list using the `query()` function that was [discussed here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/05_querylanguage.html) and then to make use of the function `get_trackdata()` to obtain the signal data for that segment or event list. There are three cases to consider, depending on whether or not the signals that are to be read into R using `get_trackdata()` already exist or not. ## Signal files that already exist in a database As discussed [in this earlier module](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/00_emu_R.html#accessing-an-existing-emu-speech-database), the signal files that exist and that are accessible in `emuR` are shown as follows: ```{r} list_ssffTrackDefinitions(ae) ``` The meaning of the three columns was also explained in an [earlier module](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/02_praat.html#adding-the-calculated-pitch-files-to-the-database). The first of these, `dft` contains spectral data, and the second, `fm`, contains data of the first four formant frequencies. The following commands make a trackdata object of the first formant frequencies between the start time and end time of all `[i:]` segments in the database: ```{r} # segment list of all [i:] segments i.s = query(ae, "Phonetic = i:") # trackdata object of the first four formant frequencies i.fm = get_trackdata(ae, i.s, "fm") ``` The audio waveform can also be read into R in a similar way using the argument `MEDIAFILE_SAMPLES`. The following imports the waveforms of the `[i:]` segments into R: ```{r} s.wav = get_trackdata(ae, i.s, "MEDIAFILE_SAMPLES") ``` Making a trackdata object for an event list works in exactly the same way: ```{r} # get all H* tones hstar.e = query(ae, "Tone = H*") # Formant data hstar.fm = get_trackdata(ae, hstar.e, "fm") ``` The number of observations in `hstar.fm` should be exactly the same as the number of `H*` events: ```{r} nrow(hstar.e) == nrow(hstar.fm) ``` This is necessarily so because [as explained here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/00_emu_R.html#some-defining-properties-of-an-emu-database), an event list contains annotations defined *by a single point in time*. For this reason, each event can only be associated with one signal value (or one set of signal values if multiparametric as in the case here of formants F1-F4). By contrast there are many more observations in a trackdata object derived from a segment list: ```{r} nrow(i.s) nrow(i.fm) ``` This is because the trackdata object contains data at regular intervals between each segment's start and end time. And so since segments have a certain duration, there will in almost all cases be more trackdata observations than there are segments. ## Signal files not already in the database ### Storing new signal files permanently This [has already been demonstrated](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/02_praat.html#calculating-pitch-with-wrassp) when calculating pitch data in earlier modules. New signals can be added with the `wrassp` package. The `wrassp` package is a **wrapper** for R around Michel Scheffers' *libassp* (**Advanced Speech Signal Processor**). The currently available signal processing functions provided by `wrassp` are: Command | Meaning :------------------ | :-------------------------------------------------------------------------------------- `acfana()`| Analysis of short-term autocorrelation function `afdiff()`| Computes the first difference of the signal `affilter()`| Filters the audio signal (e.g., low-pass and high-pass) `cepstrum()`| Short-term cepstral analysis `cssSpectrum()`| Cepstral smoothed version of `dftSpectrum()` `dftSpectrum()`| Short-term DFT spectral analysis `forest()`| Formant estimation `ksvF0()`| f0 analysis of the signal `lpsSpectrum()`| Linear predictive smoothed version of `dftSpectrum()` `mhsF0()`| Pitch analysis of the speech signal using Michel Scheffers' Modified Harmonic Sieve algorithm `rfcana()`| Linear prediction analysis `rmsana()`| Analysis of short-term Root Mean Square amplitude `zcrana()`| Analysis of the averages of the short-term positive and negative zero-crossing rates The fastest way to add new signals to the database is with the function `add_files()`. Two new signals are added in the example below. One is RMS-energy for estimating a signal's intensity. The other is the zero-crossing rate (ZCR) which is the number of times the waveform crosses the time axis expressed in Hz. ZCR typically follows the frequency where most energy is concentrated (and indeed the first spectral moment): it is therefore typically higher for fricatives than for sonorants and higher for [s] than for [ʃ]. These signals are added with the default parameters in the following example: ```{r} add_ssffTrackDefinition(ae, "energy", onTheFlyFunctionName = "rmsana") add_ssffTrackDefinition(ae, "zero_cross", onTheFlyFunctionName = "zcrana") ``` The following shows that these signals have been added to the `ae` database: ```{r} list_ssffTrackDefinitions(ae) ``` The corresponding signal data for these newly added signals can now be obtained in exactly the same way as before: ```{r} # zero-crossing frequency for the segment list i.zcr = get_trackdata(ae, i.s, "zero_cross") ```

The physical location of these signal files is within the corresponding bundles. Recall that all the files associated with any utterance are stored in the same bundle. The bundles for `ae` are here: ```{r} list_bundles(ae) ``` The signal files for the first of these utterances are shown here: ```{r} list.files(file.path(path.ae, "0000_ses", "msajc003_bndl")) ``` and they are located at: ```{r} file.path(path.ae, "0000_ses", "msajc003_bndl") ```

Any of the `wrassp` functions can be run with different parameter settings. The parameter settings can be seen using the `formals()` function with the `wrassp` signal processing routine as a single argument. Thus, to see the parameters associated with `mhsF0`, one of the functions for calculating the fundamental frequency: ```{r} formals(mhsF0) ``` One of the parameters `$gender` can be specified as the default `u` or `m` (for male speakers) or `f` (for female speakers). In order to add pitch data to the `ae` database with the parameter set to `m`ale: ```{r} add_ssffTrackDefinition(ae, "pitch", onTheFlyFunctionName = "mhsF0", onTheFlyParams = list(gender = "m")) ``` ### Storing new signal files temporarily The commands from the preceding sections have been used to store signals permanently in the database. However, it is possible to obtain signal data without storing it as part of the database for a segment or event list using `get_trackdata()` with the argument `onTheFlyFunctionName`. Thus even though no pitch data has been calculated and stored using the function `ksvF0()` (as `list_ssffTrackDefinitions(ae)` will show), it can still be obtained e.g. for the earlier segment or event lists as follows: ```{r} i.ksv = get_trackdata(ae, i.s, onTheFlyFunctionName = "ksvF0") hstar.ksv = get_trackdata(ae, hstar.e, onTheFlyFunctionName = "ksvF0") ``` The parameters can once again be specified with the additional argument `onTheFlyParams`. Thus to repeat the above but with gender set to `m`: ```{r} i.ksv = get_trackdata(ae, i.s, onTheFlyFunctionName = "ksvF0", onTheFlyParams = list(gender = "m")) ``` # The structure of a trackdata object A trackdata object is of the type tibble with the descriptors shown below. Most of these columns (commented with `same`) have [the same information as in the segment or event list](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/05_querylanguage.html#results-of-query-segment-lists) from which the trackdata object was derived. Those that are different are highlighted in bold: + **sl_rowIdx**: a numerical vector for identifying the signals belonging to the `n`th row of the segment (or event) list. + *labels*: annotations or sequenced annotations of segments concatenated by `->` (same) + *start*: onset time in milliseconds (same) + *end*: offset time in milliseconds (same) + *db_uuid*: UUID of emuDB (= a unique identifier) (same) + *session*: session name (same) + *bundle*: bundle name (= utterance name) (same) + *start_item_id*: item ID of first element of sequence (same) + *end_item_id*: item ID of last element of sequence (same) + *level*: name of the tier that has been searched (same) + *attribute*: name of attribute that has been searched (same) + *start_item_seq_idx*: sequence index of start item (same) + *end_item_seq_idx*: sequence index of end item (same) + *type*: type of "segment" row: `ITEM`: symbolic item, `EVENT`: event item, `SEGMENT`: segment (same) + *sample_start*: start sample position (same) + *sample_end*: end sample position (same) + *sample_rate*: sample rate (same) + **times_orig**: the times at which the successive frames (per segment) of trackdata occur + **times_rel**: as *times_orig* but with the start time of the first frame (per segment) set to zero + **times_norm**: normalised time such that start time (per segment) is zero and the end time is 1 + **T1**: the signal values. If there are multiple signals, then *T2*, *T3*... *Tn* (thus *T1*:*T4* when extracting the first four formant frequencies) The column names that are different from those of the segment/event list are: `sl_rowIdx` which allows identification of the signals that belong to segment number `n` in the segment list. Thus, for the above example, the part of the trackdata object corresponding to `i.s[3,]` i.e. the third segment of the segment list `i.s` is: ```{r} i.fm %>% filter(sl_rowIdx == 3) ```

The number of segments in (i) the segment list and in (ii) the trackdata object derived from (i) is always the same. This can be verified by: ```{r} nrow(i.s) == length(unique(i.fm$sl_rowIdx)) ``` or ```{r} # is the number of rows in the # segment list equal to i.s %>% nrow() == # the unique segment identifiers # of the corersponding trackdata object? i.fm %>% select(sl_rowIdx) %>% n_distinct() ```

The formant data (of all four formants) for the 3rd segment is therefore given by: ```{r} f_3 = i.fm %>% filter(sl_rowIdx == 3) %>% select(T1:T4) f_3 ``` For this third segment, there are 24 **frames of data**: ```{r} nrow(f_3) ``` A **frame of data** is the signal (or signals) that occur **at a particular point of time** within the segment. The frames of data extend at equal intervals (known as **the frame rate** -- see below) between the start time and end time of a segment. Thus, the 24 frames of data for this third segment of `i.s` extend at equal intervals between: ```{r} i.s %>% slice(3) %>% pull(start) # or equivalently i.s$start[3] ``` and ```{r} i.s %>% slice(3) %>% pull(end) # or equivalently i.s$end[3] ``` The **actual times** at which these formants for the third segment occur is given by: ```{r} times_orig_3 = i.fm %>% filter(sl_rowIdx == 3) %>% pull(times_orig) times_orig_3 ``` Notice how the time of the first frame ```{r} times_orig_3[1] ``` is a fraction greater than the left boundary time of the 3rd segment given by (`i.s$start[3]` above); and the time of the last frame: ```{r} tail(times_orig_3, 1) ``` is a fraction less than the right boundary time of the 3rd segment given by (`i.s$end[3]` above). The **frame rate** is the interval between frames which for these data is 5 ms. This is shown by the difference between successive times of the frames of data: ```{r} diff(times_orig_3) ``` `times_rel` resets the original times such that the first frame has a start time of zero. Thus for the 3rd segment `times_rel` is ```{r} times_rel_3 = i.fm %>% filter(sl_rowIdx == 3) %>% pull(times_rel) times_rel_3 ``` which is the same as the original times subtracted from the time of the first frame of data: ```{r} times_orig_3 - times_orig_3[1] ``` and which also shows that the frame rate is 5 ms. The frame rate is incidentally the same for all observations of the trackdata object (i.e. for all segments from which trackdata is obtained). `times_norm` is a form of **linear time normalisation**: it resets the times in `times_rel` so that the time of the first frame of data remains at zero but the time of the last frame of data is 1. The normalised times are then at equal intervals between 0 and 1. For this third segment, they are: ```{r} times_norm_3 = i.fm %>% filter(sl_rowIdx == 3) %>% pull(times_norm) times_norm_3 ``` The normalised times are `times_rel` divided by the total duration of all frames of data (i.e. by the difference in time between the first and last frame of data). The total duration of the frames of data is just the last value of `times_rel`. So for this third segment, the duration of the 24 frames of data is 115 ms: ```{r} tail(times_rel_3, 1) ``` Thus the normalised times for the third segment are also given by: ```{r} times_rel_3/115 ``` which is the same as `times_norm_3` obtained above. Time-normalised values can be helpful when comparing the shape of trajectories independently of whether these shapes are of different duration (e.g. comparing the rise and fall of `F2` for vowels of different duration). # Obtaining values at a single point in time or between time points The issue here is how to obtain signal data for each segment at a particular proportion of the segment's duration -- for example, at the segment's temporal midpoint. One way is to use the `cut` argument to the function `get_trackdata()`. For example, the following obtains the formant values for the segment list `i.s` at the temporal midpoint: ```{r} i.fm5 = get_trackdata(ae, i.s, "fm", cut = .5) ``` In this case, there is one observation per segment in the trackdata object `i.fm5` (the formant data at the temporal midpoint). For this reason, the number of observations (rows) in `i.fm5` is the same as the number of rows in the segment list `i.s` from which the trackdata object was derived: ```{r} nrow(i.s) == nrow(i.fm5) ``` Another way is to first create a trackdata object for all time points, then round the normalised times to values 0, 0.1, ... 1, then aggregate at each of these normalised time values, and then finally extract the formant data at e.g. time point 0.5. The following makes uses of `dplyr` commands in order to identify F1 and F2 at (aggregated) normalised time point 0.5 in `i.fm`. ```{r} # make a data-frame i.fm5b i.fm5b = i.fm %>% # create a column times_norm2 that is times_norm # rounded to one decimal place mutate(times_norm2 = round(times_norm,1)) %>% # for each unique element in times_norm2 # and in sl_rowIdx group_by(times_norm2, sl_rowIdx) %>% # calculate the F1-mean and F2-mean summarise(T1 = mean(T1), T2 = mean(T2)) %>% # extract these T1 and T2 values at # aggregated normalised time point 0.5 filter(times_norm2 == .5) %>% # it's good practice to ungroup after using group_by() ungroup() i.fm5b ``` As the above data-frame shows, there are 6 rows (one row per segment in `i.s`) with F1 and F2 data (columns T1 and T2) at the temporal midpoint of each segment. (Another way of obtaining frames of data at a particular time point is to extract them at time point 0.5 after applying the function `normalize_length()` as explained in the next section). Extracting data between two (proportional) time points can be straightforwardly accomplished, once the trackdata object has been derived. E.g. to create a trackdata object of the middle third of the segment duration: ```{r} i.fm_middle = i.fm %>% filter(times_norm >= 1/3 & times_norm <= 2/3) range(i.fm_middle$times_norm) nrow(i.fm_middle) ``` As the above shows, `i.fm_middle` has around 1/3 of the observations of `i.fm` (98 observations) and consists of observations with normalised times greater than 0.34 and less than 0.66. # An introduction to a few common plots applied to trackdata objects Now that formant data and different types of time axes have been obtained, it should be possible to plot the formant(s) as a function of time for this third segment. Thus for F2 as a function of relative time for this third segment: ```{r} plot(times_rel_3, f_3$T2) ``` But a much better option is to make use of `ggplot2` for the same purpose applied to the trackdata object. ```{r} # choose data from the 3rd segment i.fm %>% filter(sl_rowIdx == 3) %>% # send to ggplot ggplot() + # plot T2 (F2) on the y-axis, times-rel # on the x-axis aes(y = T2, x = times_rel) + # plot points - use geom_line() for a line geom_point() + # add some axis-titles xlab("Time (ms)") + ylab("F2 (Hz)") ``` To make an F2 plot as a function of normalised time for *all* the segments in the segment list `i.s` requires **grouping by segment identifier** using the `group` argument to `aes()`: ```{r} i.fm %>% ggplot() + aes(y = T2, x = times_norm, group = sl_rowIdx) + geom_line() + xlab("Proportional time") + ylab("F2 (Hz)") ``` Colour coding can be used to distinguish between different annotation types. The following for example plots F2 for all [ei, ai] diphthongs in the database. ```{r} # make a segment list dip.s = query(ae, "Phonetic = ei | ai") # get the formants dip.fm = get_trackdata(ae, dip.s, "fm") # plot F2 vs. normalised time dip.fm %>% ggplot() + aes(y = T2, x = times_norm, col = labels, group = sl_rowIdx) + geom_line() + xlab("Proportional time") + ylab("F2 (Hz)") ``` Notwithstanding the obvious formant tracking error in one of the [ei] segments, a common type of plot is one in which an aggregate is made per annotation type. This will, however, require there to be **an equal number of normalised time points per segment**. The function `normalise_length()` can be applied for this purpose. The following makes a new trackdata object such that each segment has 11 equally spaced normalised time values between 0 and 1 i.e. at time values: ```{r} dip.fm.n = normalize_length(dip.fm, N = 11) ``` The following verifies that there are an equal number of such time points for each of the 9 segments: ```{r} dip.fm.n %>% select(sl_rowIdx, times_norm) %>% table() ``` So now an F2 aggregate as a function of normalised time can be calculated for each of the two annotation types and plotted: ```{r} dip.fm.n %>% # for each label and for each # normalised time point group_by(labels, times_norm) %>% # calculate the F2 mean summarise(T2 = mean(T2)) %>% ungroup() %>% # plot ggplot() + # F2 vs normalised time aes(y = T2, x = times_norm, # colour code by annotation col = labels, # and grouped by annotation group = labels) + geom_line() + xlab("Proportional time") + ylab("F2 (Hz)") ```