Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, access the tidyverse, magrittr, emuR, and wrassp libraries:
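A minimal sketch of the loading commands, inferred from the startup messages that follow. The loading order is an assumption based on which package is reported as masking which.

```r
# load the four packages used throughout this tutorial
library(tidyverse)
library(magrittr)
library(emuR)
library(wrassp)
```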
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
The following makes use of the demonstration database emuDB that was also used here. Store and access the demo database as described here, thus:
create_emuRdemoData(dir = tempdir())
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
ae = load_emuDB(path.ae)
## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp79FYCS/emuR_demoData/ae_emuDB... (7 bundles found)
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: ae
## UUID: 0fc618dc-8980-414d-8c7a-144a649ce199
## Directory: /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/Rtmp79FYCS/emuR_demoData/ae_emuDB
## Session count: 1
## Bundle count: 7
## Annotation item count: 736
## Label count: 844
## Link count: 785
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## name columnName fileExtension
## dft dft dft
## fm fm fms
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## Utterance ITEM 1 Utterance;
## Intonational ITEM 1 Intonational;
## Intermediate ITEM 1 Intermediate;
## Word ITEM 3 Word; Accent; Text;
## Syllable ITEM 1 Syllable;
## Phoneme ITEM 1 Phoneme;
## Phonetic SEGMENT 1 Phonetic;
## Tone EVENT 1 Tone;
## Foot ITEM 1 Foot;
## ── Link definitions ──
## type superlevelName sublevelName
## ONE_TO_MANY Utterance Intonational
## ONE_TO_MANY Intonational Intermediate
## ONE_TO_MANY Intermediate Word
## ONE_TO_MANY Word Syllable
## ONE_TO_MANY Syllable Phoneme
## MANY_TO_MANY Phoneme Phonetic
## ONE_TO_MANY Syllable Tone
## ONE_TO_MANY Intonational Foot
## ONE_TO_MANY Foot Syllable
The procedure in all cases is first to make a segment or event list using the query() function that was discussed here, and then to use the function get_trackdata() to obtain the signal data for that segment or event list. There are three cases to consider, depending on whether the signals that are to be read into R using get_trackdata() already exist.
As discussed in this earlier module, the signal files that exist and that are accessible in emuR are listed by list_ssffTrackDefinitions(ae):
## name columnName fileExtension
## 1 dft dft dft
## 2 fm fm fms
The meaning of the three columns was also explained in an earlier module. The first of these, dft, contains spectral data, and the second, fm, contains data of the first four formant frequencies. The following commands make a trackdata object of the first four formant frequencies between the start time and end time of all [i:] segments in the database:
# segment list of all [i:] segments
i.s = query(ae, "Phonetic = i:")
# trackdata object of the first four formant frequencies
i.fm = get_trackdata(ae, i.s, "fm")
# or
i.fm = i.s %>% get_trackdata(ae, ., "fm")
The audio waveform can also be read into R in a similar way using the argument MEDIAFILE_SAMPLES. The following imports the waveforms of the [i:] segments into R:
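A minimal sketch of such a call, assuming that MEDIAFILE_SAMPLES is passed in place of a track name. The object name i.wav is hypothetical, and the demo database is recreated only if it is missing.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

# segment list of all [i:] segments
i.s = query(ae, "Phonetic = i:")
# audio samples between the start and end time of each segment
i.wav = get_trackdata(ae, i.s, "MEDIAFILE_SAMPLES", verbose = FALSE)
```

The samples should then be in the column T1 of i.wav, with one row per audio sample.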
Making a trackdata object for an event list works in exactly the same way:
# get all H* tones
hstar.e = query(ae, "Tone = H*")
# Formant data
hstar.fm = get_trackdata(ae, hstar.e, "fm")
# or
hstar.fm = hstar.e %>%
get_trackdata(ae, ., "fm")
The number of observations in hstar.fm should be exactly the same as the number of H* events:
## [1] TRUE
This is necessarily so because as explained here, an event list contains annotations defined by a single point in time. For this reason, each event can only be associated with one signal value (or one set of signal values if multiparametric as in the case here of formants F1-F4). By contrast there are many more observations in a trackdata object derived from a segment list:
## [1] 6
## [1] 98
This is because the trackdata object contains data at regular intervals between each segment’s start and end time. And so since segments have a certain duration, there will in almost all cases be more trackdata observations than there are segments.
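The counts quoted above can be checked with nrow(). This is a sketch that rebuilds the segment list, the event list and their trackdata objects so that it runs on its own.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

i.s = query(ae, "Phonetic = i:")
hstar.e = query(ae, "Tone = H*")
i.fm = get_trackdata(ae, i.s, "fm", verbose = FALSE)
hstar.fm = get_trackdata(ae, hstar.e, "fm", verbose = FALSE)

# event list: one observation per event
nrow(hstar.fm) == nrow(hstar.e)
# segment list: many observations per segment
nrow(i.s)
nrow(i.fm)
```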
This has already been demonstrated when calculating pitch data in earlier modules. New signals can be added with the wrassp package. The wrassp package is a wrapper for R around Michel Scheffers’ libassp (Advanced Speech Signal Processor). The currently available signal processing functions provided by wrassp are:
Command | Meaning |
---|---|
acfana() | Analysis of short-term autocorrelation function |
afdiff() | Computes the first difference of the signal |
affilter() | Filters the audio signal (e.g., low-pass and high-pass) |
cepstrum() | Short-term cepstral analysis |
cssSpectrum() | Cepstral smoothed version of dftSpectrum() |
dftSpectrum() | Short-term DFT spectral analysis |
forest() | Formant estimation |
ksvF0() | f0 analysis of the signal |
lpsSpectrum() | Linear predictive smoothed version of dftSpectrum() |
mhsF0() | Pitch analysis of the speech signal using Michel Scheffers’ Modified Harmonic Sieve algorithm |
rfcana() | Linear prediction analysis |
rmsana() | Analysis of short-term Root Mean Square amplitude |
zcrana() | Analysis of the averages of the short-term positive and negative zero-crossing rates |
The fastest way to add new signals to the database is with the function add_ssffTrackDefinition(). Two new signals are added in the example below. One is RMS-energy for estimating a signal’s intensity. The other is the zero-crossing rate (ZCR), which is the number of times the waveform crosses the time axis, expressed in Hz. ZCR typically follows the frequency where most energy is concentrated (and indeed the first spectral moment): it is therefore typically higher for fricatives than for sonorants and higher for [s] than for [ʃ]. These signals are added with the default parameters in the following example:
add_ssffTrackDefinition(ae,
"energy",
onTheFlyFunctionName = "rmsana")
add_ssffTrackDefinition(ae,
"zero_cross",
onTheFlyFunctionName = "zcrana")
The following shows that these signals have been added to the ae database:
## name columnName fileExtension
## 1 dft dft dft
## 2 fm fm fms
## 3 energy rms rms
## 4 zero_cross zcr zcr
The corresponding signal data for these newly added signals can now be obtained in exactly the same way as before:
# zero-crossing frequency for the segment list
i.zcr = get_trackdata(ae, i.s, "zero_cross")
# or
i.zcr = i.s %>%
get_trackdata(ae, ., "zero_cross")
The physical location of these signal files is within the corresponding bundles. Recall that all the files associated with any utterance are stored in the same bundle. The bundles for ae are here:
## # A tibble: 7 × 2
## session name
## <chr> <chr>
## 1 0000 msajc003
## 2 0000 msajc010
## 3 0000 msajc012
## 4 0000 msajc015
## 5 0000 msajc022
## 6 0000 msajc023
## 7 0000 msajc057
The signal files for the first of these utterances are shown here:
## [1] "msajc003_annot.json" "msajc003.dft" "msajc003.fms"
## [4] "msajc003.rms" "msajc003.wav" "msajc003.zcr"
and they are located at:
## [1] "/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp79FYCS/emuR_demoData/ae_emuDB/0000_ses/msajc003_bndl"
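The bundle listing and the file listing above can be obtained along the following lines. This is a sketch in which the bundle directory is built by hand from the session and bundle names shown in the output, and the object name bndl.dir is hypothetical.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

# sessions and bundles of the database
list_bundles(ae)

# directory of the first bundle: <session>_ses/<bundle>_bndl
bndl.dir = file.path(path.ae, "0000_ses", "msajc003_bndl")
# files stored in that bundle
list.files(bndl.dir)
```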
Any of the wrassp functions can be run with different parameter settings. The parameter settings can be seen using the formals() function with the wrassp signal processing routine as a single argument. Thus, to see the parameters associated with mhsF0, one of the functions for calculating the fundamental frequency:
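A minimal sketch of the call, using only base R's formals() and the wrassp function named above:

```r
library(wrassp)

# default parameter settings of mhsF0()
formals(mhsF0)

# or just the parameter names
names(formals(mhsF0))
```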
## $listOfFiles
## NULL
##
## $optLogFilePath
## NULL
##
## $beginTime
## [1] 0
##
## $centerTime
## [1] FALSE
##
## $endTime
## [1] 0
##
## $windowShift
## [1] 5
##
## $gender
## [1] "u"
##
## $maxF
## [1] 600
##
## $minF
## [1] 50
##
## $minAmp
## [1] 50
##
## $minAC1
## [1] 0.25
##
## $minRMS
## [1] 18
##
## $maxZCR
## [1] 3000
##
## $minProb
## [1] 0.52
##
## $plainSpectrum
## [1] FALSE
##
## $toFile
## [1] TRUE
##
## $explicitExt
## NULL
##
## $outputDirectory
## NULL
##
## $forceToLog
## useWrasspLogger
##
## $verbose
## [1] TRUE
One of the parameters, $gender, can be specified as the default u, or as m (for male speakers) or f (for female speakers). In order to add pitch data to the ae database with the parameter set to m (male):
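A sketch of how this might look, assuming that the parameter is passed as a named list via onTheFlyParams. The track name "pitch" is a hypothetical choice, and the guard simply avoids adding the same track twice.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

# "pitch" is a hypothetical track name; skip if it already exists
if (!"pitch" %in% list_ssffTrackDefinitions(ae)$name) {
  add_ssffTrackDefinition(ae,
                          "pitch",
                          onTheFlyFunctionName = "mhsF0",
                          # override the default gender = "u"
                          onTheFlyParams = list(gender = "m"))
}

# the new track should now be listed
list_ssffTrackDefinitions(ae)
```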
The commands from the preceding sections have been used to store signals permanently in the database. However, it is possible to obtain signal data for a segment or event list without storing it as part of the database, using get_trackdata() with the argument onTheFlyFunctionName. Thus even though no pitch data has been calculated and stored using the function ksvF0() (as list_ssffTrackDefinitions(ae) will show), it can still be obtained e.g. for the earlier segment or event lists as follows:
i.ksv = get_trackdata(ae,
i.s,
onTheFlyFunctionName = "ksvF0")
hstar.ksv = get_trackdata(ae,
hstar.e,
onTheFlyFunctionName = "ksvF0")
# or
hstar.ksv = hstar.e %>%
get_trackdata(ae, ., onTheFlyFunctionName = "ksvF0")
The parameters can once again be specified with the additional argument onTheFlyParams. Thus to repeat the above but with gender set to m:
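A sketch for the segment list, assuming that onTheFlyParams accepts a named list containing only the parameters to be changed. The object name i.ksv.m is hypothetical.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

i.s = query(ae, "Phonetic = i:")

# f0 calculated on the fly with ksvF0(), gender set to male
i.ksv.m = get_trackdata(ae,
                        i.s,
                        onTheFlyFunctionName = "ksvF0",
                        onTheFlyParams = list(gender = "m"),
                        verbose = FALSE)
```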
A trackdata object is of the type tibble with the descriptors shown below. Most of these columns (commented with same) have the same information as in the segment or event list from which the trackdata object was derived. Those without this comment (sl_rowIdx, times_orig, times_rel, times_norm and T1 … Tn) are specific to the trackdata object:
sl_rowIdx: a numerical vector for identifying the signals belonging to the nth row of the segment (or event) list.
labels: annotations or sequenced annotations of segments concatenated by -> (same)
start: onset time in milliseconds (same)
end: offset time in milliseconds (same)
db_uuid: UUID of emuDB (= a unique identifier) (same)
session: session name (same)
bundle: bundle name (= utterance name) (same)
start_item_id: item ID of first element of sequence (same)
end_item_id: item ID of last element of sequence (same)
level: name of the tier that has been searched (same)
attribute: name of attribute that has been searched (same)
start_item_seq_idx: sequence index of start item (same)
end_item_seq_idx: sequence index of end item (same)
type: type of “segment” row: ITEM: symbolic item, EVENT: event item, SEGMENT: segment (same)
sample_start: start sample position (same)
sample_end: end sample position (same)
sample_rate: sample rate (same)
times_orig: the times at which the successive frames (per segment) of trackdata occur
times_rel: as times_orig but with the start time of the first frame (per segment) set to zero
times_norm: normalised time such that start time (per segment) is zero and the end time is 1
T1: the signal values. If there are multiple signals, then T2, T3… Tn (thus T1:T4 when extracting the first four formant frequencies)
The first of the column names that differ from those of the segment/event list is sl_rowIdx, which allows identification of the signals that belong to segment number n in the segment list. Thus, for the above example, the part of the trackdata object corresponding to i.s[3,], i.e. the third segment of the segment list i.s, is:
## # A tibble: 24 × 24
## sl_rowIdx labels start end db_uuid session bundle start_item_id end_item_id
## <int> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
## 1 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 2 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 3 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 4 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 5 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 6 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 7 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 8 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 9 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## 10 3 i: 2569. 2692. 0fc618… 0000 msajc… 186 186
## # ℹ 14 more rows
## # ℹ 15 more variables: level <chr>, attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>, times_orig <dbl>, times_rel <dbl>, times_norm <dbl>,
## # T1 <int>, T2 <int>, T3 <int>, T4 <int>
The number of segments in (i) the segment list and in (ii) the trackdata object derived from (i) is always the same. This can be verified by:
## [1] TRUE
or
# is the number of rows in the
# segment list equal to
i.s %>% nrow() ==
# the unique segment identifiers
# of the corresponding trackdata object?
i.fm %>%
select(sl_rowIdx) %>%
n_distinct()
## [1] TRUE
The formant data (of all four formants) for the 3rd segment is therefore given by:
## # A tibble: 24 × 4
## T1 T2 T3 T4
## <int> <int> <int> <int>
## 1 339 1307 2312 3685
## 2 341 1367 2288 3656
## 3 347 1425 2299 3655
## 4 344 1454 2293 3676
## 5 339 1515 2295 3679
## 6 364 1555 2300 3665
## 7 349 1673 2320 3618
## 8 350 1754 2337 3602
## 9 334 1773 2346 3601
## 10 311 1825 2383 3508
## # ℹ 14 more rows
For this third segment, there are 24 frames of data:
## [1] 24
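The extractions for the third segment shown above can be sketched as follows. The object name i.fm.3 is hypothetical, and the segment list and trackdata object are rebuilt so that the block runs on its own.

```r
library(emuR)
library(dplyr)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

i.s = query(ae, "Phonetic = i:")
i.fm = get_trackdata(ae, i.s, "fm", verbose = FALSE)

# all trackdata rows that belong to the third segment
i.fm.3 = i.fm %>% filter(sl_rowIdx == 3)

# the four formant tracks of that segment
i.fm.3 %>% select(T1:T4)

# number of frames of data in that segment
nrow(i.fm.3)
```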
A frame of data is the signal (or signals) that occur at a particular point of time within the segment. The frames of data extend at equal intervals (known as the frame rate; see below) between the start time and end time of a segment. Thus, the 24 frames of data for this third segment of i.s extend at equal intervals between:
## [1] 2569.225
## [1] 2569.225
and
## [1] 2692.325
## [1] 2692.325
The actual times at which these formants for the third segment occur are given by:
## [1] 2572.5 2577.5 2582.5 2587.5 2592.5 2597.5 2602.5 2607.5 2612.5 2617.5
## [11] 2622.5 2627.5 2632.5 2637.5 2642.5 2647.5 2652.5 2657.5 2662.5 2667.5
## [21] 2672.5 2677.5 2682.5 2687.5
Notice how the time of the first frame
## [1] 2572.5
is slightly greater than the left boundary time of the 3rd segment (given by i.s$start[3] above), and the time of the last frame:
## [1] 2687.5
is slightly less than the right boundary time of the 3rd segment (given by i.s$end[3] above). The frame rate is the interval between frames, which for these data is 5 ms. This is shown by the difference between successive times of the frames of data:
## [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
times_rel resets the original times such that the first frame has a start time of zero. Thus for the 3rd segment times_rel is
## [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
## [20] 95 100 105 110 115
which is the same as the time of the first frame of data subtracted from the original times:
## [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
## [20] 95 100 105 110 115
and which also shows that the frame rate is 5 ms. The frame rate is incidentally the same for all observations of the trackdata object (i.e. for all segments from which trackdata is obtained).
times_norm is a form of linear time normalisation: it resets the times in times_rel so that the time of the first frame of data remains at zero but the time of the last frame of data is 1. The normalised times are then at equal intervals between 0 and 1. For this third segment, they are:
## [1] 0.00000000 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130
## [7] 0.26086957 0.30434783 0.34782609 0.39130435 0.43478261 0.47826087
## [13] 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217 0.73913043
## [19] 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
The normalised times are times_rel divided by the total duration of all frames of data (i.e. by the difference in time between the first and last frame of data). The total duration of the frames of data is just the last value of times_rel. So for this third segment, the duration of the 24 frames of data is 115 ms:
## [1] 115
Thus the normalised times for the third segment are also given by:
## [1] 0.00000000 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130
## [7] 0.26086957 0.30434783 0.34782609 0.39130435 0.43478261 0.47826087
## [13] 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217 0.73913043
## [19] 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
which is the same as times_norm_3 obtained above. Time-normalised values can be helpful when comparing the shape of trajectories independently of whether these shapes are of different duration (e.g. comparing the rise and fall of F2 for vowels of different duration).
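The relationship between the three time axes can be checked with base R alone. This sketch rebuilds the frame times of the third segment from the values printed earlier (first frame at 2572.5 ms, 24 frames, 5 ms apart); the variable names are illustrative and not taken from the database.

```r
# frame times of the third [i:] segment, as printed above
times_orig = seq(2572.5, 2687.5, by = 5)

# interval between successive frames: 5 ms throughout
diff(times_orig)

# relative time: subtract the time of the first frame
times_rel = times_orig - times_orig[1]

# total duration of the frames: the last value of times_rel
dur = times_rel[length(times_rel)]

# normalised time: relative time divided by that duration
times_norm = times_rel / dur

range(times_norm)
```

Dividing by the duration of the frames, rather than by the segment duration, is what places the first frame at exactly 0 and the last at exactly 1.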
The issue here is how to obtain signal data for each segment at a particular proportion of the segment’s duration, for example at the segment’s temporal midpoint. One way is to use the cut argument to the function get_trackdata(). For example, the following obtains the formant values for the segment list i.s at the temporal midpoint:
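A minimal sketch of such a call, assuming that cut = 0.5 denotes the proportional midpoint. The object name i.fm5 is the one used in the following paragraph.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

i.s = query(ae, "Phonetic = i:")

# formants at the temporal midpoint of each segment
i.fm5 = get_trackdata(ae, i.s, "fm", cut = 0.5, verbose = FALSE)

# one observation per segment
nrow(i.fm5) == nrow(i.s)
```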
In this case, there is one observation per segment in the trackdata object i.fm5 (the formant data at the temporal midpoint). For this reason, the number of observations (rows) in i.fm5 is the same as the number of rows in the segment list i.s from which the trackdata object was derived:
## [1] TRUE
Another way is to first create a trackdata object for all time points, then round the normalised times to values 0, 0.1, … 1, then aggregate at each of these normalised time values, and then finally extract the formant data at e.g. time point 0.5. The following makes use of dplyr commands in order to identify F1 and F2 at (aggregated) normalised time point 0.5 in i.fm.
# make a data-frame i.fm5b
i.fm5b = i.fm %>%
# create a column times_norm2 that is times_norm
# rounded to one decimal place
mutate(times_norm2 = round(times_norm,1)) %>%
# for each unique element in times_norm2
# and in sl_rowIdx
group_by(times_norm2, sl_rowIdx) %>%
# calculate the F1-mean and F2-mean
summarise(T1 = mean(T1), T2 = mean(T2)) %>%
# extract these T1 and T2 values at
# aggregated normalised time point 0.5
filter(times_norm2 == .5) %>%
# it's good practice to ungroup after using group_by()
ungroup()
## `summarise()` has grouped output by 'times_norm2'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 4
## times_norm2 sl_rowIdx T1 T2
## <dbl> <int> <dbl> <dbl>
## 1 0.5 1 287 1727
## 2 0.5 2 298 2273
## 3 0.5 3 325 1902.
## 4 0.5 4 245 2265
## 5 0.5 5 320 1910.
## 6 0.5 6 320 1835
As the above data-frame shows, there are 6 rows (one row per segment in i.s) with F1 and F2 data (columns T1 and T2) at the temporal midpoint of each segment. (Another way of obtaining frames of data at a particular time point is to extract them at time point 0.5 after applying the function normalize_length() as explained in the next section).
Extracting data between two (proportional) time points can be straightforwardly accomplished, once the trackdata object has been derived. E.g. to create a trackdata object of the middle third of the segment duration:
## [1] 0.3478261 0.6521739
## [1] 32
As the above shows, i.fm_middle has around 1/3 of the observations of i.fm (98 observations) and consists of observations with normalised times greater than 0.34 and less than 0.66.
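One way this could be done is by filtering on times_norm. In this sketch the cut-offs 1/3 and 2/3 are an assumption about how "the middle third" is delimited, so the exact counts may differ from those printed above.

```r
library(emuR)
library(dplyr)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

i.s = query(ae, "Phonetic = i:")
i.fm = get_trackdata(ae, i.s, "fm", verbose = FALSE)

# keep only frames in the middle third of normalised time
# (cut-offs 1/3 and 2/3 are assumed)
i.fm_middle = i.fm %>%
  filter(times_norm > 1/3 & times_norm < 2/3)

# range of the remaining normalised times
range(i.fm_middle$times_norm)
# number of remaining observations
nrow(i.fm_middle)
```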
Now that formant data and different types of time axes have been obtained, it should be possible to plot the formant(s) as a function of time for this third segment. Thus for F2 as a function of relative time for this third segment:
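A minimal sketch of such a plot using base R graphics. The object name i.fm.3 is hypothetical and the axis labels are illustrative.

```r
library(emuR)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

i.s = query(ae, "Phonetic = i:")
i.fm = get_trackdata(ae, i.s, "fm", verbose = FALSE)

# trackdata rows of the third segment
i.fm.3 = i.fm[i.fm$sl_rowIdx == 3, ]

# F2 (column T2) as a function of relative time
plot(T2 ~ times_rel, data = i.fm.3,
     xlab = "Time (ms)", ylab = "F2 (Hz)")
```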
But a much better option is to make use of ggplot2 for the same purpose applied to the trackdata object.
# choose data from the 3rd segment
i.fm %>%
filter(sl_rowIdx == 3) %>%
# send to ggplot
ggplot() +
# plot T2 (F2) on the y-axis, times-rel
# on the x-axis
aes(y = T2, x = times_rel) +
# plot points - use geom_line() for a line
geom_point() +
# add some axis-titles
xlab("Time (ms)") +
ylab("F2 (Hz)")
To make an F2 plot as a function of normalised time for all the segments in the segment list i.s requires grouping by segment identifier using the group argument to aes():
i.fm %>%
ggplot() +
aes(y = T2, x = times_norm, group = sl_rowIdx) +
geom_line() +
xlab("Proportional time") +
ylab("F2 (Hz)")
Colour coding can be used to distinguish between different annotation types. The following for example plots F2 for all [ei, ai] diphthongs in the database.
dip.fm =
# Make a segment list
query(ae, "Phonetic = ei | ai") %>%
# get the formants
get_trackdata(ae, ., "fm")
dip.fm %>%
# plot F2 vs. normalised time
ggplot() +
aes(y = T2,
x = times_norm,
col = labels,
group = sl_rowIdx) +
geom_line() +
xlab("Proportional time") +
ylab("F2 (Hz)")
Notwithstanding the obvious formant tracking error in one of the [ei] segments, a common type of plot is one in which an aggregate is made per annotation type. This will, however, require there to be an equal number of normalised time points per segment. The function normalize_length() can be applied for this purpose. The following makes a new trackdata object such that each segment has 11 equally spaced normalised time values between 0 and 1 i.e. at time values:
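A sketch of the normalisation step, assuming that the number of points is set with the N argument of normalize_length(). The object name dip.fm.n is the one used in the plotting code further below.

```r
library(emuR)
library(dplyr)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

# formants of all [ei, ai] diphthongs
dip.fm = query(ae, "Phonetic = ei | ai") %>%
  get_trackdata(ae, ., "fm", verbose = FALSE)

# length-normalise to 11 points per segment
N = 11
dip.fm.n = normalize_length(dip.fm, N = N)

# the 11 normalised time values
seq(0, 1, length.out = N)

# number of rows per segment after normalisation
dip.fm.n %>% count(sl_rowIdx)
```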
At the core of the emuR normalize_length() function is the R function approx(). An equivalent result can be given with:
N = 11
dip.fm.n2 = dip.fm %>%
# for each segment
group_by(sl_rowIdx) %>%
reframe(
# normalize the F1 values
T1 = approx(times_rel, T1, n = N)$y,
# normalise the F2 values
T2 = approx(times_rel, T2, n = N)$y,
# give the times for each segment
times = seq(0, 1, length.out = N)) %>%
ungroup()
Verify that this is the same with e.g.
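One possible check is to subtract the F2 values of the two versions from each other. This sketch rebuilds both objects so that it runs on its own, and it assumes that both are ordered by segment and then by time, so that their rows line up.

```r
library(emuR)
library(dplyr)

# demo database as set up earlier; created here only if it is missing
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
if (!dir.exists(path.ae)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(path.ae, verbose = FALSE)

dip.fm = query(ae, "Phonetic = ei | ai") %>%
  get_trackdata(ae, ., "fm", verbose = FALSE)

N = 11
# version 1: emuR's own length normalisation
dip.fm.n = normalize_length(dip.fm, N = N)
# version 2: linear interpolation with approx(), per segment
dip.fm.n2 = dip.fm %>%
  group_by(sl_rowIdx) %>%
  reframe(T2 = approx(times_rel, T2, n = N)$y)

# differences in F2: expected to be zero apart from rounding error
dip.fm.n$T2 - dip.fm.n2$T2
all.equal(dip.fm.n$T2, dip.fm.n2$T2)
```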
## [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [6] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [11] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [16] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [21] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [26] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [31] -2.273737e-13 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [36] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [41] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [46] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [51] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [56] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [61] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.136868e-13
## [66] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [71] 0.000000e+00 0.000000e+00 -2.273737e-12 0.000000e+00 0.000000e+00
## [76] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [81] 0.000000e+00 0.000000e+00 0.000000e+00 -2.273737e-13 0.000000e+00
## [86] 0.000000e+00 -2.273737e-13 0.000000e+00 0.000000e+00 0.000000e+00
## [91] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [96] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
Note that normalize_length() preserves all the columns of the original trackdata object, whereas the above code only returns (in this case) normalized F1, F2, the corresponding normalized times and the segment identifier (sl_rowIdx).
The following verifies that there are an equal number of such time points for each of the 9 segments:
## times_norm
## sl_rowIdx 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
## 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1
## 7 1 1 1 1 1 1 1 1 1 1 1
## 8 1 1 1 1 1 1 1 1 1 1 1
## 9 1 1 1 1 1 1 1 1 1 1 1
So now an F2 aggregate as a function of normalised time can be calculated for each of the two annotation types and plotted:
dip.fm.n %>%
# for each label and for each
# normalised time point
group_by(labels, times_norm) %>%
# calculate the F2 mean
summarise(T2 = mean(T2)) %>%
ungroup() %>%
# plot
ggplot() +
# F2 vs normalised time
aes(y = T2,
x = times_norm,
# colour code by annotation
col = labels,
# and grouped by annotation
group = labels) +
geom_line() +
xlab("Proportional time") +
ylab("F2 (Hz)")
## `summarise()` has grouped output by 'labels'. You can override using the
## `.groups` argument.