Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, load the tidyverse, magrittr, emuR, and wrassp libraries:
library(tidyverse)
library(magrittr)
library(emuR)
library(wrassp)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
Functions:
create_emuRdemoData(): for viewing an online demo database
file.path(): an R function for defining the location of a directory on your system
The following command downloads and stores demonstration Emu databases in a temporary directory defined by the tempdir() function:
create_emuRdemoData(dir = tempdir())
The directory names of Emu databases end in _emuDB. The ae Emu database is physically stored inside the directory emuR_demoData (which is inside the directory created by tempdir()). The path to this emuDB is given by:
file.path(tempdir(), "emuR_demoData", "ae_emuDB")
## [1] "/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp7hCpVF/emuR_demoData/ae_emuDB"
In order to see the files that are physically stored in ae_emuDB, first save the pathname and then use list.files():
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# what are the files?
list.files(path.ae)
## [1] "0000_ses" "ae_DBconfig.json"
The file ae_DBconfig.json is a template that stores information about the defining properties of the ae database (see next section). The utterances (in this case) are all stored in the directory 0000_ses. To look inside this directory:
list.files(file.path(path.ae, "0000_ses"))
## [1] "msajc003_bndl" "msajc010_bndl" "msajc012_bndl" "msajc015_bndl"
## [5] "msajc022_bndl" "msajc023_bndl" "msajc057_bndl"
which shows that there are 7 so-called bundles. These are directories, and Emu organises things such that all the files that belong to the same utterance are always in the same bundle. The utterance name precedes _bndl. Thus in this case, it is clear from the output above that there are 7 utterances whose names are msajc003, msajc010, msajc012, msajc015, msajc022, msajc023, and msajc057. To see what files there are for any utterance, e.g. for msajc003:
list.files(file.path(path.ae, "0000_ses", "msajc003_bndl"))
## [1] "msajc003_annot.json" "msajc003.dft" "msajc003.fms"
## [4] "msajc003.wav"
shows that msajc003 has the following files:
_annot.json: this stores information about the annotations.
.dft: this is a derived signal file of DFT data obtained from the speech waveform.
.fms: this is formant data also obtained from the speech waveform.
.wav: this is the waveform itself.
In order to access the ae emuDB in R, the above path needs to be stored and passed to the function load_emuDB(), which reads the database into R:
# store the above path name
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# read it into R. Here the emuDB has been stored as `ae` (Australian English)
ae = load_emuDB(path2ae)
## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp7hCpVF/emuR_demoData/ae_emuDB... (7 bundles found)
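The on-disk layout described earlier (a database directory holding a configuration file and a session directory, which in turn holds one bundle directory per utterance) can be explored with base R alone. The sketch below builds a small mock tree in a temporary directory rather than touching the real ae_emuDB; the name mock_emuDB, the empty placeholder files, and the variable names are assumptions made purely for illustration.

```r
# Build a mock database tree (hypothetical; empty placeholder files only)
root <- file.path(tempdir(), "mock_emuDB")
bundle_dir <- file.path(root, "0000_ses", "msajc003_bndl")
dir.create(bundle_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "mock_DBconfig.json")))
invisible(file.create(file.path(
  bundle_dir,
  c("msajc003_annot.json", "msajc003.dft", "msajc003.fms", "msajc003.wav")
)))

# Session directories end in _ses, bundle directories end in _bndl
sessions <- list.files(root, pattern = "_ses$")
bundles <- list.files(file.path(root, sessions[1]), pattern = "_bndl$")

# The utterance name is whatever precedes _bndl
utterances <- sub("_bndl$", "", bundles)

# All files that belong to the first utterance
bundle_files <- list.files(file.path(root, sessions[1], bundles[1]))
print(utterances)
print(bundle_files)
```

Replacing root with the stored path to ae_emuDB should list the real sessions, bundles, and files in the same way, provided the demo data have been created.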
Functions:
summary(): lists the salient contents of an emuDB
list_bundles(): lists the so-called bundles of an emuDB
list_levelDefinitions(): lists the annotation levels or tiers of an emuDB
list_attributeDefinitions(): lists the attribute tiers of an annotation tier
list_linkDefinitions(): lists the links between the tiers of an emuDB
list_ssffTrackDefinitions(): lists the available signal files of an emuDB
The following summarises the salient attributes of the emuDB that has just been stored:
summary(ae)
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: ae
## UUID: 0fc618dc-8980-414d-8c7a-144a649ce199
## Directory: /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/Rtmp7hCpVF/emuR_demoData/ae_emuDB
## Session count: 1
## Bundle count: 7
## Annotation item count: 736
## Label count: 844
## Link count: 785
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## name columnName fileExtension
## dft dft dft
## fm fm fms
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## Utterance ITEM 1 Utterance;
## Intonational ITEM 1 Intonational;
## Intermediate ITEM 1 Intermediate;
## Word ITEM 3 Word; Accent; Text;
## Syllable ITEM 1 Syllable;
## Phoneme ITEM 1 Phoneme;
## Phonetic SEGMENT 1 Phonetic;
## Tone EVENT 1 Tone;
## Foot ITEM 1 Foot;
## ── Link definitions ──
## type superlevelName sublevelName
## ONE_TO_MANY Utterance Intonational
## ONE_TO_MANY Intonational Intermediate
## ONE_TO_MANY Intermediate Word
## ONE_TO_MANY Word Syllable
## ONE_TO_MANY Syllable Phoneme
## MANY_TO_MANY Phoneme Phonetic
## ONE_TO_MANY Syllable Tone
## ONE_TO_MANY Intonational Foot
## ONE_TO_MANY Foot Syllable
The Directory entry shows the path where the database is located.
The Bundle count is important: it shows how many utterances there are in the database. This information, together with the utterance names, is also given by:
list_bundles(ae)
## # A tibble: 7 × 2
## session name
## <chr> <chr>
## 1 0000 msajc003
## 2 0000 msajc010
## 3 0000 msajc012
## 4 0000 msajc015
## 5 0000 msajc022
## 6 0000 msajc023
## 7 0000 msajc057
which confirms what was seen earlier: there are 7 utterances.
The ── SSFF track definitions ── show which signals are currently available in the database apart from waveform files. For this database there are signals of type dft (discrete Fourier transform) and of type fm (formant). They have the extensions .dft and .fms. This information is also given by:
list_ssffTrackDefinitions(ae)
## name columnName fileExtension
## 1 dft dft dft
## 2 fm fm fms
The ── Level definitions ── show the available tiers or annotation levels of the database. This information is also given by:
list_levelDefinitions(ae)
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
As the above shows, annotation tiers can be of three types: ITEM, SEGMENT, and EVENT. The annotations of ITEM tiers inherit their times from the (typically) SEGMENT tiers that they dominate. In a SEGMENT tier, each annotation has a start and end time. In an EVENT tier, each annotation is defined by a single point in time. A tier can also be associated with one or more ATTRIBUTE tiers. This information is also given by the function list_attributeDefinitions() with the database name as the first argument, and the tier to be queried for attributes as the second. For example:
list_attributeDefinitions(ae, "Word")
## name level type hasLabelGroups hasLegalLabels
## 1 Word Word STRING FALSE FALSE
## 2 Accent Word STRING FALSE FALSE
## 3 Text Word STRING FALSE FALSE
shows that the tiers Accent and Text are attributes of Word. The annotations of an attribute tier always have identical times to those of the main tier with which they are associated (thus the annotations of the Accent tier have identical start and end times to those of the Word tier). An attribute tier is often used to provide additional information about annotations. In the ae database, the annotations of the Word tier consist entirely of C (content word) and F (function word). The annotations at the Text tier provide the orthography for each content or function word annotation, and the annotations of the Accent tier mark whether or not a word is prosodically accented.
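One way to picture attribute tiers is as extra label columns on the same annotation items, which is why they cannot have times of their own. The base R sketch below uses invented labels and times; the data frame layout and every value in it are assumptions for illustration only and do not reproduce how emuR stores annotations or what the ae database actually contains.

```r
# Hypothetical Word items: one row per annotation, one column per attribute
words <- data.frame(
  Word   = c("F", "C", "F"),       # function / content word
  Text   = c("the", "dog", "is"),  # orthography (invented)
  Accent = c("W", "S", "W"),       # accent marking (invented labels)
  start  = c(0, 150, 480),         # times in ms (invented)
  end    = c(150, 480, 600),
  stringsAsFactors = FALSE
)

# Orthography of all content words, read from the Text attribute
content_words <- words$Text[words$Word == "C"]

# Each attribute shares the start and end times of its row
text_times   <- words[, c("Text", "start", "end")]
accent_times <- words[, c("Accent", "start", "end")]
print(content_words)
```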
The information in ── Link definitions ── of summary(ae) shows how the tiers are associated with each other. This information is also provided by the function list_linkDefinitions():
list_linkDefinitions(ae)
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
In the Emu system, annotation tiers can be (but need not be) hierarchically organised with respect to each other. The point of doing so is broadly two-fold. The first is to allow annotations to inherit times if these are predictable. In Fig. 3.1 for example, the start and end times of the annotation seen are completely predictable from the annotations at the Phonetic tier that it dominates. Accordingly, the Emu system provides a way for the start and end times of seen to be inherited from the annotations at the hierarchically lower SEGMENT tier Phonetic that it dominates.
Figure 3.1: Left: Text is an item tier, Phonetic is a segment tier as shown by (S). Text dominates Phonetic as shown by the vertical downward arrow. The duration of [s] is \(t_2 - t_1\), of [i], \(t_3 - t_2\), and of [n], \(t_4 - t_3\). Because Text is an ITEM tier that dominates Phonetic, which is a SEGMENT tier, annotations at the Text tier inherit their times from Phonetic. Consequently, the duration of seen is \(t_4 - t_1\). Text and Phonetic stand in a ONE-TO-MANY association (as signified by the downward arrow) because an annotation at the Text tier can be associated with one or more annotations at the Phonetic tier, but not vice versa. Right: Phoneme and Phonetic stand in a MANY-TO-MANY relationship (as shown by the double arrow) because an annotation at the Phoneme tier can map to more than one annotation at the Phonetic tier and vice versa. In this hypothetical example of an annotation of the second syllable of a word like region, the single affricate annotation /dZ/ at the Phoneme tier maps to a sequence of [d] and [Z] annotations at the Phonetic tier, while the single annotation of the syllabic [n] at the Phonetic tier maps to a sequence of annotations /@n/ at the Phoneme tier. Note that /@/ and /n/ inherit the same start and end times and therefore have the same duration of \(t_4 - t_3\), i.e. they overlap with each other in time completely.
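The time inheritance described in Figure 3.1 amounts to taking the earliest start and the latest end of the dominated SEGMENT annotations. The following base R sketch illustrates this with invented times for [s], [i], and [n]; the numeric values, the data frame layout, and the function name inherit_times() are assumptions for illustration and are not part of emuR.

```r
# Hypothetical SEGMENT annotations for "seen" (times in ms, invented)
phonetic <- data.frame(
  label = c("s", "i", "n"),
  start = c(100, 220, 340),  # t1, t2, t3
  end   = c(220, 340, 450),  # t2, t3, t4
  stringsAsFactors = FALSE
)

# An ITEM annotation inherits the span of the segments it dominates
inherit_times <- function(segments) {
  c(start = min(segments$start), end = max(segments$end))
}

seen <- inherit_times(phonetic)
seen_duration <- unname(seen["end"] - seen["start"])  # t4 - t1

# Segment durations: t2 - t1, t3 - t2, t4 - t3
segment_durations <- phonetic$end - phonetic$start
print(seen_duration)
print(segment_durations)
```

Because the segments are contiguous in this sketch, the inherited duration of the item equals the sum of the segment durations.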
The second is to be able to query the database in order to obtain annotations at one tier with respect to another (e.g., all orthographic annotations of the vowels in the database; all H* pitch accents in an intermediate phrase, etc.). Without this linkage, these types of queries would not be possible.
Emu allows quite a flexible configuration of annotation tiers. The type of configuration can be defined by the user and will depend on the types of information that the user wants to be able to extract from the database. The configuration of annotation tiers for the currently loaded ae database is shown in Fig. 3.2.
Figure 3.2: The links between the annotation tiers of the ae database. ITEM tiers are unmarked, SEGMENT tiers are marked with (S), and EVENT tiers with (E). ATTRIBUTE tiers have no arrow between them (thus Text and Accent are attribute tiers of Word). A downward arrow signifies domination in a one-to-many relationship; a double arrow signifies domination in a many-to-many relationship.
Inherited times percolate up through the tree from time tiers, i.e. from SEGMENT and EVENT tiers upwards through ITEM tiers. Thus, Phoneme is an item tier which inherits its times from the SEGMENT tier Phonetic. Word inherits its times from Syllable, which inherits its times from Phoneme (and therefore from Phonetic), and so on all the way up to the top tier Utterance. Sometimes, tiers can inherit more than one set of times. In Fig. 3.2, Syllable inherits times both from Phonetic (S) and from Tone (E). For the same reason, all the tiers that dominate Syllable (including Foot) inherit these two sets of times.
Any two annotation tiers on the same path can be queried with respect to each other, where a path is defined as tiers connected by arrows. There are in fact four paths in the configuration:
1. Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Phoneme <-> Phonetic (S)
2. Utterance -> Intonational -> Foot -> Syllable -> Phoneme <-> Phonetic (S)
3. Utterance -> Intonational -> Foot -> Syllable -> Tone (E)
4. Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Tone (E)
From (1-4), it becomes clear that e.g. annotations of the Syllable tier can be queried with respect to Tone (which syllables contain an H* tone?) and vice versa (are any H* tones in weak syllables?); or annotations at the Intermediate tier can be queried with respect to Word (how many words are there in an L- intermediate phrase?) and vice versa (which words are in an L- intermediate phrase?). But e.g. Phoneme and Tone can’t be queried with respect to each other, and nor can Word and Foot, because they aren’t on the same path.
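Whether two tiers lie on a common path can be worked out mechanically from the link definitions. The base R sketch below re-enters the superlevel and sublevel names from the list_linkDefinitions() output shown earlier and enumerates every path from Utterance down to a tier with no sublevel; the helper names walk() and on_same_path() are hypothetical and not part of emuR.

```r
# Link definitions of the ae database, copied from the output shown earlier
links <- data.frame(
  super = c("Utterance", "Intonational", "Intermediate", "Word", "Syllable",
            "Phoneme", "Syllable", "Intonational", "Foot"),
  sub   = c("Intonational", "Intermediate", "Word", "Syllable", "Phoneme",
            "Phonetic", "Tone", "Foot", "Syllable"),
  stringsAsFactors = FALSE
)

# Recursively extend a path downwards until a tier with no sublevel is reached
walk <- function(path) {
  subs <- links$sub[links$super == path[length(path)]]
  if (length(subs) == 0) return(list(path))
  do.call(c, lapply(subs, function(s) walk(c(path, s))))
}

paths <- walk("Utterance")

# Two tiers can be queried together only if some path contains both
on_same_path <- function(a, b) {
  any(vapply(paths, function(p) all(c(a, b) %in% p), logical(1)))
}

# Print each path on its own line
for (p in paths) cat(paste(p, collapse = " -> "), "\n")
print(on_same_path("Syllable", "Tone"))
print(on_same_path("Phoneme", "Tone"))
print(on_same_path("Word", "Foot"))
```

The sketch ignores link types (one-to-many versus many-to-many), since only connectivity matters for deciding whether two tiers share a path.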
Functions:
serve(): to view and annotate an Emu database
get_signalCanvasesOrder(): shows what signals are being displayed when serve() is launched
set_signalCanvasesOrder(): changes the signals to be displayed
An Emu database can be viewed and annotated in at least two ways as follows:
# within the R graphics window
serve(ae)
# in a browser: preferably set your default browser to Chrome
serve(ae, useViewer = FALSE)
Figure 4.1: The ae database.
It is not the purpose of this introduction to give explicit instruction on how to annotate, which is covered amply in the manual for the Emu Speech database management system, especially section 9.
However, some basic properties can be noted. These include:
This information about the signals being displayed is also given by:
get_signalCanvasesOrder(ae, "default")
## [1] "OSCI" "SPEC"
OSCI is the waveform and SPEC the spectrogram. These can be changed as follows:
# display only the spectrogram.
set_signalCanvasesOrder(ae, "default", order = "SPEC")
# relaunch `serve()`. Displays only the spectrogram
serve(ae, useViewer = FALSE)
# change it back to how it was:
set_signalCanvasesOrder(ae, "default", order = c("OSCI", "SPEC"))
# relaunch `serve()`. Displays waveform and spectrogram once again.
serve(ae, useViewer = FALSE)
Emu will only ever display time tiers with signals (in this case there are two: Phonetic and Tone, which are SEGMENT and EVENT tiers respectively). The time tiers to be displayed and their order can be shown and changed as follows:
# which annotation tiers are displayed underneath the signals?
get_levelCanvasesOrder(ae, "default")
#[1] "Phonetic" "Tone"
# change the display so that Tone is underneath the spectrogram
set_levelCanvasesOrder(ae, "default", c("Tone", "Phonetic"))
# relaunch
serve(ae, useViewer = FALSE)
# display only the `Tone` tier
set_levelCanvasesOrder(ae, "default", "Tone")
serve(ae, useViewer = FALSE)
# change it back to how it was
set_levelCanvasesOrder(ae, "default", c("Phonetic", "Tone"))
serve(ae, useViewer = FALSE)
The ITEM annotation tiers can all be seen in the hierarchy view. The four paths identified earlier are visible by clicking on the triangle on the far right (Fig. 4.1). Clicking the triangle can also be used to change to another path. The attribute tiers can be seen by clicking on one of the tier names displayed at the top, e.g. click on Word to show the attribute tiers Text and Accent with which the Word tier is associated.