Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store the files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, load the tidyverse, magrittr, emuR, and wrassp libraries:
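# load the libraries needed for this and the following tutorials
library(tidyverse)
library(magrittr)
library(emuR)
library(wrassp)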
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
Functions:
create_emuRdemoData(): downloads and stores a demonstration Emu database on your system
file.path(): an R function for defining the location of a directory on your system

The following command downloads and stores a demonstration Emu database in a temporary directory defined by the tempdir() function:
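# download and store the demo data in a temporary directory
create_emuRdemoData(dir = tempdir())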
The directory names of Emu databases always end in _emuDB. The Emu database is physically stored inside the directory emuR_demoData (which is inside the directory created by tempdir()). The path to this emuDB is given by:
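# the path to the ae emuDB
file.path(tempdir(), "emuR_demoData", "ae_emuDB")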
## [1] "/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp7hCpVF/emuR_demoData/ae_emuDB"
In order to see the files that are physically stored in the ae_emuDB, first save the pathname and then use list.files():
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# what are the files?
list.files(path.ae)
## [1] "0000_ses" "ae_DBconfig.json"
The file ae_DBconfig.json is a template that stores information about the defining properties of the ae database (see next section). The utterances (in this case) are all stored in the directory 0000_ses. To look inside this directory:
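# look inside the session directory 0000_ses
list.files(file.path(path.ae, "0000_ses"))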
## [1] "msajc003_bndl" "msajc010_bndl" "msajc012_bndl" "msajc015_bndl"
## [5] "msajc022_bndl" "msajc023_bndl" "msajc057_bndl"
which shows that there are 7 so-called bundles. These are directories, and Emu organises things such that all the files that belong to the same utterance are always in the same bundle. The utterance name is the part that precedes _bndl. Thus in this case, it is clear from the output above that there are 7 utterances whose names are msajc003, msajc010, msajc012, msajc015, msajc022, msajc023, msajc057. To see what files there are for any utterance, e.g. for msajc003:
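# list the files in the bundle of the utterance msajc003
list.files(file.path(path.ae, "0000_ses", "msajc003_bndl"))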
## [1] "msajc003_annot.json" "msajc003.dft" "msajc003.fms"
## [4] "msajc003.wav"
shows that msajc003 has the following files:
_annot.json: this stores information about the annotations.
.dft: this is a derived signal file of DFT data obtained from the speech waveform.
.fms: this is formant data, also obtained from the speech waveform.
.wav: this is the waveform itself.

In order to access the ae emuDB in R, the above path needs to be stored and passed to the function load_emuDB() that reads the database into R:
# store the above path name
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# read it into R. Here the emuDB has been stored as `ae` (Australian English)
ae = load_emuDB(path2ae)
## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp7hCpVF/emuR_demoData/ae_emuDB... (7 bundles found)
Functions:
summary(): lists the salient contents of an emuDB
list_bundles(): lists the so-called bundles of an emuDB
list_levelDefinitions(): lists the annotation levels or tiers of an emuDB
list_attributeDefinitions(): lists the attribute tiers of an annotation tier
list_linkDefinitions(): lists the links between the tiers of an emuDB
list_ssffTrackDefinitions(): lists the available signal files of an emuDB

The following summarises the salient attributes of the emuDB that has just been stored:
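summary(ae)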
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: ae
## UUID: 0fc618dc-8980-414d-8c7a-144a649ce199
## Directory: /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/Rtmp7hCpVF/emuR_demoData/ae_emuDB
## Session count: 1
## Bundle count: 7
## Annotation item count: 736
## Label count: 844
## Link count: 785
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## name columnName fileExtension
## dft dft dft
## fm fm fms
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## Utterance ITEM 1 Utterance;
## Intonational ITEM 1 Intonational;
## Intermediate ITEM 1 Intermediate;
## Word ITEM 3 Word; Accent; Text;
## Syllable ITEM 1 Syllable;
## Phoneme ITEM 1 Phoneme;
## Phonetic SEGMENT 1 Phonetic;
## Tone EVENT 1 Tone;
## Foot ITEM 1 Foot;
## ── Link definitions ──
## type superlevelName sublevelName
## ONE_TO_MANY Utterance Intonational
## ONE_TO_MANY Intonational Intermediate
## ONE_TO_MANY Intermediate Word
## ONE_TO_MANY Word Syllable
## ONE_TO_MANY Syllable Phoneme
## MANY_TO_MANY Phoneme Phonetic
## ONE_TO_MANY Syllable Tone
## ONE_TO_MANY Intonational Foot
## ONE_TO_MANY Foot Syllable
The Directory field shows the path where the database is located on the computer.
The bundle count is important: it shows how many utterances there are in the database. The utterance names, together with the session they belong to, are given by:
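list_bundles(ae)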
## # A tibble: 7 × 2
## session name
## <chr> <chr>
## 1 0000 msajc003
## 2 0000 msajc010
## 3 0000 msajc012
## 4 0000 msajc015
## 5 0000 msajc022
## 6 0000 msajc023
## 7 0000 msajc057
which confirms what was seen earlier: there are 7 utterances.
The ── SSFF track definitions ── show which signals are currently available in the database apart from waveform files. For this database there are signals of type dft (discrete Fourier transform) and of type fm (formant). They have extensions .dft and .fms. This information is also given by:
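list_ssffTrackDefinitions(ae)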
## name columnName fileExtension
## 1 dft dft dft
## 2 fm fm fms
The ── Level definitions ── show the available tiers or annotation levels of the database. This information is also given by:
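list_levelDefinitions(ae)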
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
As the above shows, annotation tiers can be of three types: ITEM, SEGMENT, and EVENT. The annotations of ITEM tiers inherit their times from the (typically) SEGMENT tiers that they dominate. In a SEGMENT tier, each annotation has a start and end time. In an EVENT tier, each annotation is defined by a single point in time. A tier can also be associated with one or more ATTRIBUTE tiers. This information is also given by the function list_attributeDefinitions() with the database name as the first argument, and the tier to be queried for attributes as the second. For example:
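list_attributeDefinitions(ae, "Word")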
## name level type hasLabelGroups hasLegalLabels
## 1 Word Word STRING FALSE FALSE
## 2 Accent Word STRING FALSE FALSE
## 3 Text Word STRING FALSE FALSE
shows that the tiers Accent and Text are attributes of Word. The annotations of an attribute tier always have identical times to those of the main tier with which they are associated (thus the annotations of the Accent tier have identical start and end times to those of the Word tier). An attribute tier is often used to provide additional information about annotations. In the ae database, the annotations of the Word tier consist entirely of C (content word) and F (function word). The annotations at the Text tier provide the orthography for each content or function word annotation, and the annotations of the Accent tier mark whether or not a word is prosodically accented.
The information in ── Link definitions ── of summary(ae) shows how the tiers are associated with each other. This information is also provided by the function list_linkDefinitions():
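list_linkDefinitions(ae)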
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
In the Emu system, annotation tiers can be (but need not be) hierarchically organised with respect to each other. The point of doing so is broadly two-fold. The first is to allow annotations to inherit times if these are predictable. In Fig. 3.1 for example, the start and end times of the annotation seen are completely predictable from the annotations at the Phonetic tier that it dominates. Accordingly, the Emu system provides a way for the start and end times of seen to be inherited from the annotations at the hierarchically lower SEGMENT tier Phonetic that it dominates.
Figure 3.1: Left: Text is an ITEM tier, Phonetic is a SEGMENT tier as shown by (S). Text dominates Phonetic as shown by the vertical downward arrow. The duration of [s] is \(t_2 - t_1\), of [i], \(t_3 - t_2\), and of [n], \(t_4 - t_3\). Because Text is an ITEM tier that dominates Phonetic which is a SEGMENT tier, annotations at the Text tier inherit their times from Phonetic. Consequently, the duration of seen is \(t_4 - t_1\). Text and Phonetic stand in a ONE-TO-MANY association (as signified by the downward arrow) because an annotation at the Text tier can be associated with one or more annotations at the Phonetic tier, but not vice versa. Right: Phoneme and Phonetic stand in a MANY-TO-MANY relationship (as shown by the double arrow) because an annotation at the Phoneme tier can map to more than one annotation at the Phonetic tier and vice versa. In this hypothetical example of an annotation of the second syllable of a word like region, the single affricate annotation /dZ/ at the Phoneme tier maps to a sequence of [d] and [Z] annotations at the Phonetic tier, while the single annotation of the syllabic [n] at the Phonetic tier maps to a sequence of annotations /@n/ at the Phoneme tier. Note that /@/ and /n/ inherit the same start and end times and therefore have the same duration of \(t_4 - t_3\), i.e. they overlap with each other in time completely.
The second is to be able to query the database in order to obtain annotations at one tier with respect to another (e.g., all orthographic annotations of the vowels in the database; all H* pitch accents in an intermediate phrase, etc.). Without this linkage, these types of queries would not be possible.
Emu allows quite a flexible configuration of annotation tiers. The type of configuration can be defined by the user and will depend on the types of information that the user wants to be able to extract from the database. The configuration of annotation tiers for the currently loaded ae database is shown in Fig. 3.2.
Figure 3.2: The links between the annotation tiers of the ae database. ITEM tiers are unmarked, SEGMENT tiers are marked with (S) and EVENT tiers with (E). ATTRIBUTE tiers have no arrow between them (thus Text and Accent are attribute tiers of Word). A downward arrow signifies domination in a one-to-many relationship; a double arrow signifies domination in a many-to-many relationship.
Inherited times percolate up through the tree from the time tiers, i.e. from SEGMENT and EVENT tiers upwards through the ITEM tiers. Thus, Phoneme is an ITEM tier which inherits its times from the SEGMENT tier Phonetic. Word inherits its times from Syllable, which inherits its times from Phoneme (and therefore from Phonetic), and so on all the way up to the top tier Utterance. Sometimes, tiers can inherit more than one set of times. In Fig. 3.2, Syllable inherits times both from Phonetic (S) and from Tone (E). For the same reason, all the tiers that dominate Syllable (including Foot) inherit these two sets of times.
Any two annotation tiers on the same path can be queried with respect to each other, where a path is defined as tiers connected by arrows. There are in fact four paths in the configuration:
1. Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Phoneme <-> Phonetic (S)
2. Utterance -> Intonational -> Foot -> Syllable -> Phoneme <-> Phonetic (S)
3. Utterance -> Intonational -> Foot -> Syllable -> Tone (E)
4. Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Tone (E)

From (1-4), it becomes clear that e.g. annotations of the Syllable tier can be queried with respect to Tone (which syllables contain an H* tone?) and vice versa (are any H* tones in weak syllables?); or annotations at the Intermediate tier can be queried with respect to Word (how many words are there in an L- intermediate phrase?) and vice versa (which words are in an L- intermediate phrase?). But e.g. Phoneme and Tone can't be queried with respect to each other, and nor can Word and Foot, because they aren't on the same path.
Functions:
serve(): to view and annotate an Emu database
get_signalCanvasesOrder(): show what signals are being displayed when serve() is launched
set_signalCanvasesOrder(): change the signals to be displayed

An Emu database can be viewed and annotated in at least two ways as follows:
# within the R graphics window
serve(ae)
# in a browser: preferably set your default browser to Chrome
serve(ae, useViewer=F)
Figure 4.1: The ae database.
It is not the purpose of this introduction to give explicit instruction on how to annotate, which is covered amply in the manual for the Emu Speech database management system (especially section 9).
However, some basic properties can be noted. These include:
This information about the signals being displayed is also given by:
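get_signalCanvasesOrder(ae, "default")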
## [1] "OSCI" "SPEC"
OSCI is the waveform and SPEC the spectrogram. These can be changed as follows:
# display only the spectrogram.
set_signalCanvasesOrder(ae, "default", order = "SPEC")
# relaunch `serve()`. Displays only the spectrogram
serve(ae, useViewer = F)
# change it back to how it was:
set_signalCanvasesOrder(ae, "default", order = c("OSCI", "SPEC"))
# relaunch `serve()`. Displays waveform and spectrogram once again.
serve(ae, useViewer = F)

Emu will only ever display time tiers with signals (in this case there are two: Phonetic and Tone, which are SEGMENT and EVENT tiers respectively). The time tiers to be displayed and their order can be shown and changed as follows:
# which annotation tiers are displayed underneath the signals?
get_levelCanvasesOrder(ae, "default")
#[1] "Phonetic" "Tone"
# change the display so that Tone is underneath the spectrogram
set_levelCanvasesOrder(ae, "default", c("Tone", "Phonetic"))
# relaunch
serve(ae, useViewer = F)
# display only the `Tone` tier
set_levelCanvasesOrder(ae, "default", "Tone")
serve(ae, useViewer = F)
# change it back to how it was
set_levelCanvasesOrder(ae, "default", c("Phonetic", "Tone"))
serve(ae, useViewer = F)

The ITEM annotation tiers can all be seen in the hierarchy view. The four paths identified earlier are visible by clicking on the triangle on the far right (Fig. 4.1). Clicking the triangle can also be used to change to another path. The attribute tiers can be seen by clicking on one of the tier names displayed at the top – e.g. click on Word to show the attribute tiers Text and Accent with which the Word tier is associated.