---
title: "First steps in using the Emu Speech Database Management System"
author: "Jonathan Harrington"
date: "WiSe 2021"
output:
  bookdown::html_document2:
    number_sections: TRUE
    toc: true
    theme: flatly
    highlight: pygments
---

# Preliminaries and loading libraries

Follow the setup instructions given [here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/basic_r/_book/index.html), i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.

For this and subsequent tutorials, load the `tidyverse`, `magrittr`, `emuR`, and `wrassp` libraries:

```{r}
library(tidyverse)
library(magrittr)
library(emuR)
library(wrassp)
```

# Accessing an existing Emu speech database

Functions:

* `create_emuRdemoData()`: creates a demo database
* `file.path()`: an R function for defining the location of a directory on your system

The following command creates and stores a demonstration Emu database in a temporary directory defined by the `tempdir()` function:

```{r}
create_emuRdemoData(dir = tempdir())
```

The directory names of Emu databases end in `_emuDB`. The Emu database is physically stored inside the directory `emuR_demoData` (which is inside the directory created by `tempdir()`). The path to this emuDB is given by:

```{r}
file.path(tempdir(), "emuR_demoData", "ae_emuDB")
```

In order to see the files that are physically stored in `ae_emuDB`, first save the pathname and then use `list.files()`:

```{r}
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# what are the files?
list.files(path.ae)
```

The file `ae_DBconfig.json` is a template that stores information about the defining properties of the `ae` database (see next section). The utterances (in this case) are all stored in the directory `0000_ses`.
To look inside this directory:

```{r}
list.files(file.path(path.ae, "0000_ses"))
```

which shows that there are 7 so-called **bundles**. These are directories, and Emu organises things such that all the files that belong to the same utterance are always in the same bundle. The **utterance name** precedes `_bndl`. Thus in this case, it is clear from the output above that there are 7 utterances whose names are `msajc003`, `msajc010`, `msajc012`, `msajc015`, `msajc022`, `msajc023`, `msajc057`. To see what files there are for any utterance, e.g. for `msajc003`:

```{r}
list.files(file.path(path.ae, "0000_ses", "msajc003_bndl"))
```

shows that `msajc003` has the following files:

* `_annot.json`: this stores information about the annotations.
* `.dft`: this is a derived signal file of DFT data obtained from the speech waveform.
* `.fms`: this is formant data also obtained from the speech waveform.
* `.wav`: this is the waveform itself.

In order to access the `ae` emuDB in R, the above path needs to be stored and passed to the function `load_emuDB()` that reads the database into R:

```{r}
# store the above path name
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# read it into R. Here the emuDB has been stored as `ae` (Australian English)
ae = load_emuDB(path2ae)
```

# Some defining properties of an Emu database

Functions:

* `summary()`: lists the salient contents of an emuDB
* `list_bundles()`: lists the so-called bundles of an emuDB
* `list_levelDefinitions()`: lists the annotation levels or tiers of an emuDB
* `list_attributeDefinitions()`: lists the attribute tiers of an annotation tier
* `list_linkDefinitions()`: lists the links between the tiers of an emuDB
* `list_ssffTrackDefinitions()`: lists the available signal files of an emuDB

The following summarises the salient attributes of the emuDB that has just been stored:

```{r}
summary(ae)
```

The `directory` shows the path where the database is stored.
The `bundle count` is important: it shows how many utterances there are in the database. The same information, together with the utterance names, is given by:

```{r}
list_bundles(ae)
```

which confirms what was seen earlier: there are 7 utterances.

The `── SSFF track definitions ──` show which signals are currently available in the database apart from waveform files. For this database there are signals of type `dft` (discrete Fourier transform) and of type `fm` (formant). They have extensions `.dft` and `.fms`. This information is also given by:

```{r}
list_ssffTrackDefinitions(ae)
```

The `── Level definitions ──` show the available tiers or annotation levels of the database. This information is also given by:

```{r}
list_levelDefinitions(ae)
```

As the above shows, annotation tiers can be of three types: `ITEM`, `SEGMENT`, and `EVENT`. The annotations of `ITEM` tiers **inherit their times** from the (typically) `SEGMENT` tiers that they dominate. In a `SEGMENT` tier, each annotation has **a start and end time**. In an `EVENT` tier, each annotation **is defined by a single point in time**.

A tier can also be associated with one or more `ATTRIBUTE` tiers. This information is given by the function `list_attributeDefinitions()` with the database name as the first argument, and the tier to be queried for attributes as the second. For example:

```{r}
list_attributeDefinitions(ae, "Word")
```

shows that the tiers `Accent` and `Text` are attributes of `Word`. The annotations of an attribute tier **always have identical times to those of the main tier with which they are associated** (thus the annotations of the `Accent` tier have identical start and end times to those of the `Word` tier). An attribute tier is often used to provide additional information about annotations. In the `ae` database, the annotations of the `Word` tier consist entirely of `C` (content word) and `F` (function word).
The annotations at the `Text` tier are used to provide the orthography for each content or function word annotation; and the annotations of the `Accent` tier are used to mark whether or not a word is prosodically accented.

The information in `── Link definitions ──` of `summary(ae)` shows how the tiers are associated with each other. This information is also provided by the function `list_linkDefinitions()`:

```{r}
list_linkDefinitions(ae)
```

In the Emu system, annotation tiers can be (but need not be) hierarchically organised with respect to each other. There are broadly two reasons for doing so. The first is to allow annotations to inherit times if these are predictable. In Fig. \@ref(fig:seen) for example, the start and end times of the annotation `seen` are completely predictable from the annotations at the `Phonetic` tier that it dominates. Accordingly, the Emu system provides a way for the start and end times of `seen` to be inherited from the annotations at the hierarchically lower `SEGMENT` tier `Phonetic` that it dominates.

```{r seen, out.width="50%", fig.align="center", fig.cap = "Left: `Text` is an item tier, `Phonetic` is a segment tier as shown by `(S)`. `Text` dominates `Phonetic` as shown by the vertical downward arrow. The duration of [s] is $t_2 - t_1$, of [i], $t_3 - t_2$, and of [n], $t_4 - t_3$. Because `Text` is an `ITEM` tier that dominates `Phonetic` which is a `SEGMENT` tier, annotations at the `Text` tier inherit their times from `Phonetic`. Consequently, the duration of `seen` is $t_4 - t_1$. `Text` and `Phonetic` stand in a `ONE-TO-MANY` association (as signified by the downward arrow) because an annotation at the `Text` tier can be associated with one or more annotations at the `Phonetic` tier, but not vice versa. Right: `Phoneme` and `Phonetic` stand in a `MANY-TO-MANY` relationship (as shown by the double arrow) because an annotation at the `Phoneme` tier can map to more than one annotation at the `Phonetic` tier and vice versa. In this hypothetical example of an annotation of the second syllable of a word like *region*, the single affricate annotation /dZ/ at the `Phoneme` tier maps to a sequence of [d] and [Z] annotations at the `Phonetic` tier, while the single annotation of the syllabic [n] at the `Phonetic` tier maps to a sequence of annotations /@n/ at the `Phoneme` tier. Note that /@/ and /n/ inherit **the same start and end times** and therefore have the same duration of $t_4 - t_3$ i.e. they overlap with each other in time completely.", echo=FALSE}
knitr::include_graphics("./img/seen.png")
```

The second is to be able to query the database in order to obtain annotations at one tier with respect to another (e.g., all orthographic annotations of the vowels in the database; all `H*` pitch accents in an intermediate phrase, etc.). Without this linkage, these types of queries would not be possible.

Emu allows quite a flexible configuration of annotation tiers. The type of configuration can be defined by the user and will depend on the types of information that the user wants to be able to extract from the database. The configuration of annotation tiers for the currently loaded `ae` database is shown in Fig. \@ref(fig:tree).

```{r tree, out.width="25%", fig.align="center", fig.cap="The links between the annotation tiers of the `ae` database. `ITEM` tiers are unmarked, `SEGMENT` tiers are marked with `(S)` and `EVENT` tiers with `(E)`. `ATTRIBUTE` tiers have no arrow between them (thus `Text` and `Accent` are attribute tiers of `Word`). A downward arrow signifies domination in a one-to-many relationship; a double arrow signifies domination in a many-to-many relationship.", echo=FALSE}
knitr::include_graphics("./img/tree.png")
```

Inherited times percolate up through the tree from time tiers, i.e. from `SEGMENT` and `EVENT` tiers upwards through `ITEM` tiers. Thus, `Phoneme` is an item tier which inherits its times from the `SEGMENT` tier `Phonetic`. `Word` inherits its times from `Syllable` which inherits its times from `Phoneme` (and therefore from `Phonetic`) and so on all the way up to the top tier `Utterance`. Sometimes, tiers can inherit more than one set of times. In Fig. \@ref(fig:tree), `Syllable` inherits times both from `Phonetic (S)` and from `Tone (E)`. For the same reason, all the tiers that dominate `Syllable` (including `Foot`) inherit these two sets of times.

Any two annotation tiers **on the same path** can be queried with respect to each other, where a path is defined as tiers connected by arrows. There are in fact four paths in the configuration:

1. `Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Phoneme <-> Phonetic (S)`
2. `Utterance -> Intonational -> Foot -> Syllable -> Phoneme <-> Phonetic (S)`
3. `Utterance -> Intonational -> Syllable -> Tone (E)`
4. `Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Tone (E)`

From (1-4), it becomes clear that e.g. annotations of the `Syllable` tier can be queried with respect to `Tone` (which syllables contain an `H*` tone?) and vice versa (are any `H*` tones in weak syllables?); or annotations at the `Intermediate` tier can be queried with respect to `Word` (how many words are there in an `L-` intermediate phrase?) and vice versa (which words are in an `L-` intermediate phrase?). But e.g. `Phoneme` and `Tone` can't be queried with respect to each other, and nor can `Word` and `Foot`, because they aren't on the same path.
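The same-path rule can be tried out with a small toy check in base R. This is only a sketch for illustration and is not part of `emuR`: the object `paths` simply copies the four paths listed above (without the `(S)` and `(E)` markers), and `on_same_path()` is a hypothetical helper name introduced here.

```{r}
# The four paths listed above, written as vectors of tier names
paths <- list(
  c("Utterance", "Intonational", "Intermediate", "Word",
    "Syllable", "Phoneme", "Phonetic"),
  c("Utterance", "Intonational", "Foot", "Syllable", "Phoneme", "Phonetic"),
  c("Utterance", "Intonational", "Syllable", "Tone"),
  c("Utterance", "Intonational", "Intermediate", "Word", "Syllable", "Tone")
)

# Hypothetical helper: TRUE if both tiers occur together in at least one path
on_same_path <- function(tier_a, tier_b, paths) {
  any(vapply(paths,
             function(p) all(c(tier_a, tier_b) %in% p),
             logical(1)))
}

on_same_path("Syllable", "Tone", paths)
on_same_path("Intermediate", "Word", paths)
on_same_path("Phoneme", "Tone", paths)
on_same_path("Word", "Foot", paths)
```

The first two calls return `TRUE` and the last two `FALSE`, in line with the examples given in the text. Storing the paths as plain vectors keeps the check to one line; it says nothing about how Emu itself represents the links.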
# Viewing and annotating an Emu database

Functions:

* `serve()`: to view and annotate an Emu database
* `get_signalCanvasesOrder()`: show what signals are being displayed when `serve()` is launched.
* `set_signalCanvasesOrder()`: change the signals to be displayed.

An Emu database can be viewed and annotated in at least two ways as follows:

```{r, eval=FALSE}
# within the RStudio viewer pane
serve(ae)
# in a browser: preferably set your default browser to Chrome
serve(ae, useViewer = FALSE)
```

```{r figae, fig.align="center", fig.cap="The `ae` database.", echo=FALSE, out.width = "75%"}
knitr::include_graphics("./img/figae.png")
```

It is not the purpose of this introduction to give explicit instruction on how to annotate, which is covered amply in the [manual for the Emu Speech database management system](https://ips-lmu.github.io/The-EMU-SDMS-Manual/installing-the-emu-sdms.html), especially [section 9](https://ips-lmu.github.io/The-EMU-SDMS-Manual/chap-emu-webApp.html). However, some basic properties can be noted. These include:

* there are, as mentioned before, 7 utterances.
* the database displays two types of signals. These are the waveform and the spectrogram below it.

This information about the signals being displayed is also given by:

```{r}
get_signalCanvasesOrder(ae, perspectiveName = "default")
```

`OSCI` is the waveform and `SPEC` the spectrogram. These can be changed as follows:

```{r, eval=FALSE}
# display only the spectrogram.
set_signalCanvasesOrder(ae, "default", order = "SPEC")
# relaunch `serve()`. Displays only the spectrogram
serve(ae, useViewer = FALSE)
# change it back to how it was:
set_signalCanvasesOrder(ae, "default", order = c("OSCI", "SPEC"))
# relaunch `serve()`. Displays waveform and spectrogram once again.
serve(ae, useViewer = FALSE)
```

Emu will only ever display time tiers with signals (in this case there are two: `Phonetic` and `Tone`, which are `SEGMENT` and `EVENT` tiers respectively).
The time tiers to be displayed and their order can be shown and changed as follows:

```{r, eval=FALSE}
# which annotation tiers are displayed underneath signals?
get_levelCanvasesOrder(ae, "default")
#[1] "Phonetic" "Tone"
# change the display so that Tone is underneath the spectrogram
set_levelCanvasesOrder(ae, "default", c("Tone", "Phonetic"))
# relaunch
serve(ae, useViewer = FALSE)
# display only the `Tone` tier
set_levelCanvasesOrder(ae, "default", "Tone")
serve(ae, useViewer = FALSE)
# change it back to how it was
set_levelCanvasesOrder(ae, "default", c("Phonetic", "Tone"))
serve(ae, useViewer = FALSE)
```

The `ITEM` annotation tiers can all be seen in the hierarchy view. The four paths identified earlier become visible by clicking on the triangle on the far right (Fig. \@ref(fig:figae)). Clicking the triangle can also be used to change to another path. The attribute tiers can be seen by clicking on one of the tier names displayed at the top, e.g. click on `Word` to show the attribute tiers `Text` and `Accent` with which the `Word` tier is associated.