1 Preliminaries and loading libraries

Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store files for this course, make a note of the directory path, create an R project that accesses this directory, and install all the indicated packages.

For this and subsequent tutorials, load the tidyverse, magrittr, emuR, and wrassp libraries:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(magrittr)
## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
library(emuR)
## 
## Attaching package: 'emuR'
## 
## The following object is masked from 'package:base':
## 
##     norm
library(wrassp)

2 Accessing an existing Emu speech database

Functions: `create_emuRdemoData()`, `load_emuDB()`

The following command downloads and stores a demonstration Emu database in a temporary directory returned by the tempdir() function:

create_emuRdemoData(dir = tempdir())

The names of Emu databases always end in _emuDB. The Emu database is physically stored inside the directory emuR_demoData (which is inside the directory returned by tempdir()). The path to this emuDB is given by:

file.path(tempdir(), "emuR_demoData", "ae_emuDB")
## [1] "/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp7hCpVF/emuR_demoData/ae_emuDB"

In order to see the files that are physically stored in the ae_emuDB, first save the pathname and then use list.files():

path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# what are the files?
list.files(path.ae)
## [1] "0000_ses"         "ae_DBconfig.json"

The file ae_DBconfig.json stores information about the defining properties of the ae database (see the next section). The utterances (in this case) are all stored in the directory 0000_ses. To look inside this directory:

list.files(file.path(path.ae, "0000_ses"))
## [1] "msajc003_bndl" "msajc010_bndl" "msajc012_bndl" "msajc015_bndl"
## [5] "msajc022_bndl" "msajc023_bndl" "msajc057_bndl"

which shows that there are 7 so-called bundles. These are directories, and Emu organises things such that all the files belonging to the same utterance are always in the same bundle. The utterance name is the part before _bndl. Thus, it is clear from the output above that there are 7 utterances whose names are msajc003, msajc010, msajc012, msajc015, msajc022, msajc023, and msajc057. To see what files there are for any utterance, e.g. for msajc003:

list.files(file.path(path.ae, "0000_ses", "msajc003_bndl"))
## [1] "msajc003_annot.json" "msajc003.dft"        "msajc003.fms"       
## [4] "msajc003.wav"

shows that msajc003 has four files: the annotation file msajc003_annot.json, the audio file msajc003.wav, and two signal files, msajc003.dft and msajc003.fms (spectral and formant data respectively: see the SSFF track definitions in the next section).
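Incidentally, because every bundle name is just the utterance name followed by _bndl, the utterance names can be recovered with base R alone. A minimal sketch (not part of emuR), using the bundle names from the output above:

```r
# bundle directory names as returned by list.files() above
bundles = c("msajc003_bndl", "msajc010_bndl", "msajc012_bndl",
            "msajc015_bndl", "msajc022_bndl", "msajc023_bndl",
            "msajc057_bndl")
# strip the trailing "_bndl" suffix to obtain the utterance names
utterances = sub("_bndl$", "", bundles)
utterances        # "msajc003" "msajc010" ... "msajc057"
length(utterances)   # 7 utterances
```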

In order to access the ae emuDB in R, the above path needs to be stored and passed to the function load_emuDB() that reads the database into R:

# store the above path name
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# read it into R. Here the emuDB has been stored as `ae` (Australian English)
ae = load_emuDB(path2ae)
## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//Rtmp7hCpVF/emuR_demoData/ae_emuDB... (7 bundles found)

3 Some defining properties of an Emu database

Functions: `summary()`, `list_bundles()`, `list_ssffTrackDefinitions()`, `list_levelDefinitions()`, `list_attributeDefinitions()`, `list_linkDefinitions()`

The following summarises the salient attributes of the emuDB that has just been stored:

summary(ae)
## 
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name:     ae 
## UUID:     0fc618dc-8980-414d-8c7a-144a649ce199 
## Directory:    /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/Rtmp7hCpVF/emuR_demoData/ae_emuDB 
## Session count: 1 
## Bundle count: 7 
## Annotation item count:  736 
## Label count:  844 
## Link count:  785
## 
## ── Database configuration ──────────────────────────────────────────────────────
## 
## ── SSFF track definitions ──
## 
##  name columnName fileExtension
##  dft  dft        dft          
##  fm   fm         fms
## ── Level definitions ──
##  name         type    nrOfAttrDefs attrDefNames       
##  Utterance    ITEM    1            Utterance;         
##  Intonational ITEM    1            Intonational;      
##  Intermediate ITEM    1            Intermediate;      
##  Word         ITEM    3            Word; Accent; Text;
##  Syllable     ITEM    1            Syllable;          
##  Phoneme      ITEM    1            Phoneme;           
##  Phonetic     SEGMENT 1            Phonetic;          
##  Tone         EVENT   1            Tone;              
##  Foot         ITEM    1            Foot;
## ── Link definitions ──
##  type         superlevelName sublevelName
##  ONE_TO_MANY  Utterance      Intonational
##  ONE_TO_MANY  Intonational   Intermediate
##  ONE_TO_MANY  Intermediate   Word        
##  ONE_TO_MANY  Word           Syllable    
##  ONE_TO_MANY  Syllable       Phoneme     
##  MANY_TO_MANY Phoneme        Phonetic    
##  ONE_TO_MANY  Syllable       Tone        
##  ONE_TO_MANY  Intonational   Foot        
##  ONE_TO_MANY  Foot           Syllable

The Directory field shows the path where the database is physically stored.

The bundle count is important: it shows how many utterances there are in the database. The utterance names themselves are given by:

list_bundles(ae)
## # A tibble: 7 × 2
##   session name    
##   <chr>   <chr>   
## 1 0000    msajc003
## 2 0000    msajc010
## 3 0000    msajc012
## 4 0000    msajc015
## 5 0000    msajc022
## 6 0000    msajc023
## 7 0000    msajc057

which confirms what was seen earlier: there are 7 utterances.

The ── SSFF track definitions ── show which signals are currently available in the database apart from waveform files. For this database there are signals of type dft (discrete Fourier transform) and of type fm (formant). They have extensions .dft and .fms. This information is also given by:

list_ssffTrackDefinitions(ae)
##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms
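Since wrassp was loaded at the start, the contents of these SSFF signal files can also be read into R directly. The following is only a sketch: read.AsspDataObj() and tracks.AsspDataObj() are wrassp functions, and the path assumes the demo data from section 2 is still in the temporary directory:

```r
# path to the formant (.fms) file of the utterance msajc003
path.fms = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.fms")
# read the SSFF file into R as an AsspDataObj
fms = read.AsspDataObj(path.fms)
# which tracks does the file contain? (expected to include "fm")
tracks.AsspDataObj(fms)
# the formant values: one row per analysis frame
head(fms$fm)
```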

The ── Level definitions ── show the available tiers or annotation levels of the database. This information is also given by:

list_levelDefinitions(ae)
##           name    type nrOfAttrDefs        attrDefNames
## 1    Utterance    ITEM            1          Utterance;
## 2 Intonational    ITEM            1       Intonational;
## 3 Intermediate    ITEM            1       Intermediate;
## 4         Word    ITEM            3 Word; Accent; Text;
## 5     Syllable    ITEM            1           Syllable;
## 6      Phoneme    ITEM            1            Phoneme;
## 7     Phonetic SEGMENT            1           Phonetic;
## 8         Tone   EVENT            1               Tone;
## 9         Foot    ITEM            1               Foot;

As the above shows, annotation tiers can be of three types: ITEM, SEGMENT, and EVENT. In a SEGMENT tier, each annotation has a start and an end time; in an EVENT tier, each annotation is defined by a single point in time; the annotations of ITEM tiers have no times of their own but inherit them from the (typically SEGMENT) tiers that they dominate. A tier can also be associated with one or more ATTRIBUTE tiers. This information is also given by the function list_attributeDefinitions(), with the database name as the first argument and the tier to be queried for attributes as the second. For example:

list_attributeDefinitions(ae, "Word")
##     name level   type hasLabelGroups hasLegalLabels
## 1   Word  Word STRING          FALSE          FALSE
## 2 Accent  Word STRING          FALSE          FALSE
## 3   Text  Word STRING          FALSE          FALSE

shows that the tiers Accent and Text are attributes of Word. The annotations of an attribute tier always have identical times to those of the main tier with which they are associated (thus the annotations of the Accent tier have identical start and end times to those of the Word tier). An attribute tier is often used to provide additional information about annotations. In the ae database, the Word tier consists entirely of the annotations C (content word) and F (function word). The annotations at the Text tier provide the orthography for each content or function word; the annotations of the Accent tier mark whether or not a word is prosodically accented.
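An attribute tier can be queried in exactly the same way as the main tier with which it is associated, using emuR's query() function. A brief sketch, assuming ae has been loaded as above; C and F are the Word labels just described, while the Text label below is purely illustrative:

```r
# all function words: query the main Word tier for the label F
query(ae, "Word == F")
# the same annotations can equally be retrieved by their orthography
# at the Text attribute tier (hypothetical label, for illustration):
query(ae, "Text == her")
```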

The information in ── Link definitions ── of summary(ae) shows how the tiers are associated with each other. This information is also provided by the function list_linkDefinitions():

list_linkDefinitions(ae)
##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

In the Emu system, annotation tiers can be (but need not be) hierarchically organised with respect to each other. The point of doing so is broadly two-fold. The first is to allow annotations to inherit times when these are predictable. In Fig. 3.1 for example, the start and end times of the annotation seen are completely predictable from the annotations at the Phonetic tier that it dominates. Accordingly, the Emu system provides a way for the start and end times of seen to be inherited from the annotations at the hierarchically lower SEGMENT tier Phonetic.

Figure 3.1: Left: Text is an item tier, Phonetic is a segment tier as shown by (S). Text dominates Phonetic as shown by the vertical downward arrow. The duration of [s] is \(t_2 - t_1\), of [i], \(t_3 - t_2\), and of [n], \(t_4 - t_3\). Because Text is an ITEM tier that dominates Phonetic, which is a SEGMENT tier, annotations at the Text tier inherit their times from Phonetic. Consequently, the duration of seen is \(t_4 - t_1\). Text and Phonetic stand in a ONE-TO-MANY association (signified by the downward arrow) because an annotation at the Text tier can be associated with one or more annotations at the Phonetic tier, but not vice versa. Right: Phoneme and Phonetic stand in a MANY-TO-MANY relationship (shown by the double arrow) because an annotation at the Phoneme tier can map to more than one annotation at the Phonetic tier and vice versa. In this hypothetical example of the second syllable of a word like region, the single affricate annotation /dZ/ at the Phoneme tier maps to a sequence of [d] and [Z] annotations at the Phonetic tier, while the single annotation of the syllabic [n] at the Phonetic tier maps to a sequence of annotations /@n/ at the Phoneme tier. Note that /@/ and /n/ inherit the same start and end times and therefore have the same duration of \(t_4 - t_3\), i.e. they overlap with each other completely in time.
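The time inheritance in Fig. 3.1 amounts to simple arithmetic on the segment boundaries. A minimal sketch with hypothetical times (in ms; not taken from any database):

```r
# hypothetical boundary times t1..t4 for the segments [s], [i], [n]
t = c(t1 = 0, t2 = 110, t3 = 320, t4 = 400)
# durations of the three annotations at the Phonetic tier: 110, 210, 80
diff(t)
# the inherited duration of "seen" at the Text tier is t4 - t1, i.e. 400
unname(t["t4"] - t["t1"])
```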

The second is to be able to query the database in order to obtain annotations at one tier with respect to another (e.g., all orthographic annotations of the vowels in the database; all H* pitch accents in an intermediate phrase, etc.). Without this linkage, these types of queries would not be possible.

Emu allows quite a flexible configuration of annotation tiers. The type of configuration can be defined by the user and will depend on the types of information that the user wants to be able to extract from the database. The configuration of annotation tiers for the currently loaded ae database is shown in Fig. 3.2.

Figure 3.2: The links between the annotation tiers of the ae database. ITEM tiers are unmarked, SEGMENT tiers are marked with (S) and EVENT tiers with (E). ATTRIBUTE tiers have no arrow between them (thus Text and Accent are attribute tiers of Word). A downward arrow signifies domination in a one-to-many relationship; a double arrow signifies domination in a many-to-many relationship.

Inherited times percolate up through the tree from the time tiers, i.e. from SEGMENT and EVENT tiers upwards through ITEM tiers. Thus, Phoneme is an ITEM tier that inherits its times from the SEGMENT tier Phonetic. Word inherits its times from Syllable, which inherits its times from Phoneme (and therefore from Phonetic), and so on all the way up to the top tier Utterance. Sometimes, tiers can inherit more than one set of times: in Fig. 3.2, Syllable inherits times both from Phonetic (S) and from Tone (E). For the same reason, all the tiers that dominate Syllable (including Foot) inherit these two sets of times.

Any two annotation tiers on the same path can be queried with respect to each other, where a path is defined as tiers connected by arrows. There are in fact four paths in the configuration:

  1. Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Phoneme <-> Phonetic (S)
  2. Utterance -> Intonational -> Foot -> Syllable -> Phoneme <-> Phonetic (S)
  3. Utterance -> Intonational -> Foot -> Syllable -> Tone (E)
  4. Utterance -> Intonational -> Intermediate -> Word -> Syllable -> Tone (E)

From (1-4), it becomes clear that e.g. annotations of the Syllable tier can be queried with respect to Tone (which syllables contain an H* tone?) and vice versa (are any H* tones in weak syllables?); or annotations at the Intermediate tier can be queried with respect to Word (how many words are there in an L- intermediate phrase?) and vice-versa (which words are in an L- intermediate phrase?). But e.g. Phoneme and Tone can’t be queried with respect to each other, and nor can Word and Foot because they aren’t on the same path.
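The bracketed example queries above can be expressed in emuR's query language (EQL), in which ^ is the domination operator and only tiers on the same path can be combined. A sketch, assuming ae has been loaded as above (H* and L- are the labels mentioned in the text):

```r
# which syllables contain an H* tone? (Syllable and Tone share a path)
query(ae, "[Syllable =~ .* ^ Tone == H*]")
# which words are in an L- intermediate phrase?
query(ae, "[Word =~ .* ^ Intermediate == L-]")
# an analogous query combining Word and Foot would produce an error,
# because those two tiers are not on the same path
```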

4 Viewing and annotating an Emu database

Functions: `serve()`, `get_signalCanvasesOrder()`, `set_signalCanvasesOrder()`, `get_levelCanvasesOrder()`, `set_levelCanvasesOrder()`

An Emu database can be viewed and annotated in at least two ways as follows:

# within the R graphics window
serve(ae)
# in a browser: preferably set your default browser to Chrome
serve(ae, useViewer=F)

Figure 4.1: The ae database.

It is not the purpose of this introduction to give explicit instructions on how to annotate, which is amply covered in the manual for the Emu Speech Database Management System, especially section 9.

However, some basic properties can be noted. By default, two signals are displayed for each utterance: the waveform and the spectrogram. This information about the signals being displayed is also given by:

get_signalCanvasesOrder(ae, perspectiveName = "default")
## [1] "OSCI" "SPEC"

OSCI is the waveform and SPEC the spectrogram. These can be changed as follows:

# display only the spectrogram. 
set_signalCanvasesOrder(ae, "default", order = "SPEC")
# relaunch `serve()`. Displays only the spectrogram
serve(ae, useViewer = F)
# change it back to how it was:
set_signalCanvasesOrder(ae, "default", order = c("OSCI", "SPEC"))
# relaunch `serve()`. Displays waveform and spectrogram once again.
serve(ae, useViewer = F)

Emu will only ever display time tiers with the signals (in this case there are two: Phonetic and Tone, which are SEGMENT and EVENT tiers respectively). The time tiers to be displayed and their order can be shown and changed as follows:

# which annotation tiers are displayed underneath the signals?
get_levelCanvasesOrder(ae, "default")
## [1] "Phonetic" "Tone"
# change the display so that Tone is underneath the spectrogram
set_levelCanvasesOrder(ae, "default", c("Tone", "Phonetic"))
# relaunch
serve(ae, useViewer = F)
# display only the `Tone` tier
set_levelCanvasesOrder(ae, "default", "Tone")
serve(ae, useViewer = F)
# change it back to how it was
set_levelCanvasesOrder(ae, "default", c("Phonetic", "Tone"))
serve(ae, useViewer = F)

The ITEM annotation tiers can all be seen in the hierarchy view. The four paths identified earlier become visible by clicking on the triangle on the far right (Fig. 4.1); clicking the triangle can also be used to change to another path. The attribute tiers can be seen by clicking on one of the tier names displayed at the top, e.g. click on Word to show the attribute tiers Text and Accent with which the Word tier is associated.