1 Preliminaries

Follow the setup instructions given here: download R and RStudio, create a directory on your computer in which to store the files for this course, make a note of its path, create an R project that accesses this directory, and install all indicated packages.

For this and subsequent tutorials, load the tidyverse, magrittr, emuR, and wrassp libraries:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(magrittr)

## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

library(emuR)

## 
## Attaching package: 'emuR'
## 
## The following object is masked from 'package:base':
## 
##     norm

library(wrassp)

The following uses the demonstration emuDB ae that was also used here.

Create and load the demo database as also described here:

create_emuRdemoData(dir = tempdir())
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
ae = load_emuDB(path.ae)

## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//RtmppPXwNJ/emuR_demoData/ae_emuDB... (7 bundles found)

summary(ae)

##

## ── Summary of emuDB ────────────────────────────────────────────────────────────

## Name:     ae 
## UUID:     0fc618dc-8980-414d-8c7a-144a649ce199 
## Directory:    /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/RtmppPXwNJ/emuR_demoData/ae_emuDB 
## Session count: 1 
## Bundle count: 7 
## Annotation item count:  736 
## Label count:  844 
## Link count:  785

##

## ── Database configuration ──────────────────────────────────────────────────────

##

## ── SSFF track definitions ──

##

##  name columnName fileExtension
##  dft  dft        dft          
##  fm   fm         fms

## ── Level definitions ──

##  name         type    nrOfAttrDefs attrDefNames       
##  Utterance    ITEM    1            Utterance;         
##  Intonational ITEM    1            Intonational;      
##  Intermediate ITEM    1            Intermediate;      
##  Word         ITEM    3            Word; Accent; Text;
##  Syllable     ITEM    1            Syllable;          
##  Phoneme      ITEM    1            Phoneme;           
##  Phonetic     SEGMENT 1            Phonetic;          
##  Tone         EVENT   1            Tone;              
##  Foot         ITEM    1            Foot;

## ── Link definitions ──

##  type         superlevelName sublevelName
##  ONE_TO_MANY  Utterance      Intonational
##  ONE_TO_MANY  Intonational   Intermediate
##  ONE_TO_MANY  Intermediate   Word        
##  ONE_TO_MANY  Word           Syllable    
##  ONE_TO_MANY  Syllable       Phoneme     
##  MANY_TO_MANY Phoneme        Phonetic    
##  ONE_TO_MANY  Syllable       Tone        
##  ONE_TO_MANY  Intonational   Foot        
##  ONE_TO_MANY  Foot           Syllable

The level definitions show one EVENT tier (Tone, whose annotations are single points in time), one SEGMENT tier (Phonetic, whose annotations have start and end times), and several ITEM tiers, e.g. Syllable or Word, that inherit their times from the Phonetic tier. The link definitions show a rich annotation structure, which produces the following tree-like hierarchy for the first utterance (note that only a single path through the hierarchy is shown):

serve(ae)

Figure 1: Hierarchy of the first utterance of the database ae.

2 The query language

2.1 Simple queries

The function for computing queries is query(). It needs at least two arguments: the emuDB handle (here ae, as returned by load_emuDB()) and the query string itself, e.g.:

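# three equivalent ways of writing the same simple query
V = query(ae, "[Phonetic == V]")  # square brackets are optional in simple queries
V = query(ae, "Phonetic == V")
V = query(ae, "Phonetic = V")     # a single = is accepted for backward compatibility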

The expression "[Phonetic == V]" is a legal expression in the EMU Query Language (EQL) (details below) and means “which annotations at the Phonetic tier are equal to the label ‘V’” (“V” is the SAMPA symbol for English corresponding to IPA /ʌ/, i.e. the vowel in words like cut).

2.1.1 Results of `query()`: segment lists

query() has found three “V” segments:

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V       187.  257. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 2 V       340.  427. 0fc618dc-89… 0000    msajc…           149         149 Phon…
## 3 V      1943. 2037. 0fc618dc-89… 0000    msajc…           189         189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

As of emuR version 2.0.0, this object is a tibble with one row per segment:

Data frame columns

labels: annotation label of the segment, or the labels of a sequence of annotations concatenated by ‘->’
start: onset time in milliseconds
end: offset time in milliseconds
db_uuid: UUID of the emuDB (a unique identifier)
session: session name
bundle: bundle name (= utterance name)
start_item_id: item ID of the first element of the sequence
end_item_id: item ID of the last element of the sequence
level: name of the tier that was searched
attribute: name of the attribute that was searched
start_item_seq_idx: sequence index of the start item
end_item_seq_idx: sequence index of the end item
type: type of the row: ITEM (symbolic item), EVENT (event item), or SEGMENT (segment)
sample_start: start sample position
sample_end: end sample position
sample_rate: sample rate (Hz)

This makes it easy to access specific information, e.g.:

# get labels:
V %>% pull(labels)

## [1] "V" "V" "V"

# get start times:
V$start

## [1]  187.425  340.175 1943.175

# or
V %>% pull(start)

## [1]  187.425  340.175 1943.175

# get end times:
V %>% pull(end)

## [1]  256.925  426.675 2037.425

# calculate durations of the [V]s
V$end - V$start

## [1] 69.50 86.50 94.25

# or
V %>% 
  mutate(d = end - start) %>%
  pull(d)

## [1] 69.50 86.50 94.25

V in the above example is a segment list with start and end times because Phonetic is a SEGMENT tier. Event tiers can be queried as well, in which case the returned tibble has exactly the same structure, except that all end times are zero (because the annotations mark points in time rather than segments). For example, here is an event list of all tones, in which all end times are zero (only the first three rows are printed):

tones = query(ae, "Tone =~ .*")
tones[1:3,]

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 H*      419.     0 0fc618dc-89… 0000    msajc…           181         181 Tone 
## 2 H*      932.     0 0fc618dc-89… 0000    msajc…           182         182 Tone 
## 3 L-     1107      0 0fc618dc-89… 0000    msajc…           183         183 Tone 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

2.1.2 Inherited times

Annotations at ITEM tiers, which either have no times or inherit their times from a linked tier, can be queried in the same way:

# Phonetic of type SEGMENT, Phoneme of type ITEM
list_levelDefinitions(ae)

##           name    type nrOfAttrDefs        attrDefNames
## 1    Utterance    ITEM            1          Utterance;
## 2 Intonational    ITEM            1       Intonational;
## 3 Intermediate    ITEM            1       Intermediate;
## 4         Word    ITEM            3 Word; Accent; Text;
## 5     Syllable    ITEM            1           Syllable;
## 6      Phoneme    ITEM            1            Phoneme;
## 7     Phonetic SEGMENT            1           Phonetic;
## 8         Tone   EVENT            1               Tone;
## 9         Foot    ITEM            1               Foot;

V_phoneme = query(ae,  "[Phoneme == V]")
V_phoneme

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V       187.  257. 0fc618dc-89… 0000    msajc…           114         114 Phon…
## 2 V       340.  427. 0fc618dc-89… 0000    msajc…           116         116 Phon…
## 3 V      1943. 2037. 0fc618dc-89… 0000    msajc…           149         149 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# for comparison: the segment list V from the Phonetic tier
V

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V       187.  257. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 2 V       340.  427. 0fc618dc-89… 0000    msajc…           149         149 Phon…
## 3 V      1943. 2037. 0fc618dc-89… 0000    msajc…           189         189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Both V and V_phoneme are associated with times, even though Phoneme is a timeless ITEM tier: the Phoneme items inherit their times from the Phonetic segments they are linked to. This was explained in more detail in an earlier module here.

Calculating inherited times can be time-consuming, and it can be switched off with calcTimes = FALSE:

V_phoneme2 = query(ae,
                   "[Phoneme == V]",
                   calcTimes = FALSE)
V_phoneme2

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V         NA    NA 0fc618dc-89… 0000    msajc…           114         114 Phon…
## 2 V         NA    NA 0fc618dc-89… 0000    msajc…           116         116 Phon…
## 3 V         NA    NA 0fc618dc-89… 0000    msajc…           149         149 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

In this case, all entries in start and end are NA (Not Available).

2.1.3 `requery_hier()` and `requery_seq()`

requery_hier() creates a segment or event list at any tier that is linked, directly or indirectly, to the tier of an existing segment list. In the above case, the tier of V_phoneme2 is Phoneme, which is linked to many other tiers, as the following shows:

list_linkDefinitions(ae)

##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

Therefore, to make a segment list of the words (at Text, an attribute of the Word tier) that correspond to these segments:

t.s = requery_hier(ae, V_phoneme2, "Text")
t.s

## # A tibble: 3 × 16
##   labels    start   end db_uuid   session bundle start_item_id end_item_id level
##   <chr>     <dbl> <dbl> <chr>     <chr>   <chr>          <int>       <int> <chr>
## 1 amongst    187.  674. 0fc618dc… 0000    msajc…             2           2 Word 
## 2 amongst    187.  674. 0fc618dc… 0000    msajc…             2           2 Word 
## 3 customers 1824. 2368. 0fc618dc… 0000    msajc…            73          73 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# or
V_phoneme2 %>% requery_hier(ae, ., "Text")

## # A tibble: 3 × 16
##   labels    start   end db_uuid   session bundle start_item_id end_item_id level
##   <chr>     <dbl> <dbl> <chr>     <chr>   <chr>          <int>       <int> <chr>
## 1 amongst    187.  674. 0fc618dc… 0000    msajc…             2           2 Word 
## 2 amongst    187.  674. 0fc618dc… 0000    msajc…             2           2 Word 
## 3 customers 1824. 2368. 0fc618dc… 0000    msajc…            73          73 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

The above requery looks upstream, to a tier that dominates Phoneme. A downstream requery returns, for each row, all the dominated annotations concatenated by ->. Thus for the segment list t.s at the Text tier that has just been created, the corresponding phonemes are:

requery_hier(ae, t.s, "Phoneme")

## # A tibble: 3 × 16
##   labels      start   end db_uuid session bundle start_item_id end_item_id level
##   <chr>       <dbl> <dbl> <chr>   <chr>   <chr>          <int>       <int> <chr>
## 1 V->m->V->N…  187.  674. 0fc618… 0000    msajc…           114         119 Phon…
## 2 V->m->V->N…  187.  674. 0fc618… 0000    msajc…           114         119 Phon…
## 3 k->V->s->t… 1824. 2368. 0fc618… 0000    msajc…           148         155 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# or
t.s %>% requery_hier(ae, ., "Phoneme")

## # A tibble: 3 × 16
##   labels      start   end db_uuid session bundle start_item_id end_item_id level
##   <chr>       <dbl> <dbl> <chr>   <chr>   <chr>          <int>       <int> <chr>
## 1 V->m->V->N…  187.  674. 0fc618… 0000    msajc…           114         119 Phon…
## 2 V->m->V->N…  187.  674. 0fc618… 0000    msajc…           114         119 Phon…
## 3 k->V->s->t… 1824. 2368. 0fc618… 0000    msajc…           148         155 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
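Because sequence labels are plain strings joined by ->, they can be taken apart with base R. The following is a minimal sketch that calls no emuR functions; the three label strings are copied from the requery_seq(..., length=3) output shown later in this tutorial and stand in for the labels column of a real segment list.

```r
# Sequence labels as they appear in a segment list's `labels` column
seq_labels <- c("m->V->N", "N->s->t", "s->t->H")

# Split at the "->" separator; fixed = TRUE matches "->" literally, not as a regex
parts <- strsplit(seq_labels, "->", fixed = TRUE)

# Number of annotations in each row
n_annots <- lengths(parts)

# First and last annotation of each sequence
first_label <- vapply(parts, function(p) p[1], character(1))
last_label  <- vapply(parts, function(p) p[length(p)], character(1))

n_annots     # 3 3 3
first_label  # "m" "N" "s"
last_label   # "N" "t" "H"
```

The same strsplit() call should work on the labels column of any segment list, e.g. strsplit(x$labels, "->", fixed = TRUE), where x is a placeholder name.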

As another example, the following makes a segment list of all words and then requeries it to find out which of them are associated with tones. This is possible because the Word tier (and therefore its Text attribute) is linked to the Tone tier via the Syllable tier, as list_linkDefinitions(ae) showed:

all.s = query(ae, "Text =~ .*")
requery_hier(ae, all.s, "Tone")

## Warning in requery_hier(ae, all.s, "Tone"): Found missing items in resulting
## segment list! Replaced missing rows with NA values.

## # A tibble: 54 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 <NA>     NA     NA <NA>        <NA>    <NA>              NA          NA <NA> 
##  2 H*      419.     0 0fc618dc-8… 0000    msajc…           181         181 Tone 
##  3 H*      932.     0 0fc618dc-8… 0000    msajc…           182         182 Tone 
##  4 <NA>     NA     NA <NA>        <NA>    <NA>              NA          NA <NA> 
##  5 <NA>     NA     NA <NA>        <NA>    <NA>              NA          NA <NA> 
##  6 H*     1913.     0 0fc618dc-8… 0000    msajc…           184         184 Tone 
##  7 H*     2231.     0 0fc618dc-8… 0000    msajc…           185         185 Tone 
##  8 <NA>     NA     NA <NA>        <NA>    <NA>              NA          NA <NA> 
##  9 <NA>     NA     NA <NA>        <NA>    <NA>              NA          NA <NA> 
## 10 H*      761.     0 0fc618dc-8… 0000    msajc…           186         186 Tone 
## # ℹ 44 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Many rows are NA because the corresponding words are not associated with a tone (i.e. they are not pitch-accented).
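Since the NA rows mark words without a tone, a logical vector built with is.na() picks out the pitch-accented words. This minimal base-R sketch uses the tone labels of the first ten printed rows as mock data.

```r
# Tone labels of the first ten rows printed above (NA = word without a tone)
tone_labels <- c(NA, "H*", "H*", NA, NA, "H*", "H*", NA, NA, "H*")

# TRUE for rows whose word is associated with a tone
has_tone <- !is.na(tone_labels)

sum(has_tone)    # number of pitch-accented words among these ten: 5
which(has_tone)  # their row positions: 2 3 6 7 10
```

With the real objects, something like tone.s = requery_hier(ae, all.s, "Tone") followed by all.s[!is.na(tone.s$labels), ] would keep only the pitch-accented words. This assumes, as the NA padding suggests, that the requeried list has one row per row of all.s in the same order; the name tone.s is illustrative.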

# find all "V"-labels in `ae`
V = query(ae,
          "[Phonetic == V]")

The function requery_seq() finds annotations, and makes segment lists of them, that precede or follow the rows of an existing segment or event list in sequence. In contrast to requery_hier(), the segment or event list returned by requery_seq() is always from the same tier as the list being requeried. The argument offset = n, where n is a positive or negative integer, finds the annotation n positions after (positive n) or before (negative n) each row. Thus to find the annotations that immediately follow those of the segment list V created earlier:

requery_seq(emuDBhandle = ae,
            seglist = V,
            offset = 1)

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 m       257.  340. 0fc618dc-89… 0000    msajc…           148         148 Phon…
## 2 N       427.  483. 0fc618dc-89… 0000    msajc…           150         150 Phon…
## 3 s      2037. 2085. 0fc618dc-89… 0000    msajc…           190         190 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

To find, e.g., the annotations two positions to the left, the command is the same but with offset = -2. On its own, however, this fails here, because for some of the annotations in V there is no annotation two positions to the left. To get around this, the additional argument ignoreOutOfBounds = TRUE must be included:

requery_seq(emuDBhandle = ae,
            seglist = V,
            offset = -2, ignoreOutOfBounds=TRUE)

## Warning in requery_seq(emuDBhandle = ae, seglist = V, offset = -2,
## ignoreOutOfBounds = TRUE): Found missing items in resulting segment list!
## Replacing missing rows with NA values.

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 <NA>     NA    NA  <NA>         <NA>    <NA>              NA          NA <NA> 
## 2 V       187.  257. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 3 k      1824. 1878. 0fc618dc-89… 0000    msajc…           187         187 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <dbl>,
## #   end_item_seq_idx <dbl>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

This gives NA in the first row, because that segment has no annotation two positions to its left.

A further variation of requery_seq() is to include length = n, where n is a positive integer: this finds a sequence of n annotations starting at the given offset. For example, the following makes a segment list that spans from the first to the third annotation to the right of each segment in V:

requery_seq(emuDBhandle = ae,
            seglist = V,
            offset = 1, length=3)

## # A tibble: 3 × 16
##   labels  start   end db_uuid     session bundle start_item_id end_item_id level
##   <chr>   <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
## 1 m->V->N  257.  483. 0fc618dc-8… 0000    msajc…           148         150 Phon…
## 2 N->s->t  427.  597. 0fc618dc-8… 0000    msajc…           150         152 Phon…
## 3 s->t->H 2037. 2148. 0fc618dc-8… 0000    msajc…           190         192 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
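To make the semantics of offset, length and ignoreOutOfBounds concrete, here is a toy re-implementation on a plain character vector. shift_labels() is a hypothetical helper, not an emuR function, and it handles a single sequence only. The toy sequence holds the Phonetic labels of items 147 to 152 of the first bundle as they appear in the outputs above, treating item 147 as the first position. Like requery_seq() with ignoreOutOfBounds = TRUE, it returns NA where the requested window runs past either end.

```r
# Toy Phonetic sequence: items 147-152 of the first bundle, in temporal order
bundle_labels <- c("V", "m", "V", "N", "s", "t")
v_pos <- which(bundle_labels == "V")  # positions of the two "V"s: 1 and 3

# Hypothetical helper (not part of emuR): for each position in `pos`, return
# the `len` labels starting `offset` positions away, joined by "->",
# or NA if that window runs off either end of the sequence
shift_labels <- function(labels, pos, offset, len = 1) {
  vapply(pos, function(p) {
    from <- p + offset
    to <- from + len - 1
    if (from < 1 || to > length(labels)) {
      return(NA_character_)
    }
    paste(labels[from:to], collapse = "->")
  }, character(1))
}

shift_labels(bundle_labels, v_pos, offset = 1)           # "m" "N"
shift_labels(bundle_labels, v_pos, offset = -2)          # NA  "V"
shift_labels(bundle_labels, v_pos, offset = 1, len = 3)  # "m->V->N" "N->s->t"
```

The three calls reproduce the labels of the first two rows of the three requery_seq() outputs above. The real requery_seq() returns complete segment-list rows (with times) for every bundle; this sketch only mirrors how offset and length select labels within one sequence.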

2.2 More on the Emu Query Language `EQL`

To learn more about the functionality of the EQL, see also the manual chapter here.

As shown in the preceding section, any query must be enclosed in quotation marks " ", and any query can additionally be placed within square brackets [ ]. At a minimum, a query consists of the name of an annotation tier combined with an operator and a label (which can also be a regular expression). Further details follow.

2.2.1 Single argument queries

2.2.1.1 Equality

In the examples above, the Phonetic tier of the database ae was searched for annotations equal to “V”:

query(emuDBhandle = ae, 
      "Phonetic == V")

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V       187.  257. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 2 V       340.  427. 0fc618dc-89… 0000    msajc…           149         149 Phon…
## 3 V      1943. 2037. 0fc618dc-89… 0000    msajc…           189         189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

The equality operator is ==. For backward compatibility with earlier versions of emuR a single = is also allowed. Thus the preceding command is equivalent to:

query(emuDBhandle = ae, 
      "Phonetic = V")

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V       187.  257. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 2 V       340.  427. 0fc618dc-89… 0000    msajc…           149         149 Phon…
## 3 V      1943. 2037. 0fc618dc-89… 0000    msajc…           189         189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

2.2.1.2 Inequality

To search for everything except V, use !=:

query(ae, 
      "Phonetic != V")

## # A tibble: 250 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 m       257.  340. 0fc618dc-8… 0000    msajc…           148         148 Phon…
##  2 N       427.  483. 0fc618dc-8… 0000    msajc…           150         150 Phon…
##  3 s       483.  567. 0fc618dc-8… 0000    msajc…           151         151 Phon…
##  4 t       567.  597. 0fc618dc-8… 0000    msajc…           152         152 Phon…
##  5 H       597.  674. 0fc618dc-8… 0000    msajc…           153         153 Phon…
##  6 @:      674.  740. 0fc618dc-8… 0000    msajc…           154         154 Phon…
##  7 f       740.  893. 0fc618dc-8… 0000    msajc…           155         155 Phon…
##  8 r       893.  950. 0fc618dc-8… 0000    msajc…           156         156 Phon…
##  9 E       950. 1032. 0fc618dc-8… 0000    msajc…           157         157 Phon…
## 10 n      1032. 1196. 0fc618dc-8… 0000    msajc…           158         158 Phon…
## # ℹ 240 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

One way to get ‘everything’ is to exclude, with !=, a label that is almost certainly not in the database, such as xyz. Alternatively, a regular expression can be used, preceded by the operator =~. The regular expression for finding all annotations is .* (meaning: any character (.) zero or more times (*)).

everything1 = query(ae, 
                    "Phonetic != xyz")
everything2 = query(ae, 
                    "Phonetic =~ .*")
# TRUE if both segment lists are equal in every cell
all(everything1 == everything2)

## [1] TRUE
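The meaning of .* can be checked with base R's own regular expressions. This is only an analogy: R's grepl() looks for a match anywhere in a string, so ^ and $ are added here to require the whole label to match; whether and how EQL anchors its patterns is not covered by this sketch. The labels are a small sample from the Phonetic outputs above.

```r
labels <- c("V", "m", "N", "s", "t", "H", "@:")

# ".*" = any character (.) repeated zero or more times (*): matches every label
matches_all <- grepl("^.*$", labels)

# Excluding a label that never occurs selects every label as well
not_xyz <- labels != "xyz"

identical(matches_all, not_xyz)  # TRUE
grepl("^.*$", "")                # TRUE: zero characters also count as a match
```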

The operator !~ negates a regular expression match, i.e. it finds all annotations that do not match the pattern. For example:

# Find all segments at the `Text` tier that don't begin with "a"
query(ae,  "Text !~ a.*")

## # A tibble: 37 × 16
##    labels     start   end db_uuid session bundle start_item_id end_item_id level
##    <chr>      <dbl> <dbl> <chr>   <chr>   <chr>          <int>       <int> <chr>
##  1 her         674.  740. 0fc618… 0000    msajc…            24          24 Word 
##  2 friends     740. 1289. 0fc618… 0000    msajc…            30          30 Word 
##  3 she        1289. 1463. 0fc618… 0000    msajc…            43          43 Word 
##  4 considered 1634. 2150. 0fc618… 0000    msajc…            61          61 Word 
##  5 it          300.  412. 0fc618… 0000    msajc…             2           2 Word 
##  6 is          412.  572. 0fc618… 0000    msajc…            14          14 Word 
##  7 futile      572. 1091. 0fc618… 0000    msajc…            21          21 Word 
##  8 to         1091. 1222. 0fc618… 0000    msajc…            38          38 Word 
##  9 offer      1222. 1391. 0fc618… 0000    msajc…            48          48 Word 
## 10 further    1628. 1958. 0fc618… 0000    msajc…            68          68 Word 
## # ℹ 27 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

There are therefore four operators, two for matching and two for non-matching:

Symbol	Meaning
`==`	equality
`=~`	regular expression matching
`!=`	inequality
`!~`	regular expression non-matching
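As a rough analogy (not how emuR evaluates queries internally), the four operators behave like the following base-R comparisons on a vector of labels. The character class [mnN] is just an illustrative pattern, and the labels are the toy Phonetic sequence V m V N s t from the outputs above.

```r
labels <- c("V", "m", "V", "N", "s", "t")
pattern <- "^[mnN]$"  # illustrative regex: exactly one of m, n, N

is_eq  <- labels == "V"            # like ==  (equality)
is_neq <- labels != "V"            # like !=  (inequality)
is_rx  <- grepl(pattern, labels)   # like =~  (regular expression matching)
is_nrx <- !grepl(pattern, labels)  # like !~  (regular expression non-matching)

labels[is_eq]   # "V" "V"
labels[is_neq]  # "m" "N" "s" "t"
labels[is_rx]   # "m" "N"
labels[is_nrx]  # "V" "V" "s" "t"
```

Each non-matching operator selects exactly the complement of its matching counterpart.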

2.2.1.3 The `OR` operator

The OR operator | can be used to search for any of several annotations:

# find all `m` or `n` annotations at the `Phonetic` tier
query(ae, 
      "Phonetic == m | n")

## # A tibble: 19 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 m       257.  340. 0fc618dc-8… 0000    msajc…           148         148 Phon…
##  2 n      1032. 1196. 0fc618dc-8… 0000    msajc…           158         158 Phon…
##  3 n      1741. 1791. 0fc618dc-8… 0000    msajc…           168         168 Phon…
##  4 n      1515. 1554. 0fc618dc-8… 0000    msajc…           170         170 Phon…
##  5 n      2431. 2528. 0fc618dc-8… 0000    msajc…           184         184 Phon…
##  6 n       895. 1023. 0fc618dc-8… 0000    msajc…           158         158 Phon…
##  7 m      1490. 1565. 0fc618dc-8… 0000    msajc…           169         169 Phon…
##  8 n      2402. 2475. 0fc618dc-8… 0000    msajc…           182         182 Phon…
##  9 m       497.  559. 0fc618dc-8… 0000    msajc…           188         188 Phon…
## 10 n      2227. 2271. 0fc618dc-8… 0000    msajc…           216         216 Phon…
## 11 n      3046. 3068. 0fc618dc-8… 0000    msajc…           229         229 Phon…
## 12 m      1587. 1656. 0fc618dc-8… 0000    msajc…           149         149 Phon…
## 13 m       819.  903. 0fc618dc-8… 0000    msajc…           120         120 Phon…
## 14 n      1435. 1495. 0fc618dc-8… 0000    msajc…           127         127 Phon…
## 15 n      1775. 1834. 0fc618dc-8… 0000    msajc…           132         132 Phon…
## 16 n       509.  544. 0fc618dc-8… 0000    msajc…           166         166 Phon…
## 17 m      1630. 1709. 0fc618dc-8… 0000    msajc…           185         185 Phon…
## 18 m      2173. 2233. 0fc618dc-8… 0000    msajc…           194         194 Phon…
## 19 n      2448. 2480. 0fc618dc-8… 0000    msajc…           199         199 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# find all `m` or `n` or `N` annotations at the `Phonetic` tier
query(ae, "Phonetic == m | n | N")

## # A tibble: 23 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 m       257.  340. 0fc618dc-8… 0000    msajc…           148         148 Phon…
##  2 N       427.  483. 0fc618dc-8… 0000    msajc…           150         150 Phon…
##  3 n      1032. 1196. 0fc618dc-8… 0000    msajc…           158         158 Phon…
##  4 n      1741. 1791. 0fc618dc-8… 0000    msajc…           168         168 Phon…
##  5 n      1515. 1554. 0fc618dc-8… 0000    msajc…           170         170 Phon…
##  6 n      2431. 2528. 0fc618dc-8… 0000    msajc…           184         184 Phon…
##  7 n       895. 1023. 0fc618dc-8… 0000    msajc…           158         158 Phon…
##  8 m      1490. 1565. 0fc618dc-8… 0000    msajc…           169         169 Phon…
##  9 n      2402. 2475. 0fc618dc-8… 0000    msajc…           182         182 Phon…
## 10 m       497.  559. 0fc618dc-8… 0000    msajc…           188         188 Phon…
## # ℹ 13 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

2.2.1.4 Use of features

In the EMU system, it is possible to define features (label groups) and then query them. Features are defined, listed, and removed with the functions add_attrDefLabelGroup(), list_attrDefLabelGroups(), and remove_attrDefLabelGroup(). For the existing ae database, some features have already been defined. To see these for the Phonetic tier:

list_attrDefLabelGroups(ae,
                        levelName = "Phonetic",
                        attributeDefinitionName = "Phonetic")

##          name
## 1       vowel
## 2        stop
## 3       nasal
## 4   fricative
## 5 approximant
## 6       other
##                                                                           values
## 1 A; E; EC; I; O; V; U; ai; ei; oi; i@; u@; au; @u; @:; @; =; a:; e:; i:; o:; u:
## 2                                                       p; tS; dZ; t; k; b; d; g
## 3                                                                           m; n
## 4                                                  f; v; s; z; S; Z; h; D; D-; T
## 5                                                             w; j; l; r; rr; Or
## 6                                                                              H

This shows, for example, that there is a feature nasal comprising the annotations m and n. Consequently, the following two queries give the same output:

# Segment list of nasals
nas.s1 = query(ae, "Phonetic = nasal")
nas.s2 = query(ae, "Phonetic = m|n")
all(nas.s1 == nas.s2)

## [1] TRUE
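The members of a feature can also be read back from the list_attrDefLabelGroups() output and turned into an explicit OR query. The following is a minimal sketch. It assumes, as the printed output above suggests, that the values column holds one string per feature with the members separated by "; ". The object names nasal_members, nasal_query and nas.s3 are illustrative only.

```r
library(emuR)
# Load the demo database (re-create it only if it is not already in tempdir())
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

groups = list_attrDefLabelGroups(ae,
                                 levelName = "Phonetic",
                                 attributeDefinitionName = "Phonetic")
# Assumption: each entry of `values` is a single "; "-separated string
nasal_members = strsplit(as.character(groups$values[groups$name == "nasal"]), "; ")[[1]]
# Build the equivalent OR query, e.g. "Phonetic = m|n"
nasal_query = paste0("Phonetic = ", paste(nasal_members, collapse = "|"))
nas.s3 = query(ae, nasal_query)
# Same number of nasal segments as the feature query
nrow(nas.s3) == nrow(query(ae, "Phonetic = nasal"))
```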

Use add_attrDefLabelGroup() to add new features. In this example, a feature grave comprising the labial and velar stops (p, b, k, g) is added to the Phoneme tier:

add_attrDefLabelGroup(
  ae,
  levelName = "Phoneme",
  attributeDefinitionName = "Phoneme",
  labelGroupName = "grave",
  labelGroupValues = c("p", "b", "k", "g")
)

A segment list of the grave annotations is then:

grave.s = query(ae, "Phoneme = grave")
count(grave.s, labels)

## # A tibble: 3 × 2
##   labels     n
##   <chr>  <int>
## 1 b          2
## 2 k          9
## 3 p          3
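A feature that is no longer needed can be deleted again with remove_attrDefLabelGroup(), which takes the same level, attribute and group name arguments. The sketch below adds grave only if it is not already defined, because it assumes that adding an existing group name fails. It then removes grave and confirms that the group is gone. The helper phoneme_groups() is hypothetical and is defined here only for illustration.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

# Hypothetical helper: names of the label groups defined for Phoneme
phoneme_groups = function() {
  list_attrDefLabelGroups(ae, levelName = "Phoneme",
                          attributeDefinitionName = "Phoneme")$name
}
# Add grave only if it does not exist yet (adding it twice is assumed to fail)
if (!"grave" %in% phoneme_groups()) {
  add_attrDefLabelGroup(ae, levelName = "Phoneme",
                        attributeDefinitionName = "Phoneme",
                        labelGroupName = "grave",
                        labelGroupValues = c("p", "b", "k", "g"))
}
# Remove the feature again
remove_attrDefLabelGroup(ae, levelName = "Phoneme",
                         attributeDefinitionName = "Phoneme",
                         labelGroupName = "grave")
"grave" %in% phoneme_groups()
```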

2.2.2 Sequence queries

Square brackets [ ] are optional in simple queries. Thus the following two commands are equivalent:

mnN = query(ae, "[Phonetic == m | n | N]")
# or
mnN = query(ae, "Phonetic == m | n | N")

All other queries, including sequence (and hierarchical) queries, require [ ] brackets. The -> operator finds sequences of annotations:

query(ae, "[Phonetic == V -> Phonetic == m]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V->m    187.  340. 0fc618dc-89… 0000    msajc…           147         148 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Note that each row of the resulting segment list has the start time of V and the end time of m, and its label is V->m. This can be changed with the result modifier # (hash), which marks the annotation to be returned:

# finds V, if V is followed by m
query(ae, "[#Phonetic == V -> Phonetic == m]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 V       187.  257. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# finds m, if m is preceded by V
query(ae, "[Phonetic == V -> #Phonetic == m]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 m       257.  340. 0fc618dc-89… 0000    msajc…           148         148 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Only one # is allowed per query.
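The relation between the unmarked sequence result and the two #-marked results can be checked directly. The sketch below compares their times. The whole V->m segment should start where the V-only result starts and end where the m-only result ends. The variable names are illustrative.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

vm     = query(ae, "[Phonetic == V -> Phonetic == m]")   # whole sequence
v_only = query(ae, "[#Phonetic == V -> Phonetic == m]")  # V only
m_only = query(ae, "[Phonetic == V -> #Phonetic == m]")  # m only
# The sequence segment starts where V starts and ends where m ends
all.equal(vm$start, v_only$start)
all.equal(vm$end, m_only$end)
```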

Sequences of more than two annotations require nested brackets. The following finds all sequences @ n s at the Phonetic tier:

query(ae, "[[Phonetic == @ -> Phonetic == n] -> Phonetic == s]")

## # A tibble: 3 × 16
##   labels  start   end db_uuid     session bundle start_item_id end_item_id level
##   <chr>   <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
## 1 @->n->s 1715. 1893. 0fc618dc-8… 0000    msajc…           167         169 Phon…
## 2 @->n->s 2382. 2754. 0fc618dc-8… 0000    msajc…           183         185 Phon…
## 3 @->n->s 2201. 2409. 0fc618dc-8… 0000    msajc…           215         217 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

The following finds the word offer followed by any two words and then resistance at the Text tier:

query(ae, "[[[Text == offer -> Text =~ .*] 
      -> Text =~ .* ] -> Text == resistance]")

## # A tibble: 1 × 16
##   labels      start   end db_uuid session bundle start_item_id end_item_id level
##   <chr>       <dbl> <dbl> <chr>   <chr>   <chr>          <int>       <int> <chr>
## 1 offer->any… 1222. 2754. 0fc618… 0000    msajc…            48          80 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
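The # modifier can also mark a term inside the nested brackets. The sketch below assumes that EQL accepts # on an inner term. It places # on the first wildcard to return only the word that immediately follows offer in this sequence. Judging from the Word annotations listed elsewhere in this tutorial, that word should be any.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

# Return only the first of the two wildcard words between offer and resistance
after_offer = query(ae,
  "[[[Text == offer -> #Text =~ .*] -> Text =~ .*] -> Text == resistance]")
after_offer$labels
```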

2.2.3 Domination or linked queries

The operator ^ is for queries spanning two linked tiers. The following finds all p annotations at the Phoneme tier in strong syllables (i.e. p annotations at the Phoneme tier dominated by, or linked to, S annotations at the Syllable tier):

query(ae, "[Phoneme == p ^ Syllable == S]")

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 p       559.  640. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 2 p      1656. 1699. 0fc618dc-89… 0000    msajc…           122         122 Phon…
## 3 p       864.  970. 0fc618dc-89… 0000    msajc…           136         136 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

The ^ operator is not directional. Thus, although Syllable dominates Phoneme, the same output is also given by:

query(ae, "[Syllable == S ^ #Phoneme == p]")

## # A tibble: 3 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 p       559.  640. 0fc618dc-89… 0000    msajc…           147         147 Phon…
## 2 p      1656. 1699. 0fc618dc-89… 0000    msajc…           122         122 Phon…
## 3 p       864.  970. 0fc618dc-89… 0000    msajc…           136         136 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Thus the semantics of ^ is not really “is dominated by” but rather “is linked to”. Note, however, that the preceding query needs # to return the Phoneme annotations, because by default the annotations of the first term are returned. The # can be omitted by placing the term whose annotations are required in first position, i.e. as query(ae, "[Phoneme == p ^ Syllable == S]").
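The following sketch confirms this. It compares the two orderings by item identity, using session, bundle and start_item_id as a key. It also shows that without # the syllables themselves are returned. The helper item_key() is hypothetical.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

p_first  = query(ae, "[Phoneme == p ^ Syllable == S]")
s_hash   = query(ae, "[Syllable == S ^ #Phoneme == p]")
s_nohash = query(ae, "[Syllable == S ^ Phoneme == p]")
# Hypothetical helper: identify items by session, bundle and item id
item_key = function(seglist) paste(seglist$session, seglist$bundle, seglist$start_item_id)
setequal(item_key(p_first), item_key(s_hash))
# Without #, the annotations of the first term (the S syllables) are returned
unique(s_nohash$labels)
```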

Nested brackets are needed for queries spanning more than two linked tiers:

# Find all Phonetic annotations in strong syllables
# in either `amongst` or `beautiful`
query(ae, 
      "[[Phonetic =~ .* ^ Syllable == S] 
      ^ Text == amongst | beautiful]")

## # A tibble: 9 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 m       257.  340. 0fc618dc-89… 0000    msajc…           148         148 Phon…
## 2 V       340.  427. 0fc618dc-89… 0000    msajc…           149         149 Phon…
## 3 N       427.  483. 0fc618dc-89… 0000    msajc…           150         150 Phon…
## 4 s       483.  567. 0fc618dc-89… 0000    msajc…           151         151 Phon…
## 5 t       567.  597. 0fc618dc-89… 0000    msajc…           152         152 Phon…
## 6 H       597.  674. 0fc618dc-89… 0000    msajc…           153         153 Phon…
## 7 db     2034. 2150. 0fc618dc-89… 0000    msajc…           173         173 Phon…
## 8 j      2150. 2211. 0fc618dc-89… 0000    msajc…           174         174 Phon…
## 9 u:     2211. 2284. 0fc618dc-89… 0000    msajc…           175         175 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

2.2.4 Conjunction queries

The & operator combines conditions on different attributes of the same tier. For example, Text and Accent are attributes of the Word tier (whose own Word attribute labels content words C and function words F), as shown by:

list_attributeDefinitions(ae, level = "Word")

##     name level   type hasLabelGroups hasLegalLabels
## 1   Word  Word STRING          FALSE          FALSE
## 2 Accent  Word STRING          FALSE          FALSE
## 3   Text  Word STRING          FALSE          FALSE

Thus to find all accented (S) words:

query(ae, "[Text =~.* & Accent == S]")

## # A tibble: 25 × 16
##    labels     start   end db_uuid session bundle start_item_id end_item_id level
##    <chr>      <dbl> <dbl> <chr>   <chr>   <chr>          <int>       <int> <chr>
##  1 amongst     187.  674. 0fc618… 0000    msajc…             2           2 Word 
##  2 friends     740. 1289. 0fc618… 0000    msajc…            30          30 Word 
##  3 beautiful  2034. 2604. 0fc618… 0000    msajc…            83          83 Word 
##  4 futile      572. 1091. 0fc618… 0000    msajc…            21          21 Word 
##  5 further    1628. 1958. 0fc618… 0000    msajc…            68          68 Word 
##  6 resistance 1958. 2754. 0fc618… 0000    msajc…            80          80 Word 
##  7 chill       380.  745. 0fc618… 0000    msajc…            13          13 Word 
##  8 wind        745. 1083. 0fc618… 0000    msajc…            23          23 Word 
##  9 caused     1083. 1456. 0fc618… 0000    msajc…            36          36 Word 
## 10 shiver     1651. 1995. 0fc618… 0000    msajc…            70          70 Word 
## # ℹ 15 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
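Because Text =~ .* matches every word, it does not restrict the conjunction above. Its main effect is that the returned labels are the words' Text labels and not their Accent labels. The following sketch checks this against the simple query Accent == S. Both should select the same Word items. The helper item_key() is hypothetical.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

acc_text = query(ae, "[Text =~ .* & Accent == S]")  # labels are the words
acc_only = query(ae, "Accent == S")                 # labels are the accents
# Hypothetical helper: identify items by session, bundle and item id
item_key = function(seglist) paste(seglist$session, seglist$bundle, seglist$start_item_id)
setequal(item_key(acc_text), item_key(acc_only))
head(acc_text$labels)
unique(acc_only$labels)
```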

To find all unaccented function words:

query(ae, "[Text =~.* & Accent == W & Word == F]")

## # A tibble: 20 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 her     674.  740. 0fc618dc-8… 0000    msajc…            24          24 Word 
##  2 she    1289. 1463. 0fc618dc-8… 0000    msajc…            43          43 Word 
##  3 was    1463. 1634. 0fc618dc-8… 0000    msajc…            52          52 Word 
##  4 it      300.  412. 0fc618dc-8… 0000    msajc…             2           2 Word 
##  5 is      412.  572. 0fc618dc-8… 0000    msajc…            14          14 Word 
##  6 to     1091. 1222. 0fc618dc-8… 0000    msajc…            38          38 Word 
##  7 any    1437. 1628. 0fc618dc-8… 0000    msajc…            58          58 Word 
##  8 the     300.  380. 0fc618dc-8… 0000    msajc…             2           2 Word 
##  9 them   1456. 1565. 0fc618dc-8… 0000    msajc…            51          51 Word 
## 10 to     1565. 1651. 0fc618dc-8… 0000    msajc…            60          60 Word 
## 11 he      300.  425. 0fc618dc-8… 0000    msajc…             2           2 Word 
## 12 his    1129. 1368. 0fc618dc-8… 0000    msajc…            37          37 Word 
## 13 his    2694. 2781. 0fc618dc-8… 0000    msajc…           101         101 Word 
## 14 are     662.  775. 0fc618dc-8… 0000    msajc…            19          19 Word 
## 15 to     1806. 1890. 0fc618dc-8… 0000    msajc…            71          71 Word 
## 16 I'll    300.  514. 0fc618dc-8… 0000    msajc…             2           2 Word 
## 17 my      819. 1039. 0fc618dc-8… 0000    msajc…            23          23 Word 
## 18 and    1422. 1495. 0fc618dc-8… 0000    msajc…            43          43 Word 
## 19 this    300.  476. 0fc618dc-8… 0000    msajc…             2           2 Word 
## 20 than   2368. 2480. 0fc618dc-8… 0000    msajc…            97          97 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

To find unaccented function words but only if they follow a content word:

query(ae, "[Word = C -> #Text =~.* & Accent == W & Word == F]")

## # A tibble: 12 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 her     674.  740. 0fc618dc-8… 0000    msajc…            24          24 Word 
##  2 she    1289. 1463. 0fc618dc-8… 0000    msajc…            43          43 Word 
##  3 to     1091. 1222. 0fc618dc-8… 0000    msajc…            38          38 Word 
##  4 any    1437. 1628. 0fc618dc-8… 0000    msajc…            58          58 Word 
##  5 them   1456. 1565. 0fc618dc-8… 0000    msajc…            51          51 Word 
##  6 his    1129. 1368. 0fc618dc-8… 0000    msajc…            37          37 Word 
##  7 his    2694. 2781. 0fc618dc-8… 0000    msajc…           101         101 Word 
##  8 are     662.  775. 0fc618dc-8… 0000    msajc…            19          19 Word 
##  9 to     1806. 1890. 0fc618dc-8… 0000    msajc…            71          71 Word 
## 10 my      819. 1039. 0fc618dc-8… 0000    msajc…            23          23 Word 
## 11 and    1422. 1495. 0fc618dc-8… 0000    msajc…            43          43 Word 
## 12 than   2368. 2480. 0fc618dc-8… 0000    msajc…            97          97 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
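Segment lists are tibbles, so query results can also be compared with dplyr. The sketch below uses anti_join() to find the unaccented function words that do not follow a content word, i.e. the 20 words of the earlier query minus the 12 above. It keys rows on session, bundle and start_item_id.

```r
library(emuR)
library(dplyr)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

fw = query(ae, "[Text =~ .* & Accent == W & Word == F]")
fw_after_c = query(ae, "[Word = C -> #Text =~ .* & Accent == W & Word == F]")
# Keep the rows of fw with no matching item in fw_after_c
not_after_c = anti_join(fw, fw_after_c,
                        by = c("session", "bundle", "start_item_id"))
not_after_c$labels
```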

2.2.5 Position queries

There are three position functions: Start(X,Y), Medial(X,Y), and End(X,Y). In all position queries F(X,Y), where F is one of these three functions and X and Y are the two tiers involved, the annotations are returned from tier Y. Thus the following finds all annotations at the Phoneme tier that are initial relative to annotations at the Word tier (i.e., it finds all word-initial phonemes):

query(ae, "[Start(Word, Phoneme) == T]")

## # A tibble: 54 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 V       187.  257. 0fc618dc-8… 0000    msajc…           114         114 Phon…
##  2 @:      674.  740. 0fc618dc-8… 0000    msajc…           120         120 Phon…
##  3 f       740.  893. 0fc618dc-8… 0000    msajc…           121         121 Phon…
##  4 S      1289. 1420. 0fc618dc-8… 0000    msajc…           126         126 Phon…
##  5 w      1463. 1506. 0fc618dc-8… 0000    msajc…           128         128 Phon…
##  6 k      1634. 1715. 0fc618dc-8… 0000    msajc…           131         131 Phon…
##  7 b      2034. 2150. 0fc618dc-8… 0000    msajc…           139         139 Phon…
##  8 I       300.  373. 0fc618dc-8… 0000    msajc…           119         119 Phon…
##  9 I       412.  476. 0fc618dc-8… 0000    msajc…           121         121 Phon…
## 10 f       572.  674. 0fc618dc-8… 0000    msajc…           123         123 Phon…
## # ℹ 44 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Word-initial and word-medial phonemes can be obtained together by finding all phonemes that are not word-final:

# Word-initial and word-medial phonemes
query(ae, "[End(Word, Phoneme) == F]")

## # A tibble: 167 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 V       187.  257. 0fc618dc-8… 0000    msajc…           114         114 Phon…
##  2 m       257.  340. 0fc618dc-8… 0000    msajc…           115         115 Phon…
##  3 V       340.  427. 0fc618dc-8… 0000    msajc…           116         116 Phon…
##  4 N       427.  483. 0fc618dc-8… 0000    msajc…           117         117 Phon…
##  5 s       483.  567. 0fc618dc-8… 0000    msajc…           118         118 Phon…
##  6 f       740.  893. 0fc618dc-8… 0000    msajc…           121         121 Phon…
##  7 r       893.  950. 0fc618dc-8… 0000    msajc…           122         122 Phon…
##  8 E       950. 1032. 0fc618dc-8… 0000    msajc…           123         123 Phon…
##  9 n      1032. 1196. 0fc618dc-8… 0000    msajc…           124         124 Phon…
## 10 S      1289. 1420. 0fc618dc-8… 0000    msajc…           126         126 Phon…
## # ℹ 157 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
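The three position functions fit together. This assumes that Medial(X,Y) is true for a Y annotation that is neither the first nor the last one under X. On that assumption, the non-final phonemes are exactly the word-initial phonemes of words with at least two phonemes plus the word-medial phonemes. The sketch below checks this with dplyr's semi_join().

```r
library(emuR)
library(dplyr)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

key = c("session", "bundle", "start_item_id")
start_T  = query(ae, "[Start(Word, Phoneme) == T]")
medial_T = query(ae, "[Medial(Word, Phoneme) == T]")
end_F    = query(ae, "[End(Word, Phoneme) == F]")
# Word-initial phonemes that are not also word-final (words of 2+ phonemes)
initial_only = semi_join(start_T, end_F, by = key)
nrow(initial_only) + nrow(medial_T) == nrow(end_F)
```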

To find all f and S phonemes that are word-initial:

query(ae, "[Phoneme = f | S  & Start(Word, Phoneme) == T]")

## # A tibble: 5 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 f       740.  893. 0fc618dc-89… 0000    msajc…           121         121 Phon…
## 2 S      1289. 1420. 0fc618dc-89… 0000    msajc…           126         126 Phon…
## 3 f       572.  674. 0fc618dc-89… 0000    msajc…           123         123 Phon…
## 4 f      1628. 1741. 0fc618dc-89… 0000    msajc…           138         138 Phon…
## 5 S      1651. 1801. 0fc618dc-89… 0000    msajc…           137         137 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
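The same result can be reached in two steps. First query all f and S phonemes, then keep only those that also appear in the list of word-initial phonemes. The sketch does this with dplyr's semi_join() and tallies the labels with count().

```r
library(emuR)
library(dplyr)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

fS      = query(ae, "Phoneme = f | S")
initial = query(ae, "[Start(Word, Phoneme) == T]")
# Keep the f and S phonemes that are word-initial
fS_initial = semi_join(fS, initial, by = c("session", "bundle", "start_item_id"))
count(fS_initial, labels)
```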

Finding all phonemes in intonational-phrase-final syllables requires a domination (linked) query. The position function End(Intonational, Syllable) returns annotations from its Y tier, Syllable. The annotations required, however, are at the Phoneme tier, which Syllable dominates. The two parts of the query must therefore be linked by ^:

query(ae, "[Phoneme =~ .* 
      ^ End(Intonational, Syllable) = T]")

## # A tibble: 25 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 f      2362. 2447. 0fc618dc-8… 0000    msajc…           144         144 Phon…
##  2 @      2447. 2506. 0fc618dc-8… 0000    msajc…           145         145 Phon…
##  3 l      2506. 2604. 0fc618dc-8… 0000    msajc…           146         146 Phon…
##  4 s      2228. 2319. 0fc618dc-8… 0000    msajc…           146         146 Phon…
##  5 t      2319. 2382. 0fc618dc-8… 0000    msajc…           147         147 Phon…
##  6 @      2382. 2431. 0fc618dc-8… 0000    msajc…           148         148 Phon…
##  7 n      2431. 2528. 0fc618dc-8… 0000    msajc…           149         149 Phon…
##  8 s      2528. 2754. 0fc618dc-8… 0000    msajc…           150         150 Phon…
##  9 l      2534. 2569. 0fc618dc-8… 0000    msajc…           148         148 Phon…
## 10 i:     2569. 2692. 0fc618dc-8… 0000    msajc…           149         149 Phon…
## # ℹ 15 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

To find all phonemes in intonational-phrase-final weak syllables:

query(ae, "[Phoneme =~ .* 
      ^ Syllable = W 
      & End(Intonational, Syllable) = T]")

## # A tibble: 15 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 f      2362. 2447. 0fc618dc-8… 0000    msajc…           144         144 Phon…
##  2 @      2447. 2506. 0fc618dc-8… 0000    msajc…           145         145 Phon…
##  3 l      2506. 2604. 0fc618dc-8… 0000    msajc…           146         146 Phon…
##  4 s      2228. 2319. 0fc618dc-8… 0000    msajc…           146         146 Phon…
##  5 t      2319. 2382. 0fc618dc-8… 0000    msajc…           147         147 Phon…
##  6 @      2382. 2431. 0fc618dc-8… 0000    msajc…           148         148 Phon…
##  7 n      2431. 2528. 0fc618dc-8… 0000    msajc…           149         149 Phon…
##  8 s      2528. 2754. 0fc618dc-8… 0000    msajc…           150         150 Phon…
##  9 l      2534. 2569. 0fc618dc-8… 0000    msajc…           148         148 Phon…
## 10 i:     2569. 2692. 0fc618dc-8… 0000    msajc…           149         149 Phon…
## 11 s      3123. 3239. 0fc618dc-8… 0000    msajc…           182         182 Phon…
## 12 @      3239. 3298. 0fc618dc-8… 0000    msajc…           183         183 Phon…
## 13 z      3298. 3457. 0fc618dc-8… 0000    msajc…           184         184 Phon…
## 14 v      2588. 2646. 0fc618dc-8… 0000    msajc…           160         160 Phon…
## 15 @      2646. 2795. 0fc618dc-8… 0000    msajc…           161         161 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
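The phrase-final phonemes can also be split by syllable type. The sketch below assumes that every syllable in ae is labelled either S or W and that each phoneme belongs to exactly one syllable. On that assumption, the phonemes in phrase-final strong and weak syllables together should account for all 25 phrase-final phonemes found earlier.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

all_final    = query(ae, "[Phoneme =~ .* ^ End(Intonational, Syllable) = T]")
weak_final   = query(ae, "[Phoneme =~ .* ^ Syllable = W & End(Intonational, Syllable) = T]")
strong_final = query(ae, "[Phoneme =~ .* ^ Syllable = S & End(Intonational, Syllable) = T]")
# Assumes syllables are labelled only S or W
nrow(weak_final) + nrow(strong_final) == nrow(all_final)
```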

2.2.6 Count queries

These compare a count with a number using one of ==, !=, >, >=, <, <=. The count function Num(X,Y) counts the number of annotations at tier Y that are dominated by each annotation at tier X. Notice that whereas position queries return annotations at tier Y, count queries return annotations at tier X. Thus the following finds all bisyllabic words and then all syllables made of more than four phonemes:

# Find all words that contain two syllables
query(ae, "[Num(Text, Syllable) == 2]")

## # A tibble: 11 × 16
##    labels   start   end db_uuid   session bundle start_item_id end_item_id level
##    <chr>    <dbl> <dbl> <chr>     <chr>   <chr>          <int>       <int> <chr>
##  1 amongst   187.  674. 0fc618dc… 0000    msajc…             2           2 Word 
##  2 futile    572. 1091. 0fc618dc… 0000    msajc…            21          21 Word 
##  3 any      1437. 1628. 0fc618dc… 0000    msajc…            58          58 Word 
##  4 further  1628. 1958. 0fc618dc… 0000    msajc…            68          68 Word 
##  5 shiver   1651. 1995. 0fc618dc… 0000    msajc…            70          70 Word 
##  6 itches    300.  662. 0fc618dc… 0000    msajc…             2           2 Word 
##  7 always    775. 1280. 0fc618dc… 0000    msajc…            28          28 Word 
##  8 tempting 1401. 1806. 0fc618dc… 0000    msajc…            51          51 Word 
##  9 display   667. 1211. 0fc618dc… 0000    msajc…            25          25 Word 
## 10 attracts 1211. 1579. 0fc618dc… 0000    msajc…            44          44 Word 
## 11 ever     2480. 2795. 0fc618dc… 0000    msajc…           106         106 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# Find all syllables that contain more than four phonemes
query(ae, "[Num(Syllable, Phoneme) > 4]")

## # A tibble: 7 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 S       257.  674. 0fc618dc-89… 0000    msajc…           103         103 Syll…
## 2 S       740. 1289. 0fc618dc-89… 0000    msajc…           105         105 Syll…
## 3 W      2228. 2754. 0fc618dc-89… 0000    msajc…           118         118 Syll…
## 4 S      1213. 1797. 0fc618dc-89… 0000    msajc…           134         134 Syll…
## 5 S      1890. 2470. 0fc618dc-89… 0000    msajc…           105         105 Syll…
## 6 S      1964. 2554. 0fc618dc-89… 0000    msajc…            90          90 Syll…
## 7 S      1248. 1579. 0fc618dc-89… 0000    msajc…           119         119 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
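Because the query is an ordinary character string, count queries can also be generated programmatically. The sketch below builds one Num() query per syllable count with paste0() and tabulates how many words have 1 to 4 syllables. The range 1:4 is an arbitrary choice for illustration.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

n_syll = 1:4
# One query per syllable count, e.g. "[Num(Text, Syllable) == 2]"
n_words = sapply(n_syll, function(k) {
  nrow(query(ae, paste0("[Num(Text, Syllable) == ", k, "]")))
})
data.frame(syllables = n_syll, words = n_words)
```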

To get all syllables in bisyllabic words requires:

query(ae, "[Syllable =~ .* ^ Num(Text, Syllable) == 2]")

## # A tibble: 22 × 16
##    labels start   end db_uuid     session bundle start_item_id end_item_id level
##    <chr>  <dbl> <dbl> <chr>       <chr>   <chr>          <int>       <int> <chr>
##  1 W       187.  257. 0fc618dc-8… 0000    msajc…           102         102 Syll…
##  2 S       257.  674. 0fc618dc-8… 0000    msajc…           103         103 Syll…
##  3 S       572.  798. 0fc618dc-8… 0000    msajc…           107         107 Syll…
##  4 S       798. 1091. 0fc618dc-8… 0000    msajc…           108         108 Syll…
##  5 S      1437. 1515. 0fc618dc-8… 0000    msajc…           112         112 Syll…
##  6 W      1515. 1628. 0fc618dc-8… 0000    msajc…           113         113 Syll…
##  7 S      1628. 1864. 0fc618dc-8… 0000    msajc…           114         114 Syll…
##  8 W      1864. 1958. 0fc618dc-8… 0000    msajc…           115         115 Syll…
##  9 S      1651. 1863. 0fc618dc-8… 0000    msajc…           113         113 Syll…
## 10 W      1863. 1995. 0fc618dc-8… 0000    msajc…           114         114 Syll…
## # ℹ 12 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Notice that the above query has to be linked by ^. This is because, as stated earlier, Num(X, Y) returns annotations from tier X, in this case Text, which dominates (is linked to) Syllable. Similarly, to find all Intermediate phrases that contain words of 4 or more syllables:

query(ae, "[Intermediate =~ .* ^ Num(Text, Syllable) >= 4]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 L-     1565. 2692. 0fc618dc-89… 0000    msajc…            63          63 Inte…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Again, the two parts of the query must be joined with ^ because Intermediate dominates Text. To get all words of 4 or more syllables in L- Intermediate phrases, put the Num(X,Y) function first in the query:

query(ae, "[Num(Text, Syllable) >= 4 ^ Intermediate = L-]")

## # A tibble: 1 × 16
##   labels    start   end db_uuid   session bundle start_item_id end_item_id level
##   <chr>     <dbl> <dbl> <chr>     <chr>   <chr>          <int>       <int> <chr>
## 1 violently 1995. 2692. 0fc618dc… 0000    msajc…            82          82 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
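A simple sanity check ties together the bisyllabic queries of this section. Num(Text, Syllable) == 2 selects words with exactly two syllables, so the syllables returned by the linked query [Syllable =~ .* ^ Num(Text, Syllable) == 2] should number exactly twice the words.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

bisyll_words = query(ae, "[Num(Text, Syllable) == 2]")
bisyll_sylls = query(ae, "[Syllable =~ .* ^ Num(Text, Syllable) == 2]")
# Each bisyllabic word contributes exactly two syllables
nrow(bisyll_sylls) == 2 * nrow(bisyll_words)
```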

As a general guide, it is helpful to build up complex queries in stages, e.g. queries that involve counts, positions and multiple tiers. Consider the following task:

Find the final syllable of trisyllabic words that have a word-final /s/.

This would match the final syllable of circumstance (if that word were in the database), because circumstance is trisyllabic and ends in /s/.

# Find all three syllable words
query(ae, "Num(Text, Syllable)=3")

## # A tibble: 7 × 16
##   labels     start   end db_uuid  session bundle start_item_id end_item_id level
##   <chr>      <dbl> <dbl> <chr>    <chr>   <chr>          <int>       <int> <chr>
## 1 considered 1634. 2150. 0fc618d… 0000    msajc…            61          61 Word 
## 2 beautiful  2034. 2604. 0fc618d… 0000    msajc…            83          83 Word 
## 3 resistance 1958. 2754. 0fc618d… 0000    msajc…            80          80 Word 
## 4 emphasized  425. 1129. 0fc618d… 0000    msajc…            13          13 Word 
## 5 concealing 2104. 2694. 0fc618d… 0000    msajc…            78          78 Word 
## 6 weaknesses 2781. 3457. 0fc618d… 0000    msajc…           109         109 Word 
## 7 customers  1824. 2368. 0fc618d… 0000    msajc…            73          73 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

# Find all three syllable words that end in /s/.
query(ae, 
      "[Num(Text, Syllable)=3 ^ 
      Phoneme=s & End(Text, Phoneme)=T]")

## # A tibble: 1 × 16
##   labels     start   end db_uuid  session bundle start_item_id end_item_id level
##   <chr>      <dbl> <dbl> <chr>    <chr>   <chr>          <int>       <int> <chr>
## 1 resistance 1958. 2754. 0fc618d… 0000    msajc…            80          80 Word 
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Note again the need for ^ in the above query. The first part returns Text annotations, the second part returns Phoneme annotations, and the Text tier dominates (is linked to) the Phoneme tier. To return the Phoneme annotations instead, add # to the second part:

# Find all word-final /s/ in trisyllabic words.
query(ae, 
      "[Num(Text, Syllable)=3 ^ 
      #Phoneme=s & End(Text, Phoneme)=T]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 s      2528. 2754. 0fc618dc-89… 0000    msajc…           150         150 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

To get the syllable that dominates an /s/ that is word-final:

# Syllables dominating  word-final /s/
query(ae,  
      "[Phoneme=s & End(Text, Phoneme)=T ^ 
      #Syllable =~ .*]")

## # A tibble: 6 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 W      2228. 2754. 0fc618dc-89… 0000    msajc…           118         118 Syll…
## 2 S      1213. 1797. 0fc618dc-89… 0000    msajc…           134         134 Syll…
## 3 S      1039. 1422. 0fc618dc-89… 0000    msajc…            86          86 Syll…
## 4 S      1964. 2554. 0fc618dc-89… 0000    msajc…            90          90 Syll…
## 5 W       300.  476. 0fc618dc-89… 0000    msajc…           114         114 Syll…
## 6 S      1248. 1579. 0fc618dc-89… 0000    msajc…           119         119 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

or equivalently without #:

# Syllables dominating  word-final /s/
query(ae,  "[Syllable =~ .* ^ 
      Phoneme=s & End(Text, Phoneme)=T]")

## # A tibble: 6 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 W      2228. 2754. 0fc618dc-89… 0000    msajc…           118         118 Syll…
## 2 S      1213. 1797. 0fc618dc-89… 0000    msajc…           134         134 Syll…
## 3 S      1039. 1422. 0fc618dc-89… 0000    msajc…            86          86 Syll…
## 4 S      1964. 2554. 0fc618dc-89… 0000    msajc…            90          90 Syll…
## 5 W       300.  476. 0fc618dc-89… 0000    msajc…           114         114 Syll…
## 6 S      1248. 1579. 0fc618dc-89… 0000    msajc…           119         119 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

Finally, combine the query for word-final /s/ in trisyllabic words with the one above:

# the final syllable of trisyllabic words 
# that end in /s/.
query(ae, 
      "[Num(Text, Syllable)=3 ^ 
      [Phoneme=s & End(Text, Phoneme)=T ^ 
      #Syllable =~ .*]]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 W      2228. 2754. 0fc618dc-89… 0000    msajc…           118         118 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>

or equivalently:

# the final syllable of trisyllabic words 
# that end in /s/.
query(ae,  "[[Syllable =~ .* ^ 
      Phoneme=s & End(Text, Phoneme)=T] ^ 
      Num(Text, Syllable)=3]")

## # A tibble: 1 × 16
##   labels start   end db_uuid      session bundle start_item_id end_item_id level
##   <chr>  <dbl> <dbl> <chr>        <chr>   <chr>          <int>       <int> <chr>
## 1 W      2228. 2754. 0fc618dc-89… 0000    msajc…           118         118 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## #   end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## #   sample_rate <int>
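The staged approach can also be scripted. Store each building block as a string and assemble the stages with sprintf(), so that each intermediate query can be run and checked before the next stage is added. The sketch below reproduces two of the stages above. The variable names trisyll, final_s, q1 and q2 are illustrative.

```r
library(emuR)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

trisyll = "Num(Text, Syllable) = 3"
final_s = "Phoneme = s & End(Text, Phoneme) = T"
# Stage 1: trisyllabic words ending in /s/ (returns Text annotations)
q1 = sprintf("[%s ^ %s]", trisyll, final_s)
# Stage 2: the syllable dominating that /s/ (returns Syllable annotations)
q2 = sprintf("[%s ^ [%s ^ #Syllable =~ .*]]", trisyll, final_s)
stage1 = query(ae, q1)
stage2 = query(ae, q2)
stage1$labels
stage2$labels
```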

The Emu query language

Jonathan Harrington, Raphael Winkelmann

WiSe 2123
