Follow the setup instructions given here, i.e. download R and RStudio, create a directory on your computer where you will store the files for this course, make a note of the directory path, create an R project that accesses this directory, and install all indicated packages.
For this and subsequent tutorials, access the tidyverse, magrittr, emuR, and wrassp libraries:
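The chunk that attaches these packages is not shown in this text. A minimal sketch, assuming all four packages have already been installed as described in the setup instructions, would be:

```r
# Attach the packages used in this and subsequent tutorials
library(tidyverse)  # dplyr, ggplot2, tidyr, etc.
library(magrittr)   # additional pipe operators
library(emuR)       # querying and managing emuDBs
library(wrassp)     # signal processing routines
```

Attaching the packages in this order would give startup messages of the kind shown below.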
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
The following makes use of the demonstration database emuDB that was also used here.
Store and access the demo database, as also described here, as follows:
create_emuRdemoData(dir = tempdir())
path.ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
ae = load_emuDB(path.ae)
## INFO: Loading EMU database from /var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T//RtmppPXwNJ/emuR_demoData/ae_emuDB... (7 bundles found)
##
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: ae
## UUID: 0fc618dc-8980-414d-8c7a-144a649ce199
## Directory: /private/var/folders/x_/x690j1dj703f09w41vm3hxd80000gp/T/RtmppPXwNJ/emuR_demoData/ae_emuDB
## Session count: 1
## Bundle count: 7
## Annotation item count: 736
## Label count: 844
## Link count: 785
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## name columnName fileExtension
## dft dft dft
## fm fm fms
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## Utterance ITEM 1 Utterance;
## Intonational ITEM 1 Intonational;
## Intermediate ITEM 1 Intermediate;
## Word ITEM 3 Word; Accent; Text;
## Syllable ITEM 1 Syllable;
## Phoneme ITEM 1 Phoneme;
## Phonetic SEGMENT 1 Phonetic;
## Tone EVENT 1 Tone;
## Foot ITEM 1 Foot;
## ── Link definitions ──
## type superlevelName sublevelName
## ONE_TO_MANY Utterance Intonational
## ONE_TO_MANY Intonational Intermediate
## ONE_TO_MANY Intermediate Word
## ONE_TO_MANY Word Syllable
## ONE_TO_MANY Syllable Phoneme
## MANY_TO_MANY Phoneme Phonetic
## ONE_TO_MANY Syllable Tone
## ONE_TO_MANY Intonational Foot
## ONE_TO_MANY Foot Syllable
The level definitions show an EVENT tier (Tone, in which annotations are defined by single points in time), one SEGMENT tier (Phonetic, with start and end times), and several ITEM tiers, e.g. Syllable or Word, that inherit times from the Phonetic tier. The link definitions summary shows a rich annotation structure that produces the following tree-like structure for the first utterance (note that only a single path through the hierarchy is shown):
The function for computing queries is called query(); this function needs at least two arguments: the name of the database and the query itself.
For example, the expression "[Phonetic == V]" is a legal expression in the EMU Query Language (EQL) (see below for details) and means “which annotations in the Phonetic tier are equal to the label ‘V’” (“V” is the SAMPA for English equivalent of IPA /ʌ/, i.e. the vowel in words like cut).
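The query just described can be sketched as follows. So that it runs on its own, the sketch rebuilds the demo database in a fresh temporary directory (demo_dir is a name chosen here, not part of emuR); the object name V is the one used for this segment list in the rest of the text.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# All annotations at the Phonetic tier whose label is V
V = query(ae, "[Phonetic == V]")
V
```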
query(): segment lists
query() has found three tokens of “V” segments:
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 2 V 340. 427. 0fc618dc-89… 0000 msajc… 149 149 Phon…
## 3 V 1943. 2037. 0fc618dc-89… 0000 msajc… 189 189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
As of emuR version 2.0.0, this object is of the type tibble, with one row per segment descriptor:
Data frame columns
labels: annotations or sequenced annotations of segments concatenated by ‘->’
start: onset time in milliseconds
end: offset time in milliseconds
db_uuid: UUID of emuDB (= a unique identifier)
session: session name
bundle: bundle name (= utterance name)
start_item_id: item ID of first element of sequence
end_item_id: item ID of last element of sequence
level: name of the tier that has been searched
attribute: name of attribute that has been searched
start_item_seq_idx: sequence index of start item
end_item_seq_idx: sequence index of end item
type: type of “segment” row (ITEM: symbolic item; EVENT: event item; SEGMENT: segment)
sample_start: start sample position
sample_end: end sample position
sample_rate: sample rate
This makes it easy to access particular pieces of information, e.g.:
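The commands behind the output below are not shown in this text. The following is a sketch of the kind of column access that gives such values, assuming a freshly created demo database (demo_dir and dur are names chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

V = query(ae, "[Phonetic == V]")

V$labels                 # annotations
V$start                  # onset times (ms)
V$end                    # offset times (ms)
dur = V$end - V$start    # durations (ms)
dur
```

Because the segment list is a tibble, the same columns can also be reached with tidyverse verbs such as pull() and mutate().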
## [1] "V" "V" "V"
## [1] 187.425 340.175 1943.175
## [1] 187.425 340.175 1943.175
## [1] 256.925 426.675 2037.425
## [1] 69.50 86.50 94.25
## [1] 69.50 86.50 94.25
V in the above example is a segment list with start and end times because Phonetic is a SEGMENT tier. Event tiers can be queried as well, in which case the structure of the tibble that is returned is exactly the same, except that the end times are all zero (because the annotations mark events in time, rather than segments). For example, here is an event list of all tones in which the end times are all zero.
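An event list of this kind could be obtained as sketched here. The regular expression .* (any label) and the object name tones are choices made for this sketch; the exact command behind the output below is not shown in the text.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Every annotation at the Tone tier (an EVENT tier)
tones = query(ae, "Tone =~ .*")

# Events have a time point in start; end is zero
head(tones[, c("labels", "start", "end", "type")], 3)
```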
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 H* 419. 0 0fc618dc-89… 0000 msajc… 181 181 Tone
## 2 H* 932. 0 0fc618dc-89… 0000 msajc… 182 182 Tone
## 3 L- 1107 0 0fc618dc-89… 0000 msajc… 183 183 Tone
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Annotations at ITEM tiers that either have no times or that inherit times can be queried in the same way:
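As a sketch, the following queries the ITEM tier Phoneme for the same vowel. The object name V_phoneme is the one referred to further down in the text; the setup in a fresh temporary directory is an assumption made so that the example is self-contained.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Which tiers exist, and of which type?
list_levelDefinitions(ae)

# Phoneme is an ITEM tier, but the query syntax is unchanged
V_phoneme = query(ae, "[Phoneme == V]")
V_phoneme[, c("labels", "start", "end", "level")]
```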
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-89… 0000 msajc… 114 114 Phon…
## 2 V 340. 427. 0fc618dc-89… 0000 msajc… 116 116 Phon…
## 3 V 1943. 2037. 0fc618dc-89… 0000 msajc… 149 149 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 2 V 340. 427. 0fc618dc-89… 0000 msajc… 149 149 Phon…
## 3 V 1943. 2037. 0fc618dc-89… 0000 msajc… 189 189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Both V and V_phoneme are associated with times, even though Phoneme is a timeless ITEM tier. The reason for this was explained in an earlier module here.
The calculation of inherited times can be time-consuming and may be switched off with calcTimes = FALSE:
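A sketch of such a call is given here; V_phoneme2 is the object name that the text uses below for this segment list, and the rest of the setup is assumed.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Same query as before, but without deriving times from the Phonetic tier
V_phoneme2 = query(ae, "[Phoneme == V]", calcTimes = FALSE)
V_phoneme2[, c("labels", "start", "end")]
```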
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V NA NA 0fc618dc-89… 0000 msajc… 114 114 Phon…
## 2 V NA NA 0fc618dc-89… 0000 msajc… 116 116 Phon…
## 3 V NA NA 0fc618dc-89… 0000 msajc… 149 149 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
In this case, all entries in start and end are NA (= Not Available).
requery_hier() and requery_seq()
requery_hier() allows segment or event lists to be created for a tier linked to any existing segment list. In the above case, the annotation tier of V_phoneme2 was Phoneme, which is linked to many other tiers, as the following shows:
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
Therefore, to make a segment list of the words corresponding to these segments:
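A sketch of such an upstream requery follows. It starts from a Phoneme segment list named V_phoneme here and stores the result in t.s, the name the text uses below; passing the attribute name "Text" as the target is inferred from the word labels in the output.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

V_phoneme = query(ae, "[Phoneme == V]")

# For each V phoneme, the word (Text attribute) that dominates it
t.s = requery_hier(ae, V_phoneme, level = "Text")
t.s$labels
```

The first two rows are the same word because amongst contains two V phonemes.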
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 amongst 187. 674. 0fc618dc… 0000 msajc… 2 2 Word
## 2 amongst 187. 674. 0fc618dc… 0000 msajc… 2 2 Word
## 3 customers 1824. 2368. 0fc618dc… 0000 msajc… 73 73 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 amongst 187. 674. 0fc618dc… 0000 msajc… 2 2 Word
## 2 amongst 187. 674. 0fc618dc… 0000 msajc… 2 2 Word
## 3 customers 1824. 2368. 0fc618dc… 0000 msajc… 73 73 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
The above is a requery looking upstream to a tier that dominates Phoneme. A downstream query makes a segment list of all annotations that are found, delimited by ->. Thus for the segment list t.s at the Text tier that has just been created, the corresponding phonemes are:
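A sketch of the downstream requery, assuming t.s is built as a requery of the V phonemes to the Text attribute (p.s is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Words containing a V phoneme
t.s = requery_hier(ae, query(ae, "[Phoneme == V]"), level = "Text")

# All phonemes of each of these words, joined by ->
p.s = requery_hier(ae, t.s, level = "Phoneme")
p.s$labels
```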
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V->m->V->N… 187. 674. 0fc618… 0000 msajc… 114 119 Phon…
## 2 V->m->V->N… 187. 674. 0fc618… 0000 msajc… 114 119 Phon…
## 3 k->V->s->t… 1824. 2368. 0fc618… 0000 msajc… 148 155 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V->m->V->N… 187. 674. 0fc618… 0000 msajc… 114 119 Phon…
## 2 V->m->V->N… 187. 674. 0fc618… 0000 msajc… 114 119 Phon…
## 3 k->V->s->t… 1824. 2368. 0fc618… 0000 msajc… 148 155 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
As another example, the following makes a segment list of all words and then requeries these to find out which of them are associated with tones. This is possible because the Word and therefore Text tiers are linked to the Tone tier via the Syllable tier, as list_linkDefinitions(ae) had shown:
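This can be sketched as below. The call requery_hier(ae, all.s, "Tone") is the one quoted in the warning that follows; the query used here to build all.s and the name tone.s are assumptions.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Segment list of all words
all.s = query(ae, "Text =~ .*")

# One row per word; a warning about missing items is expected
# because words without a linked tone give NA rows
tone.s = requery_hier(ae, all.s, "Tone")
tone.s
```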
## Warning in requery_hier(ae, all.s, "Tone"): Found missing items in resulting
## segment list! Replaced missing rows with NA values.
## # A tibble: 54 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 <NA> NA NA <NA> <NA> <NA> NA NA <NA>
## 2 H* 419. 0 0fc618dc-8… 0000 msajc… 181 181 Tone
## 3 H* 932. 0 0fc618dc-8… 0000 msajc… 182 182 Tone
## 4 <NA> NA NA <NA> <NA> <NA> NA NA <NA>
## 5 <NA> NA NA <NA> <NA> <NA> NA NA <NA>
## 6 H* 1913. 0 0fc618dc-8… 0000 msajc… 184 184 Tone
## 7 H* 2231. 0 0fc618dc-8… 0000 msajc… 185 185 Tone
## 8 <NA> NA NA <NA> <NA> <NA> NA NA <NA>
## 9 <NA> NA NA <NA> <NA> <NA> NA NA <NA>
## 10 H* 761. 0 0fc618dc-8… 0000 msajc… 186 186 Tone
## # ℹ 44 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Notice how many rows are marked NA because they are unassociated with tones (i.e. not pitch-accented).
The function requery_seq() is used for finding annotations and/or making segment lists that precede or follow an existing segment/event list in sequence. In contrast to the requery_hier() function, the segment or event lists returned from requery_seq() are always from the same tier as the segment or event list being requeried. The argument offset = n, where n is any negative or positive integer, finds the annotations n positions to the right if the integer is positive, and n positions to the left if the integer is negative. Thus to find the annotations that follow those of the segment list V created earlier:
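A sketch of this requery, with the demo database set up afresh so that the example is self-contained (V.next is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

V = query(ae, "[Phonetic == V]")

# The annotation one position to the right of each V
V.next = requery_seq(ae, V, offset = 1)
V.next$labels
```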
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m 257. 340. 0fc618dc-89… 0000 msajc… 148 148 Phon…
## 2 N 427. 483. 0fc618dc-89… 0000 msajc… 150 150 Phon…
## 3 s 2037. 2085. 0fc618dc-89… 0000 msajc… 190 190 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To find, e.g., those 2 positions to the left, the command is the same but with offset=-2. However, this will fail in this case because there are no annotations 2 positions to the left for some of the annotations in V. To get round this problem, the additional argument ignoreOutOfBounds=TRUE must be included:
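The call, with the argument names as they appear in the warning below, can be sketched as follows (V.left2 is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

V = query(ae, "[Phonetic == V]")

# Two positions to the left; out-of-bounds rows become NA (with a warning)
V.left2 = requery_seq(emuDBhandle = ae, seglist = V, offset = -2,
                      ignoreOutOfBounds = TRUE)
V.left2$labels
```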
## Warning in requery_seq(emuDBhandle = ae, seglist = V, offset = -2,
## ignoreOutOfBounds = TRUE): Found missing items in resulting segment list!
## Replacing missing rows with NA values.
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 <NA> NA NA <NA> <NA> <NA> NA NA <NA>
## 2 V 187. 257. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 3 k 1824. 1878. 0fc618dc-89… 0000 msajc… 187 187 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <dbl>,
## # end_item_seq_idx <dbl>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
This gives NA for the first row above because, for this segment, there are no annotations two positions to its left.
A further variation on the function requery_seq() is to include length=n, where n is a positive integer: this finds a sequence of n annotations at a given offset position. For example, the following makes a segment list that extends from 1 to 3 annotations to the right relative to the segment list V:
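A sketch of this call, again on a freshly created copy of the demo database (V.seq3 is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

V = query(ae, "[Phonetic == V]")

# Start one position to the right of each V and take three annotations
V.seq3 = requery_seq(ae, V, offset = 1, length = 3)
V.seq3$labels
```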
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m->V->N 257. 483. 0fc618dc-8… 0000 msajc… 148 150 Phon…
## 2 N->s->t 427. 597. 0fc618dc-8… 0000 msajc… 150 152 Phon…
## 3 s->t->H 2037. 2148. 0fc618dc-8… 0000 msajc… 190 192 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
EQL
To learn more about the functionality of the EQL, see also the manual chapter here.
As discussed in the preceding section, any query must be placed within " ", and any query can be placed within [ ]. The query must include minimally the name of an annotation tier combined with a representation for an annotation (which can also be a regular expression). Further details are as follows.
In the examples above, the equality of the “V” annotations at the Phonetic tier (in the database ae) was tested:
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 2 V 340. 427. 0fc618dc-89… 0000 msajc… 149 149 Phon…
## 3 V 1943. 2037. 0fc618dc-89… 0000 msajc… 189 189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
The equality operator is ==. For backward compatibility with earlier versions of emuR, a single = is also allowed. Thus the preceding command is equivalent to:
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 2 V 340. 427. 0fc618dc-89… 0000 msajc… 149 149 Phon…
## 3 V 1943. 2037. 0fc618dc-89… 0000 msajc… 189 189 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Searches can be made for everything except V by the use of !=.
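As a sketch, on a freshly created copy of the demo database (notV is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Every Phonetic annotation whose label is not V
notV = query(ae, "[Phonetic != V]")
notV
```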
## # A tibble: 250 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m 257. 340. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 2 N 427. 483. 0fc618dc-8… 0000 msajc… 150 150 Phon…
## 3 s 483. 567. 0fc618dc-8… 0000 msajc… 151 151 Phon…
## 4 t 567. 597. 0fc618dc-8… 0000 msajc… 152 152 Phon…
## 5 H 597. 674. 0fc618dc-8… 0000 msajc… 153 153 Phon…
## 6 @: 674. 740. 0fc618dc-8… 0000 msajc… 154 154 Phon…
## 7 f 740. 893. 0fc618dc-8… 0000 msajc… 155 155 Phon…
## 8 r 893. 950. 0fc618dc-8… 0000 msajc… 156 156 Phon…
## 9 E 950. 1032. 0fc618dc-8… 0000 msajc… 157 157 Phon…
## 10 n 1032. 1196. 0fc618dc-8… 0000 msajc… 158 158 Phon…
## # ℹ 240 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
One way to get ‘everything’ is to use != with a label that is probably not in the database, like xyz. Alternatively, a regular expression can be used, preceded by =~. The regular expression for finding all annotations is .* (meaning: any character (.) zero or more times (*)).
everything1 = query(ae,
"Phonetic != xyz")
everything2 = query(ae,
"Phonetic =~ .*")
# should be T if both are equal everywhere
all(everything1 == everything2)
## [1] TRUE
The operator !~ is for negation of a regular expression match. An example would be:
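The command behind the output below is not shown in this text. The following sketch uses a pattern chosen purely for illustration (.*a.*, i.e. any label containing the letter a), which is not necessarily the one the output was made with:

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# All words whose Text label does not match the pattern .*a.*
no.a = query(ae, "Text !~ .*a.*")
no.a$labels
```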
## # A tibble: 37 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 her 674. 740. 0fc618… 0000 msajc… 24 24 Word
## 2 friends 740. 1289. 0fc618… 0000 msajc… 30 30 Word
## 3 she 1289. 1463. 0fc618… 0000 msajc… 43 43 Word
## 4 considered 1634. 2150. 0fc618… 0000 msajc… 61 61 Word
## 5 it 300. 412. 0fc618… 0000 msajc… 2 2 Word
## 6 is 412. 572. 0fc618… 0000 msajc… 14 14 Word
## 7 futile 572. 1091. 0fc618… 0000 msajc… 21 21 Word
## 8 to 1091. 1222. 0fc618… 0000 msajc… 38 38 Word
## 9 offer 1222. 1391. 0fc618… 0000 msajc… 48 48 Word
## 10 further 1628. 1958. 0fc618… 0000 msajc… 68 68 Word
## # ℹ 27 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
There are therefore four operators, two for equality matching, and two for inequality:
Symbol | Meaning |
---|---|
== | equality |
=~ | regular expression matching |
!= | inequality |
!~ | regular expression non-matching |
OR operator
The operator | can be used to search for several annotations:
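The two outputs below are consistent with a query for m or n, and then for m, n or N. A sketch of such queries (mn.s and mnN.s are names chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# m or n
mn.s = query(ae, "Phonetic == m | n")

# m or n or N
mnN.s = query(ae, "Phonetic == m | n | N")

table(mnN.s$labels)
```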
## # A tibble: 19 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m 257. 340. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 2 n 1032. 1196. 0fc618dc-8… 0000 msajc… 158 158 Phon…
## 3 n 1741. 1791. 0fc618dc-8… 0000 msajc… 168 168 Phon…
## 4 n 1515. 1554. 0fc618dc-8… 0000 msajc… 170 170 Phon…
## 5 n 2431. 2528. 0fc618dc-8… 0000 msajc… 184 184 Phon…
## 6 n 895. 1023. 0fc618dc-8… 0000 msajc… 158 158 Phon…
## 7 m 1490. 1565. 0fc618dc-8… 0000 msajc… 169 169 Phon…
## 8 n 2402. 2475. 0fc618dc-8… 0000 msajc… 182 182 Phon…
## 9 m 497. 559. 0fc618dc-8… 0000 msajc… 188 188 Phon…
## 10 n 2227. 2271. 0fc618dc-8… 0000 msajc… 216 216 Phon…
## 11 n 3046. 3068. 0fc618dc-8… 0000 msajc… 229 229 Phon…
## 12 m 1587. 1656. 0fc618dc-8… 0000 msajc… 149 149 Phon…
## 13 m 819. 903. 0fc618dc-8… 0000 msajc… 120 120 Phon…
## 14 n 1435. 1495. 0fc618dc-8… 0000 msajc… 127 127 Phon…
## 15 n 1775. 1834. 0fc618dc-8… 0000 msajc… 132 132 Phon…
## 16 n 509. 544. 0fc618dc-8… 0000 msajc… 166 166 Phon…
## 17 m 1630. 1709. 0fc618dc-8… 0000 msajc… 185 185 Phon…
## 18 m 2173. 2233. 0fc618dc-8… 0000 msajc… 194 194 Phon…
## 19 n 2448. 2480. 0fc618dc-8… 0000 msajc… 199 199 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
## # A tibble: 23 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m 257. 340. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 2 N 427. 483. 0fc618dc-8… 0000 msajc… 150 150 Phon…
## 3 n 1032. 1196. 0fc618dc-8… 0000 msajc… 158 158 Phon…
## 4 n 1741. 1791. 0fc618dc-8… 0000 msajc… 168 168 Phon…
## 5 n 1515. 1554. 0fc618dc-8… 0000 msajc… 170 170 Phon…
## 6 n 2431. 2528. 0fc618dc-8… 0000 msajc… 184 184 Phon…
## 7 n 895. 1023. 0fc618dc-8… 0000 msajc… 158 158 Phon…
## 8 m 1490. 1565. 0fc618dc-8… 0000 msajc… 169 169 Phon…
## 9 n 2402. 2475. 0fc618dc-8… 0000 msajc… 182 182 Phon…
## 10 m 497. 559. 0fc618dc-8… 0000 msajc… 188 188 Phon…
## # ℹ 13 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
It is possible in the Emu system to define features and then to query them. The definition of features is accomplished with the functions add_attrDefLabelGroup(), list_attrDefLabelGroups() and remove_attrDefLabelGroup(). For the existing ae database some features have already been defined. To see these for the Phonetic tier:
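A sketch of the listing command, assuming that the attribute of interest has the same name as the tier (lg is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Label groups (features) defined for the Phonetic attribute of the Phonetic tier
lg = list_attrDefLabelGroups(ae,
                             levelName = "Phonetic",
                             attributeDefinitionName = "Phonetic")
lg
```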
## name
## 1 vowel
## 2 stop
## 3 nasal
## 4 fricative
## 5 approximant
## 6 other
## values
## 1 A; E; EC; I; O; V; U; ai; ei; oi; i@; u@; au; @u; @:; @; =; a:; e:; i:; o:; u:
## 2 p; tS; dZ; t; k; b; d; g
## 3 m; n
## 4 f; v; s; z; S; Z; h; D; D-; T
## 5 w; j; l; r; rr; Or
## 6 H
This shows, for example, that there is a feature nasal that includes the annotations m and n. Consequently, the following give the same output:
# Segment list of nasals
nas.s1 = query(ae, "Phonetic = nasal")
nas.s2 = query(ae, "Phonetic = m|n")
all(nas.s1 == nas.s2)
## [1] TRUE
Use add_attrDefLabelGroup() to add new features. In this example, a feature grave is added to the Phoneme tier such that grave includes the labial and velar consonants:
add_attrDefLabelGroup(
ae,
levelName = "Phoneme",
attributeDefinitionName = "Phoneme",
labelGroupName = "grave",
labelGroupValues = c("p", "b", "k", "g")
)
A segment list of the grave annotations can then be queried; counting its labels gives:
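The query and the count can be sketched as follows. The sketch repeats the definition of grave on a fresh copy of the database so that it runs on its own; base R's table() is used here for the count, which may differ from the command behind the output below.

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Define the feature grave for the Phoneme tier
add_attrDefLabelGroup(ae,
                      levelName = "Phoneme",
                      attributeDefinitionName = "Phoneme",
                      labelGroupName = "grave",
                      labelGroupValues = c("p", "b", "k", "g"))

# Query by feature name, then count the labels that were found
grave.s = query(ae, "Phoneme == grave")
table(grave.s$labels)
```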
## # A tibble: 3 × 2
## labels n
## <chr> <int>
## 1 b 2
## 2 k 9
## 3 p 3
Anything except simple queries requires the use of [ ] brackets. Thus, whereas brackets are optional in simple queries, sequence (and hierarchical) queries require [ ] brackets.
The -> operator is for finding sequences of annotations:
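A sketch of a sequence query for V followed by m at the Phonetic tier (Vm.s is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# V immediately followed by m; note the obligatory [ ] brackets
Vm.s = query(ae, "[Phonetic == V -> Phonetic == m]")
Vm.s
```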
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V->m 187. 340. 0fc618dc-89… 0000 msajc… 147 148 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Note: all row entries in the resulting segment list have the start time of V, the end time of m, and their annotations are V->m. This can be changed with the result modifier hash tag # as follows:
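A sketch of the two variants, with the hash tag placed before the element whose annotations and times are to be returned (V.only and m.only are names chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Return only the V of each V m sequence
V.only = query(ae, "[#Phonetic == V -> Phonetic == m]")

# Return only the m of each V m sequence
m.only = query(ae, "[Phonetic == V -> #Phonetic == m]")

V.only$labels
m.only$labels
```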
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m 257. 340. 0fc618dc-89… 0000 msajc… 148 148 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Only one hash tag per query is allowed.
Embedded bracketing is needed to search for multiple sequences. This finds all sequences of @ n s at the Phonetic tier:
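A sketch of such a query, in which the first sequence is itself bracketed (ns.s is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# [@ followed by n] followed by s
ns.s = query(ae, "[[Phonetic == @ -> Phonetic == n] -> Phonetic == s]")
ns.s$labels
```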
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 @->n->s 1715. 1893. 0fc618dc-8… 0000 msajc… 167 169 Phon…
## 2 @->n->s 2382. 2754. 0fc618dc-8… 0000 msajc… 183 185 Phon…
## 3 @->n->s 2201. 2409. 0fc618dc-8… 0000 msajc… 215 217 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
The following finds offer followed by any two annotations followed by resistance at the Text tier.
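One way to write this query is sketched here, with .* standing for "any annotation" and one level of bracketing per added element (off.s is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# offer, then any word, then any word, then resistance
off.s = query(ae,
  "[[[Text == offer -> Text =~ .*] -> Text =~ .*] -> Text == resistance]")
off.s$labels
```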
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 offer->any… 1222. 2754. 0fc618… 0000 msajc… 48 80 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
The operator ^ is for queries spanning two linked tiers. The following finds all p annotations at the Phoneme tier in strong syllables (i.e. p annotations at the Phoneme tier dominated by / linked to S annotations at the Syllable tier):
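This query can be sketched as follows, on a freshly created copy of the demo database (p.strong is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# p at the Phoneme tier linked to S at the Syllable tier
p.strong = query(ae, "[Phoneme == p ^ Syllable == S]")
p.strong
```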
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 p 559. 640. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 2 p 1656. 1699. 0fc618dc-89… 0000 msajc… 122 122 Phon…
## 3 p 864. 970. 0fc618dc-89… 0000 msajc… 136 136 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
The ^ operator is not directional. Thus, although Syllable dominates Phoneme, the same output is also given by:
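A sketch of the reversed form, with # marking the Phoneme annotations as the ones to be returned (p.strong2 is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Syllable first, but # asks for the linked p phonemes to be returned
p.strong2 = query(ae, "[Syllable == S ^ #Phoneme == p]")
p.strong2
```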
## # A tibble: 3 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 p 559. 640. 0fc618dc-89… 0000 msajc… 147 147 Phon…
## 2 p 1656. 1699. 0fc618dc-89… 0000 msajc… 122 122 Phon…
## 3 p 864. 970. 0fc618dc-89… 0000 msajc… 136 136 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Thus the semantics of ^ is not really “is dominated by”, but rather “is linked to”. However, the preceding requires #, which can be omitted by placing the annotations that are required in first position, i.e. as query(ae, "[Phoneme == p ^ Syllable == S]").
Brackets are needed for queries spanning several linked tiers:
# Find all Phonetic annotations in strong syllables
# in either `amongst` or `beautiful`
query(ae,
"[[Phonetic =~ .* ^ Syllable == S]
^ Text == amongst | beautiful]")
## # A tibble: 9 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 m 257. 340. 0fc618dc-89… 0000 msajc… 148 148 Phon…
## 2 V 340. 427. 0fc618dc-89… 0000 msajc… 149 149 Phon…
## 3 N 427. 483. 0fc618dc-89… 0000 msajc… 150 150 Phon…
## 4 s 483. 567. 0fc618dc-89… 0000 msajc… 151 151 Phon…
## 5 t 567. 597. 0fc618dc-89… 0000 msajc… 152 152 Phon…
## 6 H 597. 674. 0fc618dc-89… 0000 msajc… 153 153 Phon…
## 7 db 2034. 2150. 0fc618dc-89… 0000 msajc… 173 173 Phon…
## 8 j 2150. 2211. 0fc618dc-89… 0000 msajc… 174 174 Phon…
## 9 u: 2211. 2284. 0fc618dc-89… 0000 msajc… 175 175 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
The & operator is used for annotations of a tier that is an attribute of another tier. For example, Text and Accent are evidently attributes of the Word tier, as shown by:
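A sketch of the listing command for the attributes of the Word tier (ad is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Attributes defined for the Word tier
ad = list_attributeDefinitions(ae, "Word")
ad
```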
## name level type hasLabelGroups hasLegalLabels
## 1 Word Word STRING FALSE FALSE
## 2 Accent Word STRING FALSE FALSE
## 3 Text Word STRING FALSE FALSE
Thus to find all accented (S) words:
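A sketch of such a query, in which the attribute named first (Text) supplies the annotations that are returned (acc.s is a name chosen here):

```r
library(emuR)

# Fresh copy of the demo database; demo_dir is a name chosen for this sketch
demo_dir = tempfile("emuR_demo_")
dir.create(demo_dir)
create_emuRdemoData(dir = demo_dir)
ae = load_emuDB(file.path(demo_dir, "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# Any Text annotation whose Accent attribute is S
acc.s = query(ae, "[Text =~ .* & Accent == S]")
acc.s$labels
```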
## # A tibble: 25 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 amongst 187. 674. 0fc618… 0000 msajc… 2 2 Word
## 2 friends 740. 1289. 0fc618… 0000 msajc… 30 30 Word
## 3 beautiful 2034. 2604. 0fc618… 0000 msajc… 83 83 Word
## 4 futile 572. 1091. 0fc618… 0000 msajc… 21 21 Word
## 5 further 1628. 1958. 0fc618… 0000 msajc… 68 68 Word
## 6 resistance 1958. 2754. 0fc618… 0000 msajc… 80 80 Word
## 7 chill 380. 745. 0fc618… 0000 msajc… 13 13 Word
## 8 wind 745. 1083. 0fc618… 0000 msajc… 23 23 Word
## 9 caused 1083. 1456. 0fc618… 0000 msajc… 36 36 Word
## 10 shiver 1651. 1995. 0fc618… 0000 msajc… 70 70 Word
## # ℹ 15 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To find all unaccented function words:
## # A tibble: 20 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 her 674. 740. 0fc618dc-8… 0000 msajc… 24 24 Word
## 2 she 1289. 1463. 0fc618dc-8… 0000 msajc… 43 43 Word
## 3 was 1463. 1634. 0fc618dc-8… 0000 msajc… 52 52 Word
## 4 it 300. 412. 0fc618dc-8… 0000 msajc… 2 2 Word
## 5 is 412. 572. 0fc618dc-8… 0000 msajc… 14 14 Word
## 6 to 1091. 1222. 0fc618dc-8… 0000 msajc… 38 38 Word
## 7 any 1437. 1628. 0fc618dc-8… 0000 msajc… 58 58 Word
## 8 the 300. 380. 0fc618dc-8… 0000 msajc… 2 2 Word
## 9 them 1456. 1565. 0fc618dc-8… 0000 msajc… 51 51 Word
## 10 to 1565. 1651. 0fc618dc-8… 0000 msajc… 60 60 Word
## 11 he 300. 425. 0fc618dc-8… 0000 msajc… 2 2 Word
## 12 his 1129. 1368. 0fc618dc-8… 0000 msajc… 37 37 Word
## 13 his 2694. 2781. 0fc618dc-8… 0000 msajc… 101 101 Word
## 14 are 662. 775. 0fc618dc-8… 0000 msajc… 19 19 Word
## 15 to 1806. 1890. 0fc618dc-8… 0000 msajc… 71 71 Word
## 16 I'll 300. 514. 0fc618dc-8… 0000 msajc… 2 2 Word
## 17 my 819. 1039. 0fc618dc-8… 0000 msajc… 23 23 Word
## 18 and 1422. 1495. 0fc618dc-8… 0000 msajc… 43 43 Word
## 19 this 300. 476. 0fc618dc-8… 0000 msajc… 2 2 Word
## 20 than 2368. 2480. 0fc618dc-8… 0000 msajc… 97 97 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To find unaccented function words but only if they follow a content word:
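This can be written as a sequence query with the -> operator. The sketch below assumes that content words are labelled C on the Word attribute and, as before, that anything not labelled S on Accent is unaccented. The # marks which side of the sequence is returned; without it the query would return the whole two-word sequence.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Left of ->  : a content word (assumed label C).
# Right of -> : an unaccented function word; # asks for only this
#               second word to be returned.
after_content = query(ae,
                      "[Word == C -> #Text =~ .* & Word == F & Accent != S]")
after_content
```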
## # A tibble: 12 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 her 674. 740. 0fc618dc-8… 0000 msajc… 24 24 Word
## 2 she 1289. 1463. 0fc618dc-8… 0000 msajc… 43 43 Word
## 3 to 1091. 1222. 0fc618dc-8… 0000 msajc… 38 38 Word
## 4 any 1437. 1628. 0fc618dc-8… 0000 msajc… 58 58 Word
## 5 them 1456. 1565. 0fc618dc-8… 0000 msajc… 51 51 Word
## 6 his 1129. 1368. 0fc618dc-8… 0000 msajc… 37 37 Word
## 7 his 2694. 2781. 0fc618dc-8… 0000 msajc… 101 101 Word
## 8 are 662. 775. 0fc618dc-8… 0000 msajc… 19 19 Word
## 9 to 1806. 1890. 0fc618dc-8… 0000 msajc… 71 71 Word
## 10 my 819. 1039. 0fc618dc-8… 0000 msajc… 23 23 Word
## 11 and 1422. 1495. 0fc618dc-8… 0000 msajc… 43 43 Word
## 12 than 2368. 2480. 0fc618dc-8… 0000 msajc… 97 97 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
There are three position functions, Start(X,Y), Medial(X,Y), and End(X,Y). In all position queries, the annotations are returned from Y in F(X, Y), where X and Y are the two tiers that form part of a position query. Thus the following finds all annotations at the Phoneme tier that are initial relative to annotations at the Word tier (i.e., it finds all word-initial phonemes).
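A minimal sketch of such a position query, assuming ae is the database handle loaded earlier. The value 1 stands for true; the code shown later in this tutorial writes the same thing as T.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Start(Word, Phoneme) == 1 returns annotations from the second tier
# (Phoneme) that are the first phoneme of a Word-tier annotation.
word_initial = query(ae, "[Start(Word, Phoneme) == 1]")
word_initial
```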
## # A tibble: 54 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-8… 0000 msajc… 114 114 Phon…
## 2 @: 674. 740. 0fc618dc-8… 0000 msajc… 120 120 Phon…
## 3 f 740. 893. 0fc618dc-8… 0000 msajc… 121 121 Phon…
## 4 S 1289. 1420. 0fc618dc-8… 0000 msajc… 126 126 Phon…
## 5 w 1463. 1506. 0fc618dc-8… 0000 msajc… 128 128 Phon…
## 6 k 1634. 1715. 0fc618dc-8… 0000 msajc… 131 131 Phon…
## 7 b 2034. 2150. 0fc618dc-8… 0000 msajc… 139 139 Phon…
## 8 I 300. 373. 0fc618dc-8… 0000 msajc… 119 119 Phon…
## 9 I 412. 476. 0fc618dc-8… 0000 msajc… 121 121 Phon…
## 10 f 572. 674. 0fc618dc-8… 0000 msajc… 123 123 Phon…
## # ℹ 44 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Word-initial and word-medial phonemes could then be obtained by finding all phonemes that are not word-final:
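A sketch of the negated position query, under the same assumptions as before: comparing End() with 0 (false) selects every phoneme that is not the last one in its word.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# End(Word, Phoneme) == 0 keeps phonemes that are NOT word-final,
# i.e. word-initial and word-medial phonemes of multi-phoneme words.
not_final = query(ae, "[End(Word, Phoneme) == 0]")
not_final
```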
## # A tibble: 167 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 V 187. 257. 0fc618dc-8… 0000 msajc… 114 114 Phon…
## 2 m 257. 340. 0fc618dc-8… 0000 msajc… 115 115 Phon…
## 3 V 340. 427. 0fc618dc-8… 0000 msajc… 116 116 Phon…
## 4 N 427. 483. 0fc618dc-8… 0000 msajc… 117 117 Phon…
## 5 s 483. 567. 0fc618dc-8… 0000 msajc… 118 118 Phon…
## 6 f 740. 893. 0fc618dc-8… 0000 msajc… 121 121 Phon…
## 7 r 893. 950. 0fc618dc-8… 0000 msajc… 122 122 Phon…
## 8 E 950. 1032. 0fc618dc-8… 0000 msajc… 123 123 Phon…
## 9 n 1032. 1196. 0fc618dc-8… 0000 msajc… 124 124 Phon…
## 10 S 1289. 1420. 0fc618dc-8… 0000 msajc… 126 126 Phon…
## # ℹ 157 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To find all f and S phonemes that are word-initial:
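A label condition and a position condition on the same tier can be conjoined with &. The following is a sketch, not necessarily the original command; it uses | to list alternative labels.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Phoneme == f | S  : the phoneme label is either f or S.
# & Start(...) == 1 : and that phoneme is word-initial.
initial_fS = query(ae, "[Phoneme == f | S & Start(Word, Phoneme) == 1]")
initial_fS
```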
## # A tibble: 5 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 f 740. 893. 0fc618dc-89… 0000 msajc… 121 121 Phon…
## 2 S 1289. 1420. 0fc618dc-89… 0000 msajc… 126 126 Phon…
## 3 f 572. 674. 0fc618dc-89… 0000 msajc… 123 123 Phon…
## 4 f 1628. 1741. 0fc618dc-89… 0000 msajc… 138 138 Phon…
## 5 S 1651. 1801. 0fc618dc-89… 0000 msajc… 137 137 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Finding all phonemes in intonationally phrase-final syllables requires a domination (linked) query. This is because (i) Y is Syllable in the position function that forms the second part of the query and (ii) Syllable dominates Phoneme. Because Syllable dominates Phoneme, the two pieces of the query must be linked by ^:
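A sketch of how the two pieces might be linked, assuming ae is the database handle loaded earlier. The first part selects any phoneme; the second is a position query that returns Syllable annotations; ^ keeps only those phonemes that are dominated by such a syllable.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Left of ^  : any Phoneme annotation (this side is returned).
# Right of ^ : syllables that are final in an Intonational phrase.
final_syll_phon = query(ae,
                        "[Phoneme =~ .* ^ End(Intonational, Syllable) == 1]")
final_syll_phon
```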
## # A tibble: 25 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 f 2362. 2447. 0fc618dc-8… 0000 msajc… 144 144 Phon…
## 2 @ 2447. 2506. 0fc618dc-8… 0000 msajc… 145 145 Phon…
## 3 l 2506. 2604. 0fc618dc-8… 0000 msajc… 146 146 Phon…
## 4 s 2228. 2319. 0fc618dc-8… 0000 msajc… 146 146 Phon…
## 5 t 2319. 2382. 0fc618dc-8… 0000 msajc… 147 147 Phon…
## 6 @ 2382. 2431. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 7 n 2431. 2528. 0fc618dc-8… 0000 msajc… 149 149 Phon…
## 8 s 2528. 2754. 0fc618dc-8… 0000 msajc… 150 150 Phon…
## 9 l 2534. 2569. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 10 i: 2569. 2692. 0fc618dc-8… 0000 msajc… 149 149 Phon…
## # ℹ 15 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To find all phonemes in weak syllables that are intonationally phrase-final:
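The restriction to weak syllables can be added by conjoining a label condition to the Syllable part of the query. This sketch assumes that weak syllables are labelled W at the Syllable tier.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Right of ^ : syllables labelled W (assumed: weak) that are also
#              final in an Intonational phrase.
# Left of ^  : the phonemes dominated by those syllables.
weak_final_phon = query(ae,
  "[Phoneme =~ .* ^ Syllable == W & End(Intonational, Syllable) == 1]")
weak_final_phon
```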
## # A tibble: 15 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 f 2362. 2447. 0fc618dc-8… 0000 msajc… 144 144 Phon…
## 2 @ 2447. 2506. 0fc618dc-8… 0000 msajc… 145 145 Phon…
## 3 l 2506. 2604. 0fc618dc-8… 0000 msajc… 146 146 Phon…
## 4 s 2228. 2319. 0fc618dc-8… 0000 msajc… 146 146 Phon…
## 5 t 2319. 2382. 0fc618dc-8… 0000 msajc… 147 147 Phon…
## 6 @ 2382. 2431. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 7 n 2431. 2528. 0fc618dc-8… 0000 msajc… 149 149 Phon…
## 8 s 2528. 2754. 0fc618dc-8… 0000 msajc… 150 150 Phon…
## 9 l 2534. 2569. 0fc618dc-8… 0000 msajc… 148 148 Phon…
## 10 i: 2569. 2692. 0fc618dc-8… 0000 msajc… 149 149 Phon…
## 11 s 3123. 3239. 0fc618dc-8… 0000 msajc… 182 182 Phon…
## 12 @ 3239. 3298. 0fc618dc-8… 0000 msajc… 183 183 Phon…
## 13 z 3298. 3457. 0fc618dc-8… 0000 msajc… 184 184 Phon…
## 14 v 2588. 2646. 0fc618dc-8… 0000 msajc… 160 160 Phon…
## 15 @ 2646. 2795. 0fc618dc-8… 0000 msajc… 161 161 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Count queries involve a comparison with a number, using one of ==, !=, >, >=, <, <= (see below). The count function Num(X,Y) counts the number of annotations at tier Y relative to tier X. Notice that whereas position queries return annotations at tier Y, count queries return annotations at tier X. Thus the following finds all bisyllabic words and then all words made of four or more syllables:
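The two count queries described here could be sketched as follows, assuming ae is the database handle loaded earlier. Text is used as the first argument so that the orthographic labels are returned; the variable names are illustrative.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Num(Text, Syllable) counts the syllables linked to each word and
# returns the word (first tier), not the syllables.
bisyllabic = query(ae, "[Num(Text, Syllable) == 2]")
bisyllabic

# Words made of four or more syllables.
long_words = query(ae, "[Num(Text, Syllable) >= 4]")
long_words
```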
## # A tibble: 11 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 amongst 187. 674. 0fc618dc… 0000 msajc… 2 2 Word
## 2 futile 572. 1091. 0fc618dc… 0000 msajc… 21 21 Word
## 3 any 1437. 1628. 0fc618dc… 0000 msajc… 58 58 Word
## 4 further 1628. 1958. 0fc618dc… 0000 msajc… 68 68 Word
## 5 shiver 1651. 1995. 0fc618dc… 0000 msajc… 70 70 Word
## 6 itches 300. 662. 0fc618dc… 0000 msajc… 2 2 Word
## 7 always 775. 1280. 0fc618dc… 0000 msajc… 28 28 Word
## 8 tempting 1401. 1806. 0fc618dc… 0000 msajc… 51 51 Word
## 9 display 667. 1211. 0fc618dc… 0000 msajc… 25 25 Word
## 10 attracts 1211. 1579. 0fc618dc… 0000 msajc… 44 44 Word
## 11 ever 2480. 2795. 0fc618dc… 0000 msajc… 106 106 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
## # A tibble: 7 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 S 257. 674. 0fc618dc-89… 0000 msajc… 103 103 Syll…
## 2 S 740. 1289. 0fc618dc-89… 0000 msajc… 105 105 Syll…
## 3 W 2228. 2754. 0fc618dc-89… 0000 msajc… 118 118 Syll…
## 4 S 1213. 1797. 0fc618dc-89… 0000 msajc… 134 134 Syll…
## 5 S 1890. 2470. 0fc618dc-89… 0000 msajc… 105 105 Syll…
## 6 S 1964. 2554. 0fc618dc-89… 0000 msajc… 90 90 Syll…
## 7 S 1248. 1579. 0fc618dc-89… 0000 msajc… 119 119 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To get all syllables in bisyllabic words requires:
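One way this might be written, as a sketch under the same assumptions: a pattern that matches any syllable is linked by ^ to the count query, and the syllable side, being first, is the one returned.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Left of ^  : any Syllable annotation (this side is returned).
# Right of ^ : words that consist of exactly two syllables.
syll_in_bisyll = query(ae, "[Syllable =~ .* ^ Num(Text, Syllable) == 2]")
syll_in_bisyll
```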
## # A tibble: 22 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 W 187. 257. 0fc618dc-8… 0000 msajc… 102 102 Syll…
## 2 S 257. 674. 0fc618dc-8… 0000 msajc… 103 103 Syll…
## 3 S 572. 798. 0fc618dc-8… 0000 msajc… 107 107 Syll…
## 4 S 798. 1091. 0fc618dc-8… 0000 msajc… 108 108 Syll…
## 5 S 1437. 1515. 0fc618dc-8… 0000 msajc… 112 112 Syll…
## 6 W 1515. 1628. 0fc618dc-8… 0000 msajc… 113 113 Syll…
## 7 S 1628. 1864. 0fc618dc-8… 0000 msajc… 114 114 Syll…
## 8 W 1864. 1958. 0fc618dc-8… 0000 msajc… 115 115 Syll…
## 9 S 1651. 1863. 0fc618dc-8… 0000 msajc… 113 113 Syll…
## 10 W 1863. 1995. 0fc618dc-8… 0000 msajc… 114 114 Syll…
## # ℹ 12 more rows
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Notice that the above query has to be linked by ^. This is because, as stated earlier, Num(X, Y) returns annotations from tier X, which in this case is Text, and Text dominates (is linked to) Syllable. Similarly, to find all Intermediate phrases that contain words of 4 or more syllables:
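A sketch of this query, assuming ae is the database handle loaded earlier: any Intermediate annotation is linked by ^ to the words that the count query returns.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Left of ^  : any Intermediate phrase (this side is returned).
# Right of ^ : words with four or more syllables.
phrases_long = query(ae,
                     "[Intermediate =~ .* ^ Num(Text, Syllable) >= 4]")
phrases_long
```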
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 L- 1565. 2692. 0fc618dc-89… 0000 msajc… 63 63 Inte…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Again the two parts of the query must be joined with ^ because Intermediate dominates Text. To get all words of 4 or more syllables in L- Intermediate phrases requires putting the Num(X,Y) function as the first part of the query argument:
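With the count query placed first, the words rather than the phrases are returned. The following is a sketch under the same assumptions as the earlier examples.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Left of ^  : words with four or more syllables (returned).
# Right of ^ : Intermediate phrases labelled L-.
long_in_L = query(ae, "[Num(Text, Syllable) >= 4 ^ Intermediate == L-]")
long_in_L
```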
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 violently 1995. 2692. 0fc618dc… 0000 msajc… 82 82 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
As a general guide, it’s helpful to build up more complex queries, e.g. ones that combine number, position, and multiple tiers, in stages. For example, consider a query for trisyllabic words that end in /s/. It would match circumstance (if it’s in the database) because the word is trisyllabic and ends in /s/.
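A natural first stage, and probably the query behind the output that follows, is to find all trisyllabic words on their own. This is a sketch that assumes ae is the database handle loaded earlier.

```r
library(emuR)

# `ae` is assumed to be the demo database handle loaded earlier.
# Stage 1: all words that consist of exactly three syllables.
trisyllabic = query(ae, "[Num(Text, Syllable) == 3]")
trisyllabic
```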
## # A tibble: 7 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 considered 1634. 2150. 0fc618d… 0000 msajc… 61 61 Word
## 2 beautiful 2034. 2604. 0fc618d… 0000 msajc… 83 83 Word
## 3 resistance 1958. 2754. 0fc618d… 0000 msajc… 80 80 Word
## 4 emphasized 425. 1129. 0fc618d… 0000 msajc… 13 13 Word
## 5 concealing 2104. 2694. 0fc618d… 0000 msajc… 78 78 Word
## 6 weaknesses 2781. 3457. 0fc618d… 0000 msajc… 109 109 Word
## 7 customers 1824. 2368. 0fc618d… 0000 msajc… 73 73 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
# Find all three syllable words that end in /s/.
query(ae,
"[Num(Text, Syllable)=3 ^
Phoneme=s & End(Text, Phoneme)=T]")
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 resistance 1958. 2754. 0fc618d… 0000 msajc… 80 80 Word
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Note again the need for ^ in the above. This is because the first part of the query returns Text annotations and the second part returns Phoneme annotations, and because the Text tier dominates (is linked to) the Phoneme tier. To get at the Phoneme annotations in the above, modify it with #:
# Find all word-final /s/ in trisyllabic words.
query(ae,
"[Num(Text, Syllable)=3 ^
#Phoneme=s & End(Text, Phoneme)=T]")
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 s 2528. 2754. 0fc618dc-89… 0000 msajc… 150 150 Phon…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
To get the syllable that dominates an /s/ that is word-final:
# Syllables dominating word-final /s/
query(ae,
"[Phoneme=s & End(Text, Phoneme)=T ^
#Syllable =~ .*]")
## # A tibble: 6 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 W 2228. 2754. 0fc618dc-89… 0000 msajc… 118 118 Syll…
## 2 S 1213. 1797. 0fc618dc-89… 0000 msajc… 134 134 Syll…
## 3 S 1039. 1422. 0fc618dc-89… 0000 msajc… 86 86 Syll…
## 4 S 1964. 2554. 0fc618dc-89… 0000 msajc… 90 90 Syll…
## 5 W 300. 476. 0fc618dc-89… 0000 msajc… 114 114 Syll…
## 6 S 1248. 1579. 0fc618dc-89… 0000 msajc… 119 119 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
or equivalently without #:
# Syllables dominating word-final /s/
query(ae, "[Syllable =~ .* ^
Phoneme=s & End(Text, Phoneme)=T]")
## # A tibble: 6 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 W 2228. 2754. 0fc618dc-89… 0000 msajc… 118 118 Syll…
## 2 S 1213. 1797. 0fc618dc-89… 0000 msajc… 134 134 Syll…
## 3 S 1039. 1422. 0fc618dc-89… 0000 msajc… 86 86 Syll…
## 4 S 1964. 2554. 0fc618dc-89… 0000 msajc… 90 90 Syll…
## 5 W 300. 476. 0fc618dc-89… 0000 msajc… 114 114 Syll…
## 6 S 1248. 1579. 0fc618dc-89… 0000 msajc… 119 119 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
Now combine the command for obtaining all word-final /s/ in trisyllabic words with the above command, thus:
# the final syllable of trisyllabic words
# that end in /s/.
query(ae,
"[Num(Text, Syllable)=3 ^
[Phoneme=s & End(Text, Phoneme)=T ^
#Syllable =~ .*]]")
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 W 2228. 2754. 0fc618dc-89… 0000 msajc… 118 118 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>
or equivalently:
# the final syllable of trisyllabic words
# that end in /s/.
query(ae, "[[Syllable =~ .* ^
Phoneme=s & End(Text, Phoneme)=T] ^
Num(Text, Syllable)=3]")
## # A tibble: 1 × 16
## labels start end db_uuid session bundle start_item_id end_item_id level
## <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr>
## 1 W 2228. 2754. 0fc618dc-89… 0000 msajc… 118 118 Syll…
## # ℹ 7 more variables: attribute <chr>, start_item_seq_idx <int>,
## # end_item_seq_idx <int>, type <chr>, sample_start <int>, sample_end <int>,
## # sample_rate <int>