First of all, let us once again create a temporary EMU-SDMS-database:
# load package
library(emuR)
# create demo data in directory
# provided by tempdir()
create_emuRdemoData(dir = tempdir())
# create path to demo database
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# load database
ae = load_emuDB(path2ae, verbose = FALSE)
summary(ae)
The link definitions in this summary show a very rich annotation structure, which will allow us to perform quite complicated queries. Let us have a look at the annotation structure:
serve(ae)
Figure 1: Hierarchy of the first utterance of the database ae
However, let us start with very basic queries. The command for conducting queries is simply query, called with at least two arguments, emuDBhandle and query:
query(emuDBhandle = ae,query = "Phonetic==V")
## segment list from database: ae
## query was: Phonetic==V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
We could have achieved the very same result less verbosely, e.g. by typing:
query(ae,"Phonetic==V")
The expression "Phonetic==V" is a legal expression in the EMU Query Language (EQL) and could be translated into “which labels on the level Phonetic equal the label ‘V’?”. However, before we learn the EQL in detail, we will discuss the other parameters of the query command and the structure of its result.
help(query)
query()
query(emuDBhandle,
      query,
      sessionPattern = ".*",
      bundlePattern = ".*",
      queryLang = "EQL2",
      timeRefSegmentLevel = NULL,
      resultType = NULL,
      calcTimes = TRUE, verbose = FALSE)
Argument | Meaning |
---|---|
emuDBhandle | emuDB handle object |
query | string (see vignette("EQL")) |
sessionPattern | A regular expression pattern matching session names to be searched from the database |
bundlePattern | A regular expression pattern matching bundle names to be searched from the database |
queryLang | query language used for evaluating the query string |
timeRefSegmentLevel | set time segment level from which to derive time information. It is only necessary to set this parameter if the queried parent level is of type ITEM and more than one child level contains time information. |
resultType | type (class name) of result |
calcTimes | calculate times for the resulting segments. As calculating the times can be computationally very expensive for large nested hierarchies, it can be turned off via this boolean parameter (which results in NA values for start and end times in emuseg/emuRsegs). |
verbose | be verbose. Set this to TRUE if you wish to choose which path to traverse on intersecting hierarchies. If set to FALSE (the default) all paths will be traversed (= legacy EMU behaviour). |
We could, e.g., restrict the query to a single session via sessionPattern (but there is only one session in ae) or to a single bundle via bundlePattern; the latter is a fairly common use case:
query(emuDBhandle = ae,query = "Phonetic==V",bundlePattern = "msajc003")
## segment list from database: ae
## query was: Phonetic==V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
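Because bundlePattern is a regular expression, one pattern can also select several bundles at once. The following base-R sketch only illustrates which names a pattern would select. The vector of bundle names is copied by hand from the query outputs in this chapter, the pattern is a made-up example, and anchoring it with ^ and $ is an assumption of the sketch, not a statement about how emuR matches names internally.

```r
# Bundle names of ae, copied by hand from the query outputs in this chapter
bundles <- c("msajc003", "msajc010", "msajc012", "msajc015",
             "msajc022", "msajc023", "msajc057")

# Hypothetical pattern: bundles whose number lies between 000 and 019
pattern <- "msajc0[01][0-9]"

# Assumption of this sketch: the pattern has to cover the whole bundle name
selected <- bundles[grepl(paste0("^", pattern, "$"), bundles)]
print(selected)
```

A string like this could then be tried as the bundlePattern argument of query(); checking the pattern against the bundle names first makes it easier to see what a query will cover.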
emuRsegs
An emuR segment list is a list of segment descriptors. Each segment descriptor describes a sequence of annotation elements. The list is usually the result of an emuDB query using the function query.
An emuRsegs object is an attributed data.frame, with one row per segment descriptor.
Data frame columns
labels: sequenced labels of segment concatenated by ‘->’
start: onset time in milliseconds
end: offset time in milliseconds
session: session name
bundle: bundle name
level: level name
type: type of “segment” row: ITEM: symbolic item, EVENT: event item, SEGMENT: segment
Additional hidden columns
utts: utterance name (for compatibility to emusegs class)
db_uuid: UUID of emuDB
startItemID: item ID of first element of sequence
endItemID: item ID of last element of sequence
sampleStart: start sample position
sampleEnd: end sample position
sampleRate: sample rate
Attributes
database: name of emuDB
query: Query string
type: type (SEGMENT or EVENT, but not ITEM) (for compatibility to emusegs class)
Let us assign our query above to an object that we call V_first_utt:
(V_first_utt = query(emuDBhandle = ae,query = "Phonetic==V",bundlePattern = "msajc003"))
## segment list from database: ae
## query was: Phonetic==V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
class(V_first_utt)
## [1] "emuRsegs" "emusegs" "data.frame"
summary(V_first_utt)
## segment list from database: ae
## query was: Phonetic==V
## with 2 segments
##
## Segment distribution:
##
## V
## 2
Although every column can easily be accessed with the $-operator, there are some commands in emuR doing the very same job, e.g. the commands label(), start(), end():
#Get labels:
#either
V_first_utt$labels
## [1] "V" "V"
#or
label(V_first_utt)
## [1] "V" "V"
Other examples:
#Get start times:
start(V_first_utt)
## [1] 187.425 340.175
#Get end times:
end(V_first_utt)
## [1] 256.925 426.675
Other more complex commands are simply a shorter way of doing rather simple things:
#Get durations:
dur(V_first_utt)
## [1] 69.5 86.5
#which is, of course, the same as:
end(V_first_utt) - start(V_first_utt)
## [1] 69.5 86.5
#We can also use the $-operator to access the columns of V_first_utt, because it is a data.frame
#see chapter 07
V_first_utt$end - V_first_utt$start
## [1] 69.5 86.5
Calculated times
You might recall from chapter 04 that start and end times are stored internally as sampleStart and sampleDur. The very same annot.json file also contains information about the sampling frequency:
{
"name": "msajc003",
"annotates": "msajc003.wav",
"sampleRate": 20000,
...
{
"name": "Phonetic",
"type": "SEGMENT",
"items": [
{
"id": 147,
"sampleStart": 3749,
"sampleDur": 1389,
"labels": [
{
"name": "Phonetic",
"value": "V"
}
]
},
This means that our first “V” starts at the 3749th sample, at 20000 samples per second. We can calculate:
3749/20000
## [1] 0.18745
… i.e., our first V starts at 0.18745 seconds, or 187.45 milliseconds. Compare this to
start(V_first_utt)[1]
## [1] 187.425
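The two values differ by 0.025 ms, which is half a sample period at 20000 Hz. The numbers shown in this chapter are consistent with the assumption that the reported start time lies half a sample before sampleStart and the reported end time half a sample after the last sample of the segment. The following sketch checks this arithmetic; the helper names are hypothetical (they are not emuR functions), and the half-sample rule is inferred from the values above, not taken from the emuR source.

```r
# Values of the first "V" from the annot.json excerpt above
sample_rate  <- 20000
sample_start <- 3749
sample_dur   <- 1389

# Hypothetical helpers (not part of emuR); the half-sample offsets are an assumption
segment_start_ms <- function(sampleStart, sampleRate) {
  (sampleStart - 0.5) / sampleRate * 1000
}
segment_end_ms <- function(sampleStart, sampleDur, sampleRate) {
  (sampleStart + sampleDur + 0.5) / sampleRate * 1000
}

start_ms <- segment_start_ms(sample_start, sample_rate)
end_ms   <- segment_end_ms(sample_start, sample_dur, sample_rate)
print(c(start = start_ms, end = end_ms))
```

Under this assumption the sketch reproduces both the start and the end time that the segment list reports for the first “V”.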
In other words, $start and $end report times in milliseconds. As mentioned earlier, sample positions belong to the hidden columns of an emuRsegs object (listed above as sampleStart and sampleEnd; sampleDur is the name used in the annot.json file). We can try to access them with
V_first_utt$sampleStart
## NULL
V_first_utt$sampleDur
## NULL
As can be seen in the annot.json file, our “V” on the level “Phonetic” is of type SEGMENT, i.e. of a time-aligned type. This information can also be found in
V_first_utt
## segment list from database: ae
## query was: Phonetic==V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
What happens if we look for a timeless ITEM?
(amongst=query(emuDBhandle = ae,query = "Text==amongst",bundlePattern = "msajc003"))
## segment list from database: ae
## query was: Text==amongst
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
We can see in $type that the type of “amongst” is ITEM. However, we can see non-empty $start and $end columns, and
attributes(amongst)$type
## [1] "segment"
says that its type is “segment” (for compatibility with the former emusegs format). So, although “amongst” is actually a timeless ITEM …
{
"name": "Word",
"type": "ITEM",
"items": [
{
"id": 2,
"labels": [
{
"name": "Word",
"value": "C"
},
{
"name": "Accent",
"value": "S"
},
{
"name": "Text",
"value": "amongst"
}
]
},
… times can be derived from this item’s first and last time-aligned segments (our first “V” above is the first segment, and “amongst” ends with a “t” that is aspirated (= “H” on the Phonetic level, see Figure 1)):
start(query(emuDBhandle = ae,query = "Text==amongst",bundlePattern = "msajc003"))
## [1] 187.425
start(query(emuDBhandle = ae,query = "Phonetic==V",bundlePattern = "msajc003")[1,])
## [1] 187.425
end(query(emuDBhandle = ae,query = "Text==amongst",bundlePattern = "msajc003"))
## [1] 674.175
end(query(emuDBhandle = ae,query = "Phonetic==H",bundlePattern = "msajc003")[1,])
## [1] 674.175
This, of course, will only work if Text (one attribute of Word) and Phonetic levels are linked (and they are, see also Figure 1):
list_linkDefinitions(ae)
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
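The derivation of times for a timeless ITEM can be imitated in base R. The data frame below is a hand-made copy of the Phonetic segments of “amongst” as they appear in the outputs of this chapter. Taking the earliest start and the latest end of the linked segments is a sketch of the idea, not emuR's actual implementation.

```r
# Phonetic segments linked to the word "amongst" (copied by hand from this chapter)
amongst_phonetic <- data.frame(
  labels = c("V", "m", "V", "N", "s", "t", "H"),
  start  = c(187.425, 256.925, 340.175, 426.675, 483.425, 566.925, 596.675),
  end    = c(256.925, 340.175, 426.675, 483.425, 566.925, 596.675, 674.175),
  stringsAsFactors = FALSE
)

# The ITEM inherits the start of its first and the end of its last segment
item_start <- min(amongst_phonetic$start)
item_end   <- max(amongst_phonetic$end)
print(c(start = item_start, end = item_end))
```

The two values agree with the start and end times that query() reported for “amongst” above.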
If the ITEM we are interested in were linked to more than one time-aligned level, we would have to use query’s parameter timeRefSegmentLevel to choose the segment level from which query derives time information.
requery_hier and requery_seq
We already know that the first “V” in our results belongs to (and is linked to) the word “amongst”. By which words are the other “V”s dominated, then? We could find out by a hierarchical re-query:
#find all "V"-labels in `ae`
(V=query(emuDBhandle = ae,query = "Phonetic==V"))
## segment list from database: ae
## query was: Phonetic==V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
Now put this segment list into requery_hier() and look for the linked ITEM in Word:Text:
requery_hier(emuDBhandle = ae,seglist = V,level = "Text")
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 3 customers 1824.425 2367.775 0000 msajc057 Text ITEM
Your result will be the ITEM labels and calculated times (for the corresponding words).
You might also wish to know the contexts of the “V”s, e.g. the subsequent segments. For this, we use the sequential structure of the database and the command requery_seq():
requery_seq(emuDBhandle = ae,seglist = V,offset = 1)
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
## 2 N 426.675 483.425 0000 msajc003 Phonetic SEGMENT
## 3 s 2037.425 2085.175 0000 msajc057 Phonetic SEGMENT
We will discuss both commands more extensively later in the seminar, but wanted to show that it is possible to use the annotation structure and a given segment list to retrieve additional information afterwards. We could use both commands to express more complex queries: e.g. we could look for all “V” within the word “amongst” by querying “V”, then requerying all linked words, and then deleting all “V” that are not linked to “amongst”. However, this would be rather cumbersome. A much easier way to conduct more complicated queries is to use the full possibilities of emuR’s query language EQL within the command query.
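The cumbersome route just described can be sketched in base R. The two data frames are hand-made copies of the segment lists printed above (the “V” query and its hierarchical requery). The sketch assumes, as those printouts suggest, that row i of the requery result belongs to row i of the original segment list.

```r
# Copy of the result of query(ae, "Phonetic==V")
V <- data.frame(
  labels = c("V", "V", "V"),
  start  = c(187.425, 340.175, 1943.175),
  end    = c(256.925, 426.675, 2037.425),
  bundle = c("msajc003", "msajc003", "msajc057"),
  stringsAsFactors = FALSE
)

# Copy of the result of requery_hier(ae, seglist = V, level = "Text")
words <- data.frame(
  labels = c("amongst", "amongst", "customers"),
  bundle = c("msajc003", "msajc003", "msajc057"),
  stringsAsFactors = FALSE
)

# Assumption: both lists are aligned row by row
stopifnot(nrow(V) == nrow(words))

# Keep only those "V" whose dominating word is "amongst"
V_in_amongst <- V[words$labels == "amongst", ]
print(V_in_amongst)
```

With real segment lists the same row indexing could be tried on the results of query() and requery_hier(), but it needs three steps for what is conceptually a single question.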
EQL
To learn about the functionality of the EQL, you can always type
vignette("EQL")
As we have seen above, any query must be placed within quotation marks (" "). You minimally have to give a level and some sort of representation for a label (this may be a regular expression), unless you use one of the position and count functions.
In the examples above, we looked for labels equal to “V” on the level “Phonetic” (in the database ae):
query(emuDBhandle = ae, query = "Phonetic == V")
## segment list from database: ae
## query was: Phonetic == V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
So “==” is the equality operator. For backward compatibility, a single “=” is also allowed (but we ask you to prefer “==” instead):
query(emuDBhandle = ae, query = "Phonetic = V")
## segment list from database: ae
## query was: Phonetic = V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
We can also search for everything except “V” by using !=:
query(emuDBhandle = ae, query = "Phonetic != V")
(We do not show the resulting segment list, because it is very long.) So one way to get ‘everything’ would be to query something that is not in your database, like “xyz”. However, there is a much better way: Using regular expressions. To use these, you have to type “=~”:
Everything1 = query(emuDBhandle = ae, query = "Phonetic != xyz")
Everything2 = query(emuDBhandle = ae, query = "Phonetic =~ .*")
any(Everything1 != Everything2) # should result in FALSE if both are equal everywhere
## [1] FALSE
You can also negate the latter operator by “!~”. A not very useful example would be:
(Nothing = query(emuDBhandle = ae, query = "Phonetic !~ .*"))
## segment list from database: ae
## query was: Phonetic !~ .*
## [1] labels start end session bundle level type
## <0 rows> (or 0-length row.names)
A more interesting example would be:
# What is the query to retrieve all ITEMs in the “Text” level that don’t begin with ‘a’?
query(emuDBhandle = ae, query = "Text !~ a.*")
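The wording “don’t begin with ‘a’” suggests that the pattern has to match the whole label, not just some part of it. The following base-R sketch imitates =~ and !~ under that assumption. The labels are the words of bundle msajc003 as they appear in the outputs of this chapter, and the helper eql_like_match is hypothetical, not an emuR function.

```r
# Words of bundle msajc003, copied by hand from the outputs in this chapter
text_labels <- c("amongst", "her", "friends", "she", "was",
                 "considered", "beautiful")

# Hypothetical helper: TRUE where the pattern matches the whole label
# (whole-label matching is an assumption of this sketch)
eql_like_match <- function(labels, pattern) {
  grepl(paste0("^(", pattern, ")$"), labels)
}

matching     <- text_labels[eql_like_match(text_labels, "a.*")]   # like Text =~ a.*
non_matching <- text_labels[!eql_like_match(text_labels, "a.*")]  # like Text !~ a.*
print(matching)
print(non_matching)
```

Note that “was” and “beautiful” contain an “a” but do not begin with one, so under whole-label matching they end up among the non-matching labels.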
So, there are four similar operators, two for equality matching and two for inequality:
Symbol | Meaning |
---|---|
== | equality |
=~ | regular expression matching |
!= | inequality |
!~ | regular expression non-matching |
OR operator
Use “|” to look for one label or another, e.g.
query(emuDBhandle = ae, query = "Phonetic == m|n")
## segment list from database: ae
## query was: Phonetic == m|n
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
## 2 n 1031.925 1195.925 0000 msajc003 Phonetic SEGMENT
## 3 n 1741.425 1791.425 0000 msajc003 Phonetic SEGMENT
## 4 n 1515.475 1554.475 0000 msajc010 Phonetic SEGMENT
## 5 n 2430.975 2528.475 0000 msajc010 Phonetic SEGMENT
## 6 n 894.975 1022.975 0000 msajc012 Phonetic SEGMENT
## 7 m 1490.425 1564.975 0000 msajc012 Phonetic SEGMENT
## 8 n 2402.275 2474.875 0000 msajc012 Phonetic SEGMENT
## 9 m 496.575 558.575 0000 msajc015 Phonetic SEGMENT
## 10 n 2226.575 2271.075 0000 msajc015 Phonetic SEGMENT
## 11 n 3046.125 3067.675 0000 msajc015 Phonetic SEGMENT
## 12 m 1587.175 1655.675 0000 msajc022 Phonetic SEGMENT
## 13 m 819.025 902.925 0000 msajc023 Phonetic SEGMENT
## 14 n 1434.775 1495.275 0000 msajc023 Phonetic SEGMENT
## 15 n 1774.925 1833.925 0000 msajc023 Phonetic SEGMENT
## 16 n 508.675 543.975 0000 msajc057 Phonetic SEGMENT
## 17 m 1629.675 1709.175 0000 msajc057 Phonetic SEGMENT
## 18 m 2173.425 2233.425 0000 msajc057 Phonetic SEGMENT
## 19 n 2447.675 2480.425 0000 msajc057 Phonetic SEGMENT
You can expand this as well:
mnN = query(emuDBhandle = ae, query = "Phonetic == m | n | N")
summary(mnN)
## segment list from database: ae
## query was: Phonetic == m | n | N
## with 23 segments
##
## Segment distribution:
##
## m n N
## 7 12 4
In all sequential and hierarchical queries, bracketing with “[” and “]” is required to structure your query. In simple queries, however, brackets are optional.
mnN = query(emuDBhandle = ae, query = "[Phonetic == m|n|N]")
summary(mnN)
## segment list from database: ae
## query was: [Phonetic == m|n|N]
## with 23 segments
##
## Segment distribution:
##
## m n N
## 7 12 4
However, this sequential query would fail, because of missing brackets:
query(ae, "Phonetic == V -> Phonetic == m")
Use the “->” operator to find sequences of segments:
query(ae, "[Phonetic == V -> Phonetic == m]")
## segment list from database: ae
## query was: [Phonetic == V -> Phonetic == m]
## labels start end session bundle level type
## 1 V->m 187.425 340.175 0000 msajc003 Phonetic SEGMENT
Note: all rows in the resulting segment list have the start time of “V”, the end time of “m”, and the label “V->m”. Change this with the so-called result modifier, the hash tag “#”:
query(ae, "[#Phonetic == V -> Phonetic == m]")
## segment list from database: ae
## query was: [#Phonetic == V -> Phonetic == m]
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
query(ae, "[Phonetic == V -> #Phonetic == m]")
## segment list from database: ae
## query was: [Phonetic == V -> #Phonetic == m]
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
Keep in mind that only one hash tag per query is allowed.
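What the sequence operator and the result modifier do can be imitated on a plain table of segments. The sketch below uses a hand-made copy of the Phonetic segments of “amongst” (taken from the outputs in this chapter) and only illustrates the idea; it does not show how emuR evaluates queries.

```r
# Phonetic segments of "amongst" in temporal order (copied by hand)
seg <- data.frame(
  labels = c("V", "m", "V", "N", "s", "t", "H"),
  start  = c(187.425, 256.925, 340.175, 426.675, 483.425, 566.925, 596.675),
  end    = c(256.925, 340.175, 426.675, 483.425, 566.925, 596.675, 674.175),
  stringsAsFactors = FALSE
)
n <- nrow(seg)

# Positions i where segment i is "V" and segment i + 1 is "m"
i <- which(seg$labels[-n] == "V" & seg$labels[-1] == "m")

# Like [Phonetic == V -> Phonetic == m]: one row spanning both segments
whole <- data.frame(
  labels = paste(seg$labels[i], seg$labels[i + 1], sep = "->"),
  start  = seg$start[i],
  end    = seg$end[i + 1],
  stringsAsFactors = FALSE
)

# Like [#Phonetic == V -> Phonetic == m]: only the marked first element
first_only <- seg[i, ]

# Like [Phonetic == V -> #Phonetic == m]: only the marked second element
second_only <- seg[i + 1, ]

print(whole)
print(first_only)
print(second_only)
```

The three results correspond to the three segment lists shown above: the whole sequence, the “V” alone, and the “m” alone.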
You can search for sequences of sequences; however, you have to use bracketing, otherwise you get an error:
query(ae, "[Phonetic == @ -> Phonetic == n -> Phonetic == s]")
The correct code would be as follows:
query(ae, "[[Phonetic == @ -> Phonetic == n ] -> Phonetic == s]")
## segment list from database: ae
## query was: [[Phonetic == @ -> Phonetic == n ] -> Phonetic == s]
## labels start end session bundle level type
## 1 @->n->s 1715.425 1893.175 0000 msajc003 Phonetic SEGMENT
## 2 @->n->s 2382.475 2753.975 0000 msajc010 Phonetic SEGMENT
## 3 @->n->s 2200.875 2408.575 0000 msajc015 Phonetic SEGMENT
## What is the query to retrieve all sequences of ITEMs containing labels “offer” followed by two arbitrary labels followed by “resistance”?
query(ae, "[[[Text == offer -> Text =~ .*] -> Text =~ .* ] -> Text == resistance]")
## segment list from database: ae
## query was: [[[Text == offer -> Text =~ .*] -> Text =~ .* ] -> Text == resistance]
## labels start end session bundle level
## 1 offer->any->further->resistance 1957.775 2753.975 0000 msajc010 Text
## type
## 1 ITEM
Use the operator “^” for all queries in which two linked levels are involved.
## What is the query to retrieve all ITEMs containing the label “p” in the “Phoneme” level that occur in strong syllables (i.e. dominated by / linked to ITEMs of the level “Syllable” that contain the label “S”)?
query(ae, "[Phoneme == p ^ Syllable == S]")
## segment list from database: ae
## query was: [Phoneme == p ^ Syllable == S]
## labels start end session bundle level type
## 1 p 558.575 639.575 0000 msajc015 Phoneme ITEM
## 2 p 1655.675 1698.675 0000 msajc022 Phoneme ITEM
## 3 p 863.675 970.425 0000 msajc057 Phoneme ITEM
However, the operator is not directional; although “Syllable” dominates “Phoneme”, you could have asked
query(ae, "[Syllable == S ^ #Phoneme == p]")
## segment list from database: ae
## query was: [Syllable == S ^ #Phoneme == p]
## labels start end session bundle level type
## 1 p 558.575 639.575 0000 msajc015 Phoneme ITEM
## 2 p 1655.675 1698.675 0000 msajc022 Phoneme ITEM
## 3 p 863.675 970.425 0000 msajc057 Phoneme ITEM
So, “^” should not be translated as “is dominated by”. However, you have to use the hash tag here in order to get the labels and times of the Phoneme level. You can leave out the hash tag if the level you are interested in is the first one in your query.
You can query multiple dominations; however, as in the sequencing case, you have to use brackets:
## What is the query to retrieve all ITEMs on the “Phonetic” level that are part of a strong syllable (labeled “S”) and belong to the words “amongst” or “beautiful”?
query(ae, "[[Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]")
## segment list from database: ae
## query was: [[Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 N 426.675 483.425 0000 msajc003 Phonetic SEGMENT
## 4 s 483.425 566.925 0000 msajc003 Phonetic SEGMENT
## 5 t 566.925 596.675 0000 msajc003 Phonetic SEGMENT
## 6 H 596.675 674.175 0000 msajc003 Phonetic SEGMENT
## 7 db 2033.675 2150.175 0000 msajc003 Phonetic SEGMENT
## 8 j 2150.175 2211.175 0000 msajc003 Phonetic SEGMENT
## 9 u: 2211.175 2283.675 0000 msajc003 Phonetic SEGMENT
# same as
query(ae, "[[#Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]")
## to get the "Text"-items instead, use
query(ae, "[[Phonetic =~ .* ^ Syllable == S] ^ #Text == amongst | beautiful]")
## segment list from database: ae
## query was: [[Phonetic =~ .* ^ Syllable == S] ^ #Text == amongst | beautiful]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
There are three position functions and one count function. As the count function results in a number, queries with it involve a comparison with a number (by using one of “==”, “!=”, “>”, “>=”, “<”, “<=”, see below). The result of the position functions is logical; we therefore ask whether a certain condition is TRUE or FALSE.
The three position functions are Start(), Medial(), and End(). Example queries are:
## What is the query to retrieve all word-initial syllables?
query(ae, "[Start(Text, Syllable) == TRUE]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [Start(Text, Syllable) == TRUE]
## labels start end session bundle level type
## 1 W 187.425 256.925 0000 msajc003 Syllable ITEM
## 2 S 674.175 739.925 0000 msajc003 Syllable ITEM
## 3 S 739.925 1289.425 0000 msajc003 Syllable ITEM
## 4 W 1289.425 1463.175 0000 msajc003 Syllable ITEM
## 5 W 1463.175 1634.425 0000 msajc003 Syllable ITEM
## 6 W 1634.425 1791.425 0000 msajc003 Syllable ITEM
## 7 S 2033.675 2283.675 0000 msajc003 Syllable ITEM
## be careful with the order of the arguments; the following gives a result that makes no sense (the word label in Text at every start of a syllable):
query(ae, "[Start(Syllable, Text) == TRUE]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [Start(Syllable, Text) == TRUE]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 3 her 674.175 739.925 0000 msajc003 Text ITEM
## 4 friends 739.925 1289.425 0000 msajc003 Text ITEM
## 5 she 1289.425 1463.175 0000 msajc003 Text ITEM
## 6 was 1463.175 1634.425 0000 msajc003 Text ITEM
## 7 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 8 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 9 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 10 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
## 11 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
## 12 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
Examples for Medial() and End() are:
## What is the query to retrieve all word-medial syllables?
query(ae, "[Medial(Word, Syllable) == TRUE]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [Medial(Word, Syllable) == TRUE]
## labels start end session bundle level type
## 1 S 1791.425 1945.425 0000 msajc003 Syllable ITEM
## 2 W 2283.675 2361.925 0000 msajc003 Syllable ITEM
## What is the query to retrieve all word-final syllables?
query(ae, "[End(Word, Syllable) == TRUE]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [End(Word, Syllable) == TRUE]
## labels start end session bundle level type
## 1 S 256.925 674.175 0000 msajc003 Syllable ITEM
## 2 S 674.175 739.925 0000 msajc003 Syllable ITEM
## 3 S 739.925 1289.425 0000 msajc003 Syllable ITEM
## 4 W 1289.425 1463.175 0000 msajc003 Syllable ITEM
## 5 W 1463.175 1634.425 0000 msajc003 Syllable ITEM
## 6 W 1945.425 2150.175 0000 msajc003 Syllable ITEM
## 7 W 2361.925 2604.425 0000 msajc003 Syllable ITEM
Everything that is neither the first nor the last element is medial:
query(ae, "[Medial(Word, Phoneme) == TRUE]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [Medial(Word, Phoneme) == TRUE]
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phoneme ITEM
## 2 V 340.175 426.675 0000 msajc003 Phoneme ITEM
## 3 N 426.675 483.425 0000 msajc003 Phoneme ITEM
## 4 s 483.425 566.925 0000 msajc003 Phoneme ITEM
## 5 r 892.675 949.925 0000 msajc003 Phoneme ITEM
## 6 E 949.925 1031.925 0000 msajc003 Phoneme ITEM
## 7 n 1031.925 1195.925 0000 msajc003 Phoneme ITEM
## 8 @ 1506.175 1548.425 0000 msajc003 Phoneme ITEM
## 9 @ 1715.425 1741.425 0000 msajc003 Phoneme ITEM
## 10 n 1741.425 1791.425 0000 msajc003 Phoneme ITEM
## 11 s 1791.425 1893.175 0000 msajc003 Phoneme ITEM
## 12 I 1893.175 1945.425 0000 msajc003 Phoneme ITEM
## 13 d 1945.425 1966.675 0000 msajc003 Phoneme ITEM
## 14 @ 1966.675 2033.675 0000 msajc003 Phoneme ITEM
## 15 j 2150.175 2211.175 0000 msajc003 Phoneme ITEM
## 16 u: 2211.175 2283.675 0000 msajc003 Phoneme ITEM
## 17 d 2283.675 2302.925 0000 msajc003 Phoneme ITEM
## 18 @ 2302.925 2361.925 0000 msajc003 Phoneme ITEM
## 19 f 2361.925 2447.425 0000 msajc003 Phoneme ITEM
## 20 @ 2447.425 2506.275 0000 msajc003 Phoneme ITEM
The count function’s name is Num(). Num(x,y) counts how many y are in x. You can therefore ask things like the following:
## What is the query to retrieve all words that contain two syllables?
query(ae, "[Num(Text, Syllable) == 2]")
## segment list from database: ae
## query was: [Num(Text, Syllable) == 2]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 futile 571.925 1090.975 0000 msajc010 Text ITEM
## 3 any 1436.725 1628.475 0000 msajc010 Text ITEM
## 4 further 1628.475 1957.775 0000 msajc010 Text ITEM
## 5 shiver 1650.975 1994.975 0000 msajc012 Text ITEM
## 6 itches 299.975 662.425 0000 msajc022 Text ITEM
## 7 always 775.475 1280.175 0000 msajc022 Text ITEM
## 8 tempting 1400.675 1806.275 0000 msajc022 Text ITEM
## 9 display 666.675 1211.175 0000 msajc057 Text ITEM
## 10 attracts 1211.175 1578.675 0000 msajc057 Text ITEM
## 11 ever 2480.425 2794.925 0000 msajc057 Text ITEM
## What is the query to retrieve all syllables that contain more than four phonemes?
query(ae, "[Num(Syllable, Phoneme) > 4]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [Num(Syllable, Phoneme) > 4]
## labels start end session bundle level type
## 1 S 256.925 674.175 0000 msajc003 Syllable ITEM
## 2 S 739.925 1289.425 0000 msajc003 Syllable ITEM
query(ae,"[Num(Phoneme,Phonetic)!=1]")
## segment list from database: ae
## query was: [Num(Phoneme,Phonetic)!=1]
## labels start end session bundle level type
## 1 t 566.925 674.175 0000 msajc003 Phoneme ITEM
## 2 k 1634.425 1715.425 0000 msajc003 Phoneme ITEM
## 3 t 372.975 411.675 0000 msajc010 Phoneme ITEM
## 4 t 798.475 862.875 0000 msajc010 Phoneme ITEM
## 5 t 1090.975 1129.925 0000 msajc010 Phoneme ITEM
## 6 t 2319.475 2382.475 0000 msajc010 Phoneme ITEM
## 7 tS 379.525 546.325 0000 msajc012 Phoneme ITEM
## 8 d 1022.975 1082.975 0000 msajc012 Phoneme ITEM
## 9 k 1082.975 1183.975 0000 msajc012 Phoneme ITEM
## 10 d 1386.975 1456.475 0000 msajc012 Phoneme ITEM
## 11 t 1564.975 1620.975 0000 msajc012 Phoneme ITEM
## 12 t 2474.875 2533.975 0000 msajc012 Phoneme ITEM
## 13 d 1085.075 1129.075 0000 msajc015 Phoneme ITEM
## 14 t 1368.075 1449.475 0000 msajc015 Phoneme ITEM
## 15 r 1449.475 1500.675 0000 msajc015 Phoneme ITEM
## 16 N 1578.525 1653.675 0000 msajc015 Phoneme ITEM
## 17 w 1797.425 1903.575 0000 msajc015 Phoneme ITEM
## 18 k 2104.075 2200.875 0000 msajc015 Phoneme ITEM
## 19 w 2780.725 2876.525 0000 msajc015 Phoneme ITEM
## 20 k 2958.075 3046.125 0000 msajc015 Phoneme ITEM
## 21 tS 372.475 505.975 0000 msajc022 Phoneme ITEM
## 22 t 1400.675 1521.175 0000 msajc022 Phoneme ITEM
## 23 t 1655.675 1718.175 0000 msajc022 Phoneme ITEM
## 24 t 1806.275 1872.525 0000 msajc022 Phoneme ITEM
## 25 k 1996.275 2090.525 0000 msajc022 Phoneme ITEM
## 26 tS 2254.025 2469.525 0000 msajc022 Phoneme ITEM
## 27 dZ 708.925 819.025 0000 msajc023 Phoneme ITEM
## 28 t 1495.275 1584.925 0000 msajc023 Phoneme ITEM
## 29 k 2279.175 2366.225 0000 msajc023 Phoneme ITEM
## 30 n 475.775 543.975 0000 msajc057 Phoneme ITEM
## 31 d 666.675 729.425 0000 msajc057 Phoneme ITEM
## 32 p 863.675 970.425 0000 msajc057 Phoneme ITEM
## 33 t 1247.925 1363.925 0000 msajc057 Phoneme ITEM
## 34 m 1578.675 1709.175 0000 msajc057 Phoneme ITEM
## 35 k 1824.425 1943.175 0000 msajc057 Phoneme ITEM
## 36 t 2085.175 2148.475 0000 msajc057 Phoneme ITEM
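The position functions and Num() can be imitated in base R on a small parent/child table. The table below is a hand-made copy of the first three words of bundle msajc003 and their syllables, as far as they can be read off the outputs above. The column names are hypothetical, and the sketch illustrates the logic only, not emuR's implementation.

```r
# One row per syllable, in temporal order, with the word that dominates it
syl <- data.frame(
  word     = c("amongst", "amongst", "her", "friends"),
  syllable = c("W", "S", "S", "S"),
  stringsAsFactors = FALSE
)

# Position of each syllable within its word, and number of syllables per word
pos <- ave(seq_along(syl$word), syl$word, FUN = seq_along)
num <- ave(seq_along(syl$word), syl$word, FUN = length)

syl$is_start  <- pos == 1                     # like Start(Word, Syllable) == TRUE
syl$is_end    <- pos == num                   # like End(Word, Syllable) == TRUE
syl$is_medial <- !syl$is_start & !syl$is_end  # like Medial(Word, Syllable) == TRUE

# Like Num(Text, Syllable) == 2: words dominating exactly two syllables
two_syllables <- unique(syl$word[num == 2])

print(syl)
print(two_syllables)
```

A monosyllabic word such as “her” is both word-initial and word-final in this sketch, which mirrors the fact that its syllable appears in the results of both the Start() and the End() query above.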
You can use “&” to search within several attribute definitions on the same level. For example, the level Word in ae has several attribute definitions:
list_attributeDefinitions(ae,level="Word")
## name type hasLabelGroups hasLegalLabels
## 1 Word STRING FALSE FALSE
## 2 Accent STRING FALSE FALSE
## 3 Text STRING FALSE FALSE
We could, therefore, look for all accented (“S”) words by …
query(ae, "[Text =~.* & Accent == S]")
## segment list from database: ae
## query was: [Text =~.* & Accent == S]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 friends 739.925 1289.425 0000 msajc003 Text ITEM
## 3 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
## 4 futile 571.925 1090.975 0000 msajc010 Text ITEM
## 5 further 1628.475 1957.775 0000 msajc010 Text ITEM
## 6 resistance 1957.775 2753.975 0000 msajc010 Text ITEM
## 7 chill 379.525 744.525 0000 msajc012 Text ITEM
## 8 wind 744.525 1082.975 0000 msajc012 Text ITEM
## 9 caused 1082.975 1456.475 0000 msajc012 Text ITEM
## 10 shiver 1650.975 1994.975 0000 msajc012 Text ITEM
## 11 violently 1994.975 2692.325 0000 msajc012 Text ITEM
## 12 emphasized 425.375 1129.075 0000 msajc015 Text ITEM
## 13 strengths 1213.075 1797.425 0000 msajc015 Text ITEM
## 14 concealing 2104.075 2693.675 0000 msajc015 Text ITEM
## 15 weaknesses 2780.725 3456.825 0000 msajc015 Text ITEM
## 16 itches 299.975 662.425 0000 msajc022 Text ITEM
## 17 always 775.475 1280.175 0000 msajc022 Text ITEM
## 18 so 1113.675 1400.675 0000 msajc022 Text ITEM
## 19 tempting 1400.675 1806.275 0000 msajc022 Text ITEM
## 20 scratch 1890.275 2469.525 0000 msajc022 Text ITEM
## 21 no 1774.925 1964.425 0000 msajc023 Text ITEM
## 22 risks 1964.425 2554.175 0000 msajc023 Text ITEM
## 23 display 666.675 1211.175 0000 msajc057 Text ITEM
## 24 more 1578.675 1824.425 0000 msajc057 Text ITEM
## 25 ever 2480.425 2794.925 0000 msajc057 Text ITEM
Another usage of “&” is to combine a basic query with a function, e.g.
## What is the query to retrieve all non-word-final “S” syllables?
query(ae, "[[Syllable == S & End(Word, Syllable) == FALSE]^#Text=~.*]", bundlePattern = "msajc003")
## segment list from database: ae
## query was: [[Syllable == S & End(Word, Syllable) == FALSE]^#Text=~.*]
## labels start end session bundle level type
## 1 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 2 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM