.wav files
If you have not already done so, please follow the setup instructions.
Assuming that you have followed the setup instructions above, download and unzip testsample in the directory where you have located your project. I will assume that this directory name (without the path) is ipsR, as in setup section 5.
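Create a directory called emu_databases on your computer and put it into the ipsR directory, so that ipsR now contains both testsample and emu_databases. Loading the tidyverse and emuR packages in R produces the following messages: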
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
In R, store the path to the directory testsample as sourceDir in exactly the following way:
Also store in R the path to emu_databases as targetDir:
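A minimal sketch of the two assignments, assuming that R's working directory is the project directory ipsR. The relative paths are inferred from the file listing printed further down (./testsample/german/...); adjust them if your layout differs.

```r
# Assumes the working directory is the project directory ipsR
# (check with getwd(); change it with setwd() if necessary)
sourceDir = "./testsample"     # the unzipped sample data
targetDir = "./emu_databases"  # where new emuDBs will be written
```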
This assumes you have downloaded testsample and created sourceDir and targetDir as defined earlier.
Store the path to the directory german:
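One way to do this, sketched below, is to build the path with file.path(), which joins the parts with the correct separator. The variable name path.german is hypothetical; any name works as long as the later calls use the same one.

```r
sourceDir = "./testsample"
# path.german (hypothetical name): the folder holding the .txt/.wav pairs
path.german = file.path(sourceDir, "german")
path.german
```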
Check which files are in this database:
## [1] "K01BE001.txt" "K01BE001.wav" "K01BE002.txt" "K01BE002.wav"
List only the full path of the wav files:
## [1] "./testsample/german/K01BE001.wav" "./testsample/german/K01BE002.wav"
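The two listings above can be reproduced with base R's list.files(). So that the sketch runs without the real download, it first builds a throwaway copy of the folder layout in a temporary directory, with empty placeholder files named as in testsample; with the real data you would pass the path to the german directory instead of demoDir.

```r
# Throwaway copy of the folder layout with empty placeholder files
demoDir = file.path(tempfile("ipsR_"), "testsample", "german")
dir.create(demoDir, recursive = TRUE)
invisible(file.create(file.path(demoDir, c("K01BE001.txt", "K01BE001.wav",
                                           "K01BE002.txt", "K01BE002.wav"))))

# all files in the folder
list.files(demoDir)

# only the .wav files, returned with their full path;
# the regular expression requires ".wav" at the end of the file name
wavFiles = list.files(demoDir, pattern = "\\.wav$", full.names = TRUE)
basename(wavFiles)
```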
Create an empty database:
Load the database and store its characteristics:
What’s in the database? Nothing!
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: german
## UUID: c9d0ab4e-f9bc-45af-a992-059ac72c1ac8
## Directory: /Users/jmh/Desktop/ipsR/emu_databases/german_emuDB
## Session count: 0
## Bundle count: 0
## Annotation item count: 0
## Label count: 0
## Link count: 0
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## data frame with 0 columns and 0 rows
## ── Level definitions ──
## data frame with 0 columns and 0 rows
## ── Link definitions ──
## data frame with 0 columns and 0 rows
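The create, load and inspect steps above can be sketched with emuR's create_emuDB(), load_emuDB() and summary(). To avoid touching your real emu_databases directory, the sketch writes into a fresh temporary folder (tmpTarget, a hypothetical stand-in for targetDir); with your own data you would pass targetDir instead.

```r
library(emuR)

# Stand-in for targetDir: a fresh temporary folder
tmpTarget = tempfile("emu_databases_")
dir.create(tmpTarget)

# create an empty emuDB; the folder is named german_emuDB
create_emuDB(name = "german", targetDir = tmpTarget)

# load it and print its characteristics (0 sessions, 0 bundles)
german_DB = load_emuDB(file.path(tmpTarget, "german_emuDB"))
summary(german_DB)
```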
Import wav files into this empty database:
## INFO: Importing 2 media files...
Inspect the database again:
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: german
## UUID: c9d0ab4e-f9bc-45af-a992-059ac72c1ac8
## Directory: /Users/jmh/Desktop/ipsR/emu_databases/german_emuDB
## Session count: 1
## Bundle count: 2
## Annotation item count: 0
## Label count: 0
## Link count: 0
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## data frame with 0 columns and 0 rows
## ── Level definitions ──
## data frame with 0 columns and 0 rows
## ── Link definitions ──
## data frame with 0 columns and 0 rows
List the available bundles (see section 5.1 of the Emu SDMS manual for details about sessions and bundles):
## # A tibble: 2 × 2
## session name
## <chr> <chr>
## 1 0000 K01BE001
## 2 0000 K01BE002
Display the wav and spectrogram data for these two utterances.
Add a tier for phonetic labelling. The tier needs a name (here Phon). Its type is SEGMENT because the annotation units have start and end times. The other possibilities are EVENT (e.g., for marking tone targets) and ITEM. Please see section 4 of the Emu SDMS manual for the important difference between SEGMENT, EVENT, and ITEM tiers.
## INFO: Rewriting 2 _annot.json files to file system...
Set things up to display the newly created Phon tier.
## NULL
The above gives NULL because no time tiers (of type SEGMENT or EVENT) have been specified for display yet. Set the display to show the Phon tier:
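The tier definition and display steps can be sketched with emuR's add_levelDefinition(), get_levelCanvasesOrder() and set_levelCanvasesOrder(). So that the sketch is self-contained, it works on a throwaway empty emuDB in a temporary folder (db is a hypothetical stand-in for german_DB); with the real database, replace db by german_DB and skip the first lines.

```r
library(emuR)

# throwaway empty emuDB (stand-in for german_DB)
tmpTarget = tempfile("emu_databases_")
dir.create(tmpTarget)
create_emuDB(name = "german", targetDir = tmpTarget)
db = load_emuDB(file.path(tmpTarget, "german_emuDB"))

# a SEGMENT tier called Phon: its items have start and end times
add_levelDefinition(db, name = "Phon", type = "SEGMENT")

# nothing is displayed yet in the default perspective: NULL
get_levelCanvasesOrder(db, perspectiveName = "default")

# display the Phon tier
set_levelCanvasesOrder(db, perspectiveName = "default", order = c("Phon"))
get_levelCanvasesOrder(db, perspectiveName = "default")
```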
Now try:
The Phon tier should now be visible. Annotate some segments as in the figure below, e.g. /O/ and /a/ for the vowels of Sonne and lacht in K01BE002, and save the annotations. Please see section 9.2.1 of the EMU-SDMS manual on how to annotate from time signals.
Figure 3.1: An utterance with two annotations from german_DB.
Some information about annotating in Emu
The procedure for entering annotations into Emu is not immediately intuitive, especially if you are used to Praat.
One way to get started is to put the mouse at some desired point in the waveform or spectrogram window, then left click, then enter carriage return. This should draw a vertical line. Let’s call the time of this line \(t_1\).
Hold the shift key down and, while keeping it held down, sweep the mouse to the right to a later point in time that we will call \(t_2\). Still holding the shift key down, left click with the mouse at your desired time (\(t_2\)). Now let go of the shift key and enter carriage return. These actions should make a segment between times \(t_1\) and \(t_2\).
Making other segments is now much easier. Just put the mouse at the desired point on the waveform/spectrogram and left click followed by carriage return. If the position of this new mouse location was after \(t_2\) (at say \(t_3\)), it will make a segment from \(t_2\) to \(t_3\). If the mouse was before \(t_1\) (at say \(t_0\)), then the new segment will be from \(t_0\) to \(t_1\).
To delete a boundary, put the mouse over it (at a vertical line, if you can see it) and enter backspace; the two segments on either side of it are then merged. If you did this at time \(t_1\), and had created \(t_0\) as above, then the segment whose original time was from \(t_1\) to \(t_2\) is now from \(t_0\) to \(t_2\).
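The boundary behaviour described above can be mimicked with a toy model in which a tier is just a set of boundary times and each segment runs from one boundary to the next. This is a hypothetical illustration of the described behaviour, not how the EMU-webApp is implemented, and the times are made up.

```r
# Hypothetical model: segments are the spans between adjacent boundaries
segments_from_boundaries = function(b) {
  b = sort(b)
  data.frame(start = head(b, -1), end = tail(b, -1))
}

t0 = 0.10; t1 = 0.25; t2 = 0.40; t3 = 0.55  # made-up times in seconds

# t1 and t2 give one segment; adding t3 and t0 adds t2-t3 and t0-t1
segs = segments_from_boundaries(c(t1, t2, t3, t0))
segs

# deleting the boundary at t1 merges t0-t1 and t1-t2 into t0-t2
merged = segments_from_boundaries(setdiff(c(t0, t1, t2, t3), t1))
merged
```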
To enter a label, put the mouse on the Phon tier inside a pair of segment boundaries and enter carriage return. The segment on the Phon tier will now show up bright yellow. Type in some text and enter carriage return to make the annotation.
You can move any boundary by hovering the mouse over it, holding down the shift key, and moving the mouse left or right.
If you don’t want to save the annotations, just click on another utterance name. Otherwise, left click on the current utterance name, which is now highlighted in red.
Assuming you have managed to annotate the segments as in Fig. 3.1 above, the segments should now be accessible in R with query():
The next task is to add orthographic labels as ITEM annotations. An ITEM tier is appropriate if (a) the start and end times are of no concern and/or (b) a word’s start and end times are inherited from the segments it dominates. There are three steps:
Define the tier:
## INFO: Rewriting 2 _annot.json files to file system...
Define how it is linked with a time-based tier (here with Phon). See section 4 of the Emu SDMS manual for the difference between ONE_TO_MANY and MANY_TO_MANY links.
Check the hierarchical links:
## type superlevelName sublevelName
## 1 ONE_TO_MANY ORT Phon
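The three steps can be sketched with emuR's add_levelDefinition(), add_linkDefinition() and list_linkDefinitions(). As before, db is a hypothetical stand-in for german_DB: a throwaway emuDB in a temporary folder that is first given the Phon tier, so that the sketch runs on its own.

```r
library(emuR)

# throwaway emuDB (stand-in for german_DB) with the Phon tier
tmpTarget = tempfile("emu_databases_")
dir.create(tmpTarget)
create_emuDB(name = "german", targetDir = tmpTarget)
db = load_emuDB(file.path(tmpTarget, "german_emuDB"))
add_levelDefinition(db, name = "Phon", type = "SEGMENT")

# step 1: a timeless ITEM tier for the orthography
add_levelDefinition(db, name = "ORT", type = "ITEM")

# step 2: each ORT item may dominate several Phon segments
add_linkDefinition(db, type = "ONE_TO_MANY",
                   superlevelName = "ORT", sublevelName = "Phon")

# step 3: check the hierarchical links
list_linkDefinitions(db)
```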
Open the hierarchy window for utterance K01BE001 and click on the blue and white + sign next to ORT. Each time you do so, a node appears. To enter annotation text for a node, position the mouse over it and left click to bring up a green rectangle (see figure below), type the text into it, and enter carriage return. To delete a node, move the mouse over it and enter y. See the figure below for further details.
If you have done something as in the above figure, you should be able to access the annotations in R. Notice the NA under the start and end times. This is because the annotations are timeless, i.e. not linked to any annotations of a time (SEGMENT or EVENT) tier.
You can also add timeless annotations (in this case to the ORT tier) with the function create_itemsInLevel() after making an appropriate data frame in R. In this example, the words die Sonne lacht will be added to the second utterance (K01BE002).
# these are the words of the second sentence stored as a character vector `w`
w = c("die", "Sonne", "lacht")
Make a data frame with the following information:
What session do the labels belong to? This is 0000.
What bundle do the labels belong to? This is K01BE002.
What’s the name of the tier? This will be ORT, as created above.
What order do the annotations occur in? This is 1, 2, 3 for die Sonne lacht.
Put all the above information into a data frame as follows:
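# session, bundle, tier name, and positions of the three words in `w`
sess = "0000"
bundle = "K01BE002"
lev = "ORT"
inds = 1:3
newItems_ORT = data.frame(session = sess,
                          bundle = bundle,
                          level = lev,
                          start_item_seq_idx = inds,
                          attribute = lev,
                          labels = w,
                          stringsAsFactors = FALSE)
newItems_ORT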
## session bundle level start_item_seq_idx attribute labels
## 1 0000 K01BE002 ORT 1 ORT die
## 2 0000 K01BE002 ORT 2 ORT Sonne
## 3 0000 K01BE002 ORT 3 ORT lacht
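If you need the same kind of table for other sentences, the construction can be wrapped in a small function. make_items() is a hypothetical helper, not part of emuR; it numbers the words with seq_along(), so the order always follows the character vector and does not need to be typed in by hand.

```r
# Hypothetical helper (not part of emuR): one row per word, numbered in order
make_items = function(words, session, bundle, level = "ORT") {
  data.frame(session = session,
             bundle = bundle,
             level = level,
             start_item_seq_idx = seq_along(words),
             attribute = level,  # the default attribute has the tier's name
             labels = words,
             stringsAsFactors = FALSE)
}

items = make_items(c("die", "Sonne", "lacht"), session = "0000", bundle = "K01BE002")
items
```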
Add these word annotations to the database:
## INFO: Rewriting 2 _annot.json files to file system...
Look at the hierarchy for the second utterance. The word annotations in the ORT tier should now be visible. They can be linked manually in the Emu-WebApp so that the words are accessible in emuR together with their times. Please see section 9.2.2 of the Emu SDMS manual for details on how to annotate hierarchically.
Some information about annotating hierarchically
Adding hierarchical links and annotations is not difficult. The present task is to add links from O and from a at the Phon tier to Sonne and to lacht respectively at the ORT tier. To do this for the first of these, hover the mouse over Sonne (the node will turn blue), hold down the shift key, and sweep the mouse to O (whose node will also turn blue); then release the shift key, and the link is made. If you want to delete the link, hover the mouse over it (the link will then turn bright yellow) and hit backspace. You can add new nodes at the ORT tier as follows. Move the mouse over lacht, then enter n and carriage return. This will create a new node before lacht (between Sonne and lacht). If you enter m instead of n in the above operation, a new node will be created after lacht. To edit or play a node at the ORT tier, left click on the node; you can then enter or modify text in the green panel. To delete a node at the ORT tier, hover over it and enter y. See the figure below.
If you have annotated as in the above figure and saved the result, the words and their times will be accessible. Note that the times at the ORT tier are the same as the times at the Phon tier, because each word dominates only one segment and inherits its times from it. Note also that die has no times, because it hasn’t been linked to any annotations at the Phon tier.
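The inheritance rule can be written out in plain R to make it concrete. The sketch assumes that a word spans from the start of the first to the end of the last segment it dominates, and that a word with no links gets NA. The segment times (in ms) are invented for illustration; with the real database the times come from query().

```r
# Hypothetical Phon segments (times in ms are made up)
phon = data.frame(id = c(1, 2), label = c("O", "a"),
                  start = c(350, 820), end = c(430, 910))
# ORT -> Phon links as made in the hierarchy view; "die" is unlinked
links = data.frame(word = c("Sonne", "lacht"), phon_id = c(1, 2))
words = c("die", "Sonne", "lacht")

# a word starts where its first dominated segment starts and ends where
# its last one ends; without links, both times are NA
word_times = do.call(rbind, lapply(words, function(w) {
  ids = links$phon_id[links$word == w]
  segs = phon[phon$id %in% ids, ]
  data.frame(word = w,
             start = if (nrow(segs) > 0) min(segs$start) else NA_real_,
             end = if (nrow(segs) > 0) max(segs$end) else NA_real_)
}))
word_times
```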
# get all annotations at the `ORT` tier for bundle `K01BE002`
query(german_DB, "ORT =~ .*", bundlePattern = "K01BE002")