The aim is to convert a Praat .TextGrid into the Emu database format, as exemplified by Fig. 1.1:
Figure 1.1: An utterance fragment in Praat and in Emu.
The assumption is that you have a project called ips and that it contains the following directories.
If not, please see preliminaries here.
Start up R in the project you are using for this course.
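The start-up messages shown below are consistent with attaching the tidyverse and emuR. A minimal sketch of this step, assuming these packages (and wrassp, which is used later for signal processing) are already installed:

```r
# Load the packages used in this section (assumed to be installed already).
library(tidyverse)  # general data handling
library(emuR)       # Emu database functions
library(wrassp)     # signal processing routines, used further below
```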
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'emuR'
##
## The following object is masked from 'package:base':
##
## norm
In R, store the path to the directory testsample as sourceDir in exactly the following way:
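A minimal sketch, assuming that R's working directory is the project directory, so that testsample can be reached by a relative path (the ./testsample/... paths printed in the output of this section are consistent with this assumption):

```r
# Relative path to the directory containing the sample databases.
# Assumption: the working directory is the project directory.
sourceDir = "./testsample"
```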
And also store in R the path to emu_databases as targetDir:
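Again as a sketch under the assumption that emu_databases sits directly inside the project directory (the loading message in this section reports ./emu_databases/praat_emuDB, which fits this):

```r
# Relative path to the directory in which Emu databases are stored.
# Assumption: the working directory is the project directory.
targetDir = "./emu_databases"
```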
The directory testsample/praat on your computer contains a Praat-style database with .wav files and .TextGrid files. Define the path to this database in R and check that you can see these files with the list.files() function:
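One way to do this is sketched below. The name path.praat is the one used in the code further down in this section; building it from sourceDir with file.path() is an assumption about how it was defined.

```r
# Assumption: sourceDir points to testsample relative to the working directory.
sourceDir = "./testsample"

# Path to the Praat-style database inside testsample.
path.praat = file.path(sourceDir, "praat")

# List the files found there.
list.files(path.praat)
```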
## [1] "wetter1.pit" "wetter1.TextGrid" "wetter1.wav"
## [4] "wetter10.pit" "wetter10.TextGrid" "wetter10.wav"
## [7] "wetter11.pit" "wetter11.TextGrid" "wetter11.wav"
## [10] "wetter12.pit" "wetter12.TextGrid" "wetter12.wav"
## [13] "wetter13.pit" "wetter13.TextGrid" "wetter13.wav"
## [16] "wetter14.pit" "wetter14.TextGrid" "wetter14.wav"
## [19] "wetter15.pit" "wetter15.TextGrid" "wetter15.wav"
## [22] "wetter16.pit" "wetter16.TextGrid" "wetter16.wav"
## [25] "wetter17.pit" "wetter17.TextGrid" "wetter17.wav"
## [28] "wetter2.pit" "wetter2.TextGrid" "wetter2.wav"
## [31] "wetter3.pit" "wetter3.TextGrid" "wetter3.wav"
## [34] "wetter4.pit" "wetter4.TextGrid" "wetter4.wav"
## [37] "wetter6.pit" "wetter6.TextGrid" "wetter6.wav"
## [40] "wetter7.pit" "wetter7.TextGrid" "wetter7.wav"
The emuR function for converting the TextGridCollection to an Emu database and storing it in targetDir (defined above) is convert_TextGridCollection(). It works like this:
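A hedged sketch of the call: the database name "praat" is taken from the summary output in this section, and the two paths are assumed to be the relative paths used throughout.

```r
library(emuR)

# Assumptions: relative paths as used elsewhere in this section.
path.praat = "./testsample/praat"
targetDir = "./emu_databases"

# Convert the collection of .wav/.TextGrid pairs in path.praat to an
# Emu database called "praat"; emuR stores it as praat_emuDB in targetDir.
convert_TextGridCollection(dir = path.praat,
                           dbName = "praat",
                           targetDir = targetDir)
```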
The converted Praat database can now be loaded:
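A sketch of the loading step. The handle name praat_DB is the one used in the code later in this section; the path is assumed from the loading message shown below.

```r
library(emuR)

# Assumption: the converted database was stored in ./emu_databases.
targetDir = "./emu_databases"

# Load the database and keep a handle to it for all further operations.
praat_DB = load_emuDB(file.path(targetDir, "praat_emuDB"))
```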
## INFO: Loading EMU database from ./emu_databases/praat_emuDB... (14 bundles found)
Its properties can then be examined as before:
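The summary shown below is what summary() prints for a database handle. A minimal sketch, assuming praat_DB is the handle returned by load_emuDB():

```r
library(emuR)

# Assumes praat_DB is the handle of the loaded praat_emuDB database.
summary(praat_DB)
```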
##
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: praat
## UUID: c56238e2-8b60-4aab-b306-7ab6da2d8713
## Directory: /Users/jmh/Desktop/ipsR/emu_databases/praat_emuDB
## Session count: 1
## Bundle count: 14
## Annotation item count: 214
## Label count: 214
## Link count: 0
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## data frame with 0 columns and 0 rows
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## ORT SEGMENT 1 ORT;
## ── Link definitions ──
## data frame with 0 columns and 0 rows
And it can of course be viewed:
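Viewing is done with the serve() function mentioned later in this section. A sketch, assuming praat_DB is the loaded handle; the interactive() guard is an addition here so that the call is skipped when the code is run as a script.

```r
library(emuR)

# serve() opens the database in the EMU-webApp and keeps R busy until
# the server is stopped, so only call it in an interactive session.
if (interactive()) {
  serve(praat_DB)
}
```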
wrassp
The task is to calculate the pitch from each utterance’s waveform for the praat_emuDB database created above. First, find the full path names of all of the .wav files. They are here:
praat_wav_paths = list.files(path.praat,
                             pattern = ".*wav$",
                             recursive = TRUE,
                             full.names = TRUE)
praat_wav_paths
## [1] "./testsample/praat/wetter1.wav" "./testsample/praat/wetter10.wav"
## [3] "./testsample/praat/wetter11.wav" "./testsample/praat/wetter12.wav"
## [5] "./testsample/praat/wetter13.wav" "./testsample/praat/wetter14.wav"
## [7] "./testsample/praat/wetter15.wav" "./testsample/praat/wetter16.wav"
## [9] "./testsample/praat/wetter17.wav" "./testsample/praat/wetter2.wav"
## [11] "./testsample/praat/wetter3.wav" "./testsample/praat/wetter4.wav"
## [13] "./testsample/praat/wetter6.wav" "./testsample/praat/wetter7.wav"
The signal processing package wrassp will now be used to calculate the pitch for each of these .wav files. To see the full range of signal processing routines available, enter:
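One way to list the routines, offered as an assumption about the intended command, is to print the names of the wrasspOutputInfos list, which has one entry per signal processing function:

```r
library(wrassp)

# Each element of wrasspOutputInfos is named after a wrassp routine.
names(wrasspOutputInfos)
```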
Two of these routines can be used for calculating pitch: ksvF0 and mhsF0.
Here’s how to use mhsF0 with the default settings. The output is going to be stored in path.praat (i.e. in testsample/praat on your computer).
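A sketch of the call, assuming praat_wav_paths and path.praat are defined as above. Only the output directory is specified, so all analysis parameters keep their default values.

```r
library(wrassp)

# Assumes praat_wav_paths (full paths of the .wav files) and path.praat
# (the testsample/praat directory) are defined as above.
# One .pit file per .wav file is written to outputDirectory.
mhsF0(praat_wav_paths, outputDirectory = path.praat)
```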
##
## INFO: applying mhspitch to 14 files
As the figure below shows, the pitch files should now all have been written to path.praat, i.e. to testsample/praat.
These calculated pitch files now need to be added to praat_emuDB. This is done with the add_files() function. The parameter targetSessionName can be omitted in this case, because all of the bundles are stored in the session directory 0000. This can be verified with:
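The tibble shown below, with the columns session and name, is what list_bundles() returns. A minimal sketch, assuming praat_DB is the loaded handle:

```r
library(emuR)

# One row per bundle, together with the session that contains it.
list_bundles(praat_DB)
```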
## # A tibble: 14 × 2
## session name
## <chr> <chr>
## 1 0000 wetter1
## 2 0000 wetter10
## 3 0000 wetter11
## 4 0000 wetter12
## 5 0000 wetter13
## 6 0000 wetter14
## 7 0000 wetter15
## 8 0000 wetter16
## 9 0000 wetter17
## 10 0000 wetter2
## 11 0000 wetter3
## 12 0000 wetter4
## 13 0000 wetter6
## 14 0000 wetter7
Now add the pitch files to praat_DB:
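A sketch of this step, assuming praat_DB is the loaded handle and path.praat is the directory containing the .pit files. targetSessionName is left out, as explained above.

```r
library(emuR)

# Copy every file with the extension "pit" from path.praat into the
# bundle of the same name in the database.
add_files(praat_DB,
          dir = path.praat,
          fileExtension = "pit")
```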
Once the files have been added, the corresponding track needs to be defined. The information required is:
- track name. This can be anything; it is needed when referring to these signal files in R.
- file extension. This is pit, as already established above.
- columnName. This is the name of the column in the .pit files in which the fundamental frequency data is stored. This type of information (as well as information about the extension) is given by wrasspOutputInfos. In this case, append $mhsF0, since this was the name of the signal processing routine that was used to calculate the pitch data:
## $ext
## [1] "pit"
##
## $tracks
## [1] "pitch"
##
## $outputType
## [1] "SSFF"
The column name is given by $tracks, which in this case is pitch. Putting all this together, and using "pitch" for the name of the track, gives:
add_ssffTrackDefinition(praat_DB,
                        name = "pitch",
                        columnName = "pitch",
                        fileExtension = "pit")
summary(praat_DB)
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: praat
## UUID: c56238e2-8b60-4aab-b306-7ab6da2d8713
## Directory: /Users/jmh/Desktop/ipsR/emu_databases/praat_emuDB
## Session count: 1
## Bundle count: 14
## Annotation item count: 214
## Label count: 214
## Link count: 0
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## name columnName fileExtension
## pitch pitch pit
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## ORT SEGMENT 1 ORT;
## ── Link definitions ──
## data frame with 0 columns and 0 rows
The signals that are currently displayed for this praat_DB database can be seen with the function get_signalCanvasesOrder() as follows:
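A minimal sketch, assuming praat_DB is the loaded handle and that the perspective is the one called "default":

```r
library(emuR)

# Signals shown in the "default" perspective, from top to bottom.
get_signalCanvasesOrder(praat_DB, perspectiveName = "default")
```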
## [1] "OSCI" "SPEC"
This confirms that what is seen when viewing the database with the serve() function is the waveform (OSCI) and the spectrogram (SPEC). The pitch data created above now needs to be added using the function set_signalCanvasesOrder(). The second argument should always be "default", thus:
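A sketch of the call, assuming the track was defined under the name "pitch" as above. Placing it after the waveform and spectrogram is a choice made here for illustration.

```r
library(emuR)

# Show the waveform, the spectrogram and then the pitch track.
set_signalCanvasesOrder(praat_DB,
                        perspectiveName = "default",
                        order = c("OSCI", "SPEC", "pitch"))
```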
The next task is to add an event tier that can be used for labelling tones. Here the tier is called “Tone”. So far, the only existing time tier is ORT, as confirmed by:
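The table shown below has the layout returned by list_levelDefinitions(). A minimal sketch, assuming praat_DB is the loaded handle:

```r
library(emuR)

# One row per annotation level (tier) defined in the database.
list_levelDefinitions(praat_DB)
```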
## name type nrOfAttrDefs attrDefNames
## 1 ORT SEGMENT 1 ORT;
In order to add a new tier called Tone as an EVENT tier:
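A sketch using add_levelDefinition(), assuming praat_DB is the loaded handle. The call rewrites the annotation file of every bundle, which fits the message shown below.

```r
library(emuR)

# Add an event level called "Tone" to every bundle in the database.
add_levelDefinition(praat_DB,
                    name = "Tone",
                    type = "EVENT")
```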
## INFO: Rewriting 14 _annot.json files to file system...
Display Tone so that it is above the ORT tier and so directly underneath the signals:
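The tiers that are currently displayed can first be checked. A minimal sketch, assuming praat_DB is the loaded handle and the perspective is "default":

```r
library(emuR)

# Levels shown in the "default" perspective, from top to bottom.
get_levelCanvasesOrder(praat_DB, perspectiveName = "default")
```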
## [1] "ORT"
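Since only ORT is displayed so far, the order can be reset with Tone listed first. A sketch, assuming the Tone level has already been defined in praat_DB:

```r
library(emuR)

# Show Tone directly underneath the signals, followed by ORT.
set_levelCanvasesOrder(praat_DB,
                       perspectiveName = "default",
                       order = c("Tone", "ORT"))
```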
Add two H* tone labels at the pitch peaks of morgens and ruhig in wetter1, as in Fig. 1.1, and save the result.
The tones are to be linked to the words within which they occur in time. To do this, define a hierarchical relationship such that ORT dominates Tone:
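Before the link is defined, the existing link definitions can be listed. This is an assumption about the step that produced the NULL shown below, which would indicate that no links have been defined yet.

```r
library(emuR)

# Assumes praat_DB is the loaded handle and no link definition exists yet.
list_linkDefinitions(praat_DB)
```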
## NULL
add_linkDefinition(praat_DB,
                   type = "ONE_TO_MANY",
                   superlevelName = "ORT",
                   sublevelName = "Tone")
list_linkDefinitions(praat_DB)
## type superlevelName sublevelName
## 1 ONE_TO_MANY ORT Tone
Inspect the hierarchy:
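As before, the summary shown below is what summary() prints for the database handle:

```r
library(emuR)

# The level and link definitions appear at the end of the summary.
summary(praat_DB)
```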
## ── Summary of emuDB ────────────────────────────────────────────────────────────
## Name: praat
## UUID: c56238e2-8b60-4aab-b306-7ab6da2d8713
## Directory: /Users/jmh/Desktop/ipsR/emu_databases/praat_emuDB
## Session count: 1
## Bundle count: 14
## Annotation item count: 214
## Label count: 214
## Link count: 0
##
## ── Database configuration ──────────────────────────────────────────────────────
##
## ── SSFF track definitions ──
##
## name columnName fileExtension
## pitch pitch pit
## ── Level definitions ──
## name type nrOfAttrDefs attrDefNames
## ORT SEGMENT 1 ORT;
## Tone EVENT 1 Tone;
## ── Link definitions ──
## type superlevelName sublevelName
## ONE_TO_MANY ORT Tone
The autobuild_linkFromTimes() function can now be used to link the tones to the corresponding words:
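A sketch of the call, assuming the link definition between ORT and Tone has been added to praat_DB as above. All other arguments are left at their defaults.

```r
library(emuR)

# Link each Tone event to the ORT segment within whose time span it falls.
autobuild_linkFromTimes(praat_DB,
                        superlevelName = "ORT",
                        sublevelName = "Tone")
```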
## INFO: Rewriting 14 _annot.json files to file system...