---
title: "Converting a Praat TextGrid collection"
author: "Jonathan Harrington"
date: "WiSe 2021"
output: 
  bookdown::html_document2:
    number_sections: TRUE
    toc: true
    theme: flatly
    highlight: pygments
---

<style>
div.gray {background-color: #e8e8e8; border-radius: 5px; padding: 20px;}
body {font-size: 16pt;}
h1 {font-size: 24pt;}
h2 {font-size: 22pt;}
p.caption {font-size: 12pt; text-align: justify;}
code.sourceCode {font-size: 16pt;}
</style>

```{r, include=FALSE}
targetDir = "./emu_databases"
if (dir.exists(file.path(targetDir, "praat_emuDB"))) {
  unlink(file.path(targetDir, "praat_emuDB"), recursive = TRUE)
}
```

# Objective

The aim is to convert a Praat `.TextGrid` collection into the Emu database format, as exemplified by Fig. \@ref(fig:figpraat):

```{r figpraat, fig.align="center", fig.cap="An utterance fragment in Praat and in Emu.", echo=FALSE}
knitr::include_graphics("./img/figpraat.png")
```

# Preliminaries and starting up R

The assumption is that you have a project called `emu2021` and that it contains the following directories. 

![](img/emu2021.png)

If not, please see preliminaries [here](https://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ws2122/Emuintro/creating_database.html#preliminaries).

Start up R in the project you are using for this course.

```{r}
library(tidyverse)
library(emuR)
library(wrassp)
```

In R, store the path to the directory `testsample` as `sourceDir` in exactly the following way:

```{r}
sourceDir = "./testsample"
```

And also store in R the path to `emu_databases` as `targetDir`:

```{r}
targetDir = "./emu_databases"
```

# Converting Praat TextGrids

The directory `testsample/praat` on your computer contains a Praat-style database with `.wav` files and `.TextGrid` files. Define the path to this database in R and check that you can see these files with the `list.files()` function:

```{r}
path.praat = file.path(sourceDir, "praat")
list.files(path.praat)
```

The emuR function `convert_TextGridCollection()` converts a TextGrid collection to an Emu database and stores the result in `targetDir` (defined above). It is used like this:

```{r}
convert_TextGridCollection(path.praat, 
                           dbName = "praat",
                           targetDir = targetDir)
```

The converted Praat database can now be loaded:

```{r}
praat_DB = load_emuDB(file.path(targetDir, "praat_emuDB"))
```

and its properties examined as before:

```{r}
summary(praat_DB)
```

And it can of course be viewed:

```{r, eval=FALSE}
serve(praat_DB, useViewer = F)
```

# Calculating pitch with `wrassp`

The task is to calculate the pitch from the waveform of each utterance in the `praat_emuDB` database created above. First, find the full path names of all of the `.wav` files. They are here:

```{r}
praat_wav_paths = list.files(path.praat, 
                             pattern = ".*wav$", 
                             recursive = T, 
                             full.names = T)
praat_wav_paths
```

The signal processing package `wrassp` will now be used to calculate the pitch for each of these `.wav` files. To see the full range of signal processing routines available, enter:

```{r, eval=FALSE}
?wrassp
```

Two routines are available for calculating pitch: `ksvF0` and `mhsF0`.
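As a hedged sketch, `ksvF0` takes the same arguments and could be run in the same way; it is not evaluated here, to avoid creating a second set of pitch files alongside those produced below. Its default output extension can be looked up in `wrasspOutputInfos`:

```{r, eval=FALSE}
# check the default extension and track name of ksvF0's output files
wrasspOutputInfos$ksvF0
# run ksvF0 analogously to mhsF0 (not evaluated here)
ksvF0(praat_wav_paths, outputDirectory = path.praat)
```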

Here's how to use `mhsF0` with the default settings. The output is going to be stored in `path.praat` (i.e. in `testsample/praat` on your computer).

```{r}
mhsF0(praat_wav_paths, outputDirectory = path.praat)
```

As the figure below shows, the pitch files should now all have been written to `path.praat`, i.e. to `testsample/praat`.

![](img/pitchfiles.png)
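The same check can be made from within R, using the path defined earlier:

```{r, eval=FALSE}
# list just the newly created .pit files in testsample/praat
list.files(path.praat, pattern = "pit$")
```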

# Adding the calculated pitch files to the database

These calculated pitch files now need to be added to `praat_emuDB`. This is done with the `add_files()` function. The parameter `targetSessionName` is set to `"0000"` because all of the bundles are stored in the session directory `0000`. This can be verified with:

```{r}
list_bundles(praat_DB)
```

Now add the pitch files to `praat_DB`:

```{r}
add_files(praat_DB, 
          dir = path.praat, 
          fileExtension = "pit", 
          targetSessionName = "0000")
```

Having added the files, you now need to *define* them as a signal track. The information required is:

- a `track name`. This can be anything and it is needed when referring to these signal files in R. 
- the `file extension`. This is `pit` as already established above.
- the `columnName`. This is the name of the column in the `.pit` files in which the fundamental frequency data is stored. This type of information (as well as information about the extension) is given by `wrasspOutputInfos`. In this case, append `$mhsF0` since this was the name of the signal processing routine that has been used to calculate the pitch data:

```{r}
wrasspOutputInfos$mhsF0
```

The column name is given by `$tracks`, which in this case is `pitch`. Putting all this together and using `"pitch"` as the track name gives:

```{r}
add_ssffTrackDefinition(praat_DB,
                        name = "pitch",
                        columnName = "pitch",
                        fileExtension = "pit")

summary(praat_DB)
```
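As a more targeted check than the full summary, the track definitions alone can be listed with `list_ssffTrackDefinitions()`:

```{r, eval=FALSE}
# show just the defined signal tracks of the database
list_ssffTrackDefinitions(praat_DB)
```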

# Displaying the pitch files in the webapp

The signals that are currently displayed for this `praat_DB` database can be seen with the function `get_signalCanvasesOrder()` as follows:

```{r}
get_signalCanvasesOrder(praat_DB, perspectiveName = "default")
```

which confirms that what is seen when viewing the database with the `serve()` function is the waveform (`OSCI`) and the spectrogram (`SPEC`). The pitch data created above now needs to be added using the function `set_signalCanvasesOrder()`. The second argument should always be `"default"`, thus:

```{r, eval=FALSE}
set_signalCanvasesOrder(praat_DB, 
                        perspectiveName = "default",
                        order = c("OSCI", "SPEC", "pitch"))
serve(praat_DB, useViewer = F)
```

# Adding an event tier

The next task is to add an event tier that can be used for labelling tones. Here the tier is called "Tone". So far, the only existing time tier is `ORT` as confirmed by:

```{r}
list_levelDefinitions(praat_DB)
```

In order to add a new tier called `Tone` as an `EVENT` tier:

```{r}
add_levelDefinition(praat_DB, "Tone", "EVENT")
```

Display `Tone` above the `ORT` tier, and thus directly underneath the signals:

```{r}
get_levelCanvasesOrder(praat_DB, perspectiveName = "default")

set_levelCanvasesOrder(praat_DB, 
                       perspectiveName = "default", 
                       order = c("Tone", "ORT"))
```

# Labelling some tones

Add two tone labels H* at the pitch peaks of *morgens* and *ruhig* in `wetter1`, as in Fig. \@ref(fig:figpraat), and save the result.

```{r, eval=FALSE}
serve(praat_DB, useViewer=F)
```

The tones are to be linked to the words within which they occur in time. To do this, define a hierarchical relationship in which `ORT` dominates `Tone`:

```{r}
list_linkDefinitions(praat_DB)

add_linkDefinition(praat_DB, 
                   type = "ONE_TO_MANY", 
                   superlevelName = "ORT", 
                   sublevelName = "Tone")

list_linkDefinitions(praat_DB)
```

Inspect the hierarchy:

```{r}
summary(praat_DB)
```

```{r, eval=FALSE}
# switch to hierarchy view
serve(praat_DB, useViewer = F)
```

# Automatically linking event and segment times

This step uses the `autobuild_linkFromTimes()` function to link the tones to the corresponding words:

```{r}
autobuild_linkFromTimes(praat_DB,
                        superlevelName = "ORT",
                        sublevelName = "Tone")
```

```{r, eval=FALSE}
# switch to hierarchy view
serve(praat_DB, useViewer = F)
```
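Once the links have been built, they can be used in hierarchical queries. As a hedged sketch (assuming the two H* labels added above have been saved), the following retrieves the tone events and then the words that dominate them:

```{r, eval=FALSE}
# all H* events on the Tone tier
tones = query(praat_DB, "Tone == H*")
# the words on ORT that dominate these tone events
requery_hier(praat_DB, tones, level = "ORT")
```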

