For historical reasons, emuR has some specialized object classes, together with specialized methods for working with them. Such specialized objects and methods are sometimes unavoidable, but they should be avoided whenever possible; instead, we can use standard procedures that are very common in R.
To see the advantages of a more standardized procedure, let us once again create a temporary EMU-SDMS database first (using, by the way, some specialized but unavoidable commands from the emuR package):
# load package
library(emuR)
# create demo data in directory
# provided by tempdir()
create_emuRdemoData(dir = tempdir())
# create path to demo database
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# load database
ae = load_emuDB(path2ae, verbose = FALSE)
As we have seen in chapter 06, a call to get_trackdata() returns by default an object of class trackdata, a very special class that exists only in the emuR package (and its predecessors). The emuR package provides several specialized routines, such as dcut(), trapply(), eplot() and dplot(), for processing and visually inspecting objects of this type (see Harrington, 2010, for the use of these functions).
vowels = query(ae,query="Phonetic==i:|u:|E")
vowels_fm = get_trackdata(ae,
seglist = vowels,
ssffTrackName = "fm",
verbose = FALSE)
# show class of vowels_fm
class(vowels_fm)
## [1] "trackdata"
The following command then extracts the formant values at the temporal midpoint of each segment (each vowel, in this case):
vowels_fm05=dcut(vowels_fm,.5,prop = TRUE)
We can then use this object to plot the data and 95%-confidence ellipses.
eplot(vowels_fm05[,1:2],label(vowels),centroid=TRUE,formant = TRUE)
The original trackdata object can be used to plot formant trajectories (here: F2 only) as a function of time (first example), or a mean F2 trajectory for each vowel category as a function of normalized time (second example):
dplot(vowels_fm[,2],label(vowels))
dplot(vowels_fm[,2],label(vowels),normalise=TRUE,average=TRUE)
These commands (and many other commands in the predecessors of emuR) are specialized to work with (and only with) trackdata objects.
In most cases, however, an R user will store data in data.frames. Data.frames are required by most commands in most packages for plotting and/or statistical analysis.
As the trackdata object is a fairly complex nested matrix object with internal reference matrices, which can be cumbersome to work with, the emuR package introduces a new equivalent object type called emuRtrackdata that is essentially a flat data.frame or data.table object. This object type can be retrieved by setting the resultType parameter of the get_trackdata() function to "emuRtrackdata":
vowels_fm_new = get_trackdata(ae,
seglist = vowels,
ssffTrackName = "fm",
resultType="emuRtrackdata",
verbose = FALSE)
# show class of vowels_fm_new
class(vowels_fm_new)
## [1] "emuRtrackdata" "data.table" "data.frame"
vowels_fm_new
## sl_rowIdx labels start end utts
## 1: 1 E 949.925 1031.925 0000:msajc003
## 2: 1 E 949.925 1031.925 0000:msajc003
## 3: 1 E 949.925 1031.925 0000:msajc003
## 4: 1 E 949.925 1031.925 0000:msajc003
## 5: 1 E 949.925 1031.925 0000:msajc003
## ---
## 295: 18 E 2480.425 2587.675 0000:msajc057
## 296: 18 E 2480.425 2587.675 0000:msajc057
## 297: 18 E 2480.425 2587.675 0000:msajc057
## 298: 18 E 2480.425 2587.675 0000:msajc057
## 299: 18 E 2480.425 2587.675 0000:msajc057
## db_uuid session bundle start_item_id
## 1: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc003 157
## 2: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc003 157
## 3: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc003 157
## 4: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc003 157
## 5: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc003 157
## ---
## 295: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc057 200
## 296: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc057 200
## 297: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc057 200
## 298: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc057 200
## 299: 0fc618dc-8980-414d-8c7a-144a649ce199 0000 msajc057 200
## end_item_id level start_item_seq_idx end_item_seq_idx type
## 1: 157 Phonetic 11 11 SEGMENT
## 2: 157 Phonetic 11 11 SEGMENT
## 3: 157 Phonetic 11 11 SEGMENT
## 4: 157 Phonetic 11 11 SEGMENT
## 5: 157 Phonetic 11 11 SEGMENT
## ---
## 295: 200 Phonetic 39 39 SEGMENT
## 296: 200 Phonetic 39 39 SEGMENT
## 297: 200 Phonetic 39 39 SEGMENT
## 298: 200 Phonetic 39 39 SEGMENT
## 299: 200 Phonetic 39 39 SEGMENT
## sample_start sample_end sample_rate times_rel times_orig T1 T2
## 1: 18999 20638 20000 0 952.5 422 1613
## 2: 18999 20638 20000 5 957.5 434 1651
## 3: 18999 20638 20000 10 962.5 447 1686
## 4: 18999 20638 20000 15 967.5 449 1703
## 5: 18999 20638 20000 20 972.5 445 1712
## ---
## 295: 49609 51753 20000 85 2567.5 440 1564
## 296: 49609 51753 20000 90 2572.5 428 1515
## 297: 49609 51753 20000 95 2577.5 400 1470
## 298: 49609 51753 20000 100 2582.5 348 1422
## 299: 49609 51753 20000 105 2587.5 278 1376
## T3 T4
## 1: 2118 2750
## 2: 2195 2824
## 3: 2229 3536
## 4: 2245 3536
## 5: 2275 3224
## ---
## 295: 2345 3275
## 296: 2308 3217
## 297: 2286 3203
## 298: 2260 3214
## 299: 2232 3274
names(vowels_fm_new)
## [1] "sl_rowIdx" "labels" "start"
## [4] "end" "utts" "db_uuid"
## [7] "session" "bundle" "start_item_id"
## [10] "end_item_id" "level" "start_item_seq_idx"
## [13] "end_item_seq_idx" "type" "sample_start"
## [16] "sample_end" "sample_rate" "times_rel"
## [19] "times_orig" "T1" "T2"
## [22] "T3" "T4"
The emuRtrackdata object is an amalgamation of a segment list and a trackdata object. The first column, sl_rowIdx, of the vowels_fm_new object indicates the row of the segment list to which the current row belongs. The times_rel and times_orig columns (and times_norm in the forthcoming emuR version) give the relative time and the original time of the sample contained in the current row, and T1 (up to Tn for n-dimensional trackdata) contains the actual signal sample values. It is also worth noting that the emuR package provides a function called create_emuRtrackdata(), which creates an emuRtrackdata object from a segment list and a trackdata object. This is useful because trackdata objects can first be processed with functions provided by the emuR package (e.g., dcut() and trapply()) and then be converted into a standardized data.table object for further processing (e.g., with R packages such as lme4 or ggplot2, which were implemented for use with data.frame or data.table objects).
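Because every sample sits in its own row, ordinary data.frame tools are enough to process such an object. The following is a minimal sketch in base R, using an invented toy data.frame that only mimics the column names described above (it is not produced by emuR); the helper pick_midpoint() is hypothetical. It keeps, for each segment, the sample closest to the temporal midpoint, which is in the spirit of dcut(..., .5, prop = TRUE) but without any interpolation.

```r
# Toy stand-in for an emuRtrackdata object (invented values, base R only)
td_toy <- data.frame(
  sl_rowIdx = c(1, 1, 1, 2, 2, 2, 2, 2),
  labels    = c("E", "E", "E", "i:", "i:", "i:", "i:", "i:"),
  times_rel = c(0, 5, 10, 0, 5, 10, 15, 20),
  T1        = c(420, 450, 430, 280, 290, 300, 295, 285),
  T2        = c(1600, 1700, 1650, 2200, 2250, 2300, 2280, 2240)
)

# Hypothetical helper: keep the sample closest to the temporal midpoint
pick_midpoint <- function(seg) {
  mid <- max(seg$times_rel) / 2
  seg[which.min(abs(seg$times_rel - mid)), ]
}

# Split by segment, pick one row per segment, and bind the rows together again
td_mid_toy <- do.call(rbind, lapply(split(td_toy, td_toy$sl_rowIdx), pick_midpoint))
td_mid_toy
```

The result has one row per segment and can be passed directly to any function that expects a data.frame.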
ggplot2
The goal of this chapter is to enable the reader to plot any numeric data from data.frames, whatever their source, including the new emuRtrackdata object. To do so, we sometimes have to manipulate the data.frame, so we will also repeat some standard methods for manipulating data.frames.
The plots above can be done with ggplot2 and will look like:
Figure 1: Equivalent to the eplot
Figure 2: Equivalent to the dplot
Figure 3: Equivalent to the normalized dplot
ggplot2?
Advantages of ggplot2
grammar of graphics (Wilkinson, 2005)
That said, there are some things you cannot (or should not) do with ggplot2:
The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. Building blocks of a graph include the data, aesthetic mappings, geometric objects (geoms), statistical transformations, position adjustments, scales, and themes.
ggplot
The ggplot() function is used to initialize the basic graph structure, then we add to it. The structure of a ggplot looks like this:
ggplot(data = <default data set>,
aes(x = <default x axis variable>,
y = <default y axis variable>,
... <other default aesthetic mappings>),
... <other plot defaults>) +
geom_<geom type>(aes(size = <size variable for this geom>,
... <other aesthetic mappings>),
data = <data for this point geom>,
stat = <statistic string or function>,
position = <position string or function>,
color = <"fixed color specification">,
<other arguments, possibly passed to the _stat_ function>) +
scale_<aesthetic>_<type>(name = <"scale label">,
breaks = <where to put tick marks>,
labels = <labels for tick marks>,
... <other options for the scale>) +
theme(plot.background = element_rect(fill = "gray"),
... <other theme elements>)
The basic idea is that you specify different parts of the plot, and add them together using the + operator.
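As a small illustration of this additive style, the sketch below builds a scatter plot piece by piece. It assumes that ggplot2 is installed and uses R's built-in mtcars data set purely as a stand-in; the object names are arbitrary.

```r
library(ggplot2)

# Built-in data set 'mtcars', used purely for illustration
base_plot   <- ggplot(mtcars, aes(x = wt, y = mpg))  # data + default aesthetics
point_layer <- geom_point()                          # geometric object
axis_labels <- labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# The parts are combined with '+'; nothing is drawn until the object is printed
p <- base_plot + point_layer + axis_labels
p
```

Each part is an ordinary R object, so parts can be stored, reused, and recombined.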
See, e.g., the Cookbook for R (http://www.cookbook-r.com/) and the introduction to ggplot2 (gg = grammar of graphics) at http://docs.ggplot2.org/current/
Let’s try with a few datasets from Jonathan Harrington’s statistics seminar:
# if necessary, install.packages("ggplot2")
library(ggplot2)
pfadu = "http://www.phonetik.uni-muenchen.de/~jmh/lehre/Rdf"
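# stringsAsFactors = TRUE reproduces the factor columns shown below (needed in R >= 4.0)
asp = read.table(file.path(pfadu, "asp.txt"), stringsAsFactors = TRUE)
coronal = read.table(file.path(pfadu, "coronal.txt"), stringsAsFactors = TRUE)
int.df = read.table(file.path(pfadu, "intdauer.txt"), stringsAsFactors = TRUE)
v.df = read.table(file.path(pfadu, "vdata.txt"), stringsAsFactors = TRUE)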
# check class (data.frame or not):
class(asp)
## [1] "data.frame"
# the first few lines:
head(coronal)
## Fr Region Vpn Socialclass
## 1 sh R2 S1 W
## 2 s R2 S2 W
## 3 sh R1 S3 W
## 4 s R3 S4 W
## 5 s R2 S5 W
## 6 sh R3 S6 W
# 'asp[m, ]' = row m
# 'asp[, m]' = column m
# You can use '$Name' to access column "Name"
#############################################################################
# 1. Numerical and categorical variables
############################################################################
# In a data.frame, columns can consist of numerical or categorical variables.
# In a matrix, you can only have one or the other class of variables.
# Numerical variables: continuous
#
class(asp$d)
## [1] "numeric"
# or
with(asp, class(d))
## [1] "numeric"
class(int.df$Dauer)
## [1] "integer"
# Categorical variables will be treated as factors (that have two or more levels, or categories; this is different to objects of the class "character"):
class(coronal$Socialclass)
## [1] "factor"
# first 10
coronal$Socialclass[1:10]
## [1] W W W W W W W W W W
## Levels: LM UM W
# asks which levels are given
levels(coronal$Socialclass)
## [1] "LM" "UM" "W"
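The difference between a character vector and a factor can be seen in a small base R sketch with invented values. Note that since R 4.0, read.table() no longer converts strings to factors unless stringsAsFactors = TRUE is set, so a column may have to be converted with factor() by hand.

```r
# Invented example values, base R only
social_chr <- c("W", "LM", "UM", "W", "LM")
social_fac <- factor(social_chr)

class(social_chr)     # "character"
class(social_fac)     # "factor"
levels(social_fac)    # levels are sorted alphabetically by default
table(social_fac)     # number of tokens per level

# A factor keeps all of its levels even when a subset no longer contains them
only_w <- social_fac[social_fac == "W"]
levels(only_w)

# droplevels() discards the unused levels
levels(droplevels(only_w))
```

Unused levels matter for plotting, because they may still show up as empty categories on an axis or in a legend.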
##########################################################
# 2. Typical example in phonetics
##########################################################
# Is there an influence of x on y?
#
# 1. y = numerical, x = categorical
# 1.1 difference in duration in /i, e, a/ ?
# 1.2 = influence of x (=vowel) on y (=duration)?
# 1.3 possible geoms: geom_boxplot()
# or: geom_histogram() or stat_density()
# 2. y = categorical, x = categorical
# 2.1 words like Sohn, Sonne... can be produced either with /s/ or /z/.
# /s/ more likely in Bavaria or in Hamburg?
# 2.2 possible geom: geom_bar()
# 3. y = numerical, x = numerical
# 3.1 bigger mouth opening related to a longer duration?
# 3.2 possible geom: geom_point(), geom_line()
############################################################################
# 3. geom_boxplot(): y = numerical, x = categorical
############################################################################
head(asp)
## d Wort Vpn Kons Bet
## 1 26.180 Fruehlingswetter k01 t un
## 2 23.063 Gestern k01 t un
## 3 26.812 Montag k01 t un
## 4 14.750 Vater k01 t un
## 5 42.380 Tisch k01 t be
## 6 21.560 Mutter k01 t un
# Influence of place of articulation (Kons) on duration of aspiration (d)?
# y: d (numerical)
# x: Kons (categorical)
# Syntax in ggplot()
# A + B + C + D + ...
# A, B, C... are modules.
# Here:
# A. data-frame + B. Variables + C. kind of plot
ggplot(asp) + aes(y = d, x = Kons) + geom_boxplot()
# or
# A
p1 = ggplot(asp)
# B
p2 = aes(y = d, x = Kons)
# C
p3 = geom_boxplot()
# A + B + C
p1 + p2 + p3
# or store A + B + C
erg = p1 + p2 + p3
# show the plot
erg
# boxplot.
# thick line = median; 'Box': interquartile range
#
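The median and the quartiles that a box summarises can also be computed directly. The sketch below uses base R and invented durations (the data.frame toy_asp only imitates the d and Kons columns of asp).

```r
# Invented aspiration durations (ms) for two consonants, base R only
toy_asp <- data.frame(
  d    = c(20, 25, 30, 35, 40, 50, 55, 60, 65, 70),
  Kons = rep(c("k", "t"), each = 5)
)

# Lower quartile, median and upper quartile per consonant
box_stats <- aggregate(d ~ Kons, data = toy_asp,
                       FUN = function(x) quantile(x, c(0.25, 0.5, 0.75)))
box_stats

# Interquartile range per consonant (the height of the box)
tapply(toy_asp$d, toy_asp$Kons, IQR)
```

Comparing such a table with the plot is a quick sanity check that the plot shows what you think it shows.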
############################################################################
# 4. geom_bar(): y is categorical, x is categorical
############################################################################
head(coronal)
## Fr Region Vpn Socialclass
## 1 sh R2 S1 W
## 2 s R2 S2 W
## 3 sh R1 S3 W
## 4 s R3 S4 W
## 5 s R2 S5 W
## 6 sh R3 S6 W
# Influence of region (Region) on place of articulation (Fr)?
# y: Fr (categorical)
# x: Region (categorical)
p1 = ggplot(coronal)
p2 = aes(fill = Fr, x = Region)
# to print frequencies of occurrence
p3 = geom_bar()
p1 + p2 + p3
# place bars side by side
p4 = geom_bar(position="dodge")
p1 + p2 + p4
# print proportions
p5 = geom_bar(position="fill")
p1 + p2 + p5
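The numbers behind these three bar charts can be inspected with table() and prop.table(). This is a base R sketch with invented data (toy_cor only imitates the Region and Fr columns of coronal).

```r
# Invented fricative tokens, base R only
toy_cor <- data.frame(
  Region = c("R1", "R1", "R1", "R1", "R2", "R2", "R2", "R2", "R2"),
  Fr     = c("s", "s", "s", "sh", "s", "sh", "sh", "sh", "sh")
)

# Counts per region and fricative: what geom_bar() stacks by default
counts <- table(toy_cor$Region, toy_cor$Fr)
counts

# Row-wise proportions: what position = "fill" displays
props <- prop.table(counts, margin = 1)
props
```

Within each region the proportions sum to 1, which is why all bars have the same height with position = "fill".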
############################################################################
# 5. geom_point(), geom_line(): y is numerical, x is numerical
############################################################################
# To what extent is duration (Dauer) influenced by intensity (dB) in the data frame int.df?
# y: Dauer (numerical)
# x: dB (numerical)
head(int.df)
## Vpn dB Dauer
## 1 S1 24.50 162
## 2 S2 32.54 120
## 3 S2 38.02 223
## 4 S2 28.38 131
## 5 S1 23.47 67
## 6 S2 37.82 169
# line only
ggplot(int.df) + aes(x = dB, y = Dauer) + geom_line()
# points only
ggplot(int.df, aes(x = dB, y = Dauer)) + geom_point()
# both
ggplot(int.df, aes(x = dB, y = Dauer)) + geom_line() + geom_point()
############################################################################
# 6. + xlab() + ylab() + ggtitle()
############################################################################
# same boxplot as above
p1 = ggplot(asp) + aes(y = d, x = Kons) + geom_boxplot()
# label for x-axis
p2 = xlab("Place of Articulation")
# label for y-axis
p3 = ylab("Duration (ms)")
# title
p4 = ggtitle("Boxplot")
p1 + p2 + p3 + p4
# same barchart as above
bar.p = ggplot(coronal) + aes(x = Region, fill = Fr) + geom_bar(position = "fill")
x.p = xlab("Region")
y.p = ylab("Proportion")
t.p = ggtitle("Proportional Distribution of Fricatives")
bar.p + x.p + y.p + t.p
############################################################################
# 7. Limits on axes +xlim() + ylim()
############################################################################
# same geom_point() as above
p1 = ggplot(int.df, aes(dB, Dauer)) + geom_point()
# xlim
p2 = xlim(c(10, 60))
# ylim
p3 = ylim(c(30, 280))
p1 + p2 + p3
#reverse axes:
p4 = scale_x_reverse()
p5 = scale_y_reverse()
p1 + p4 + p5
(see http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf)
colors()
## [1] "white" "aliceblue" "antiquewhite"
## [4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
## [7] "antiquewhite4" "aquamarine" "aquamarine1"
## [10] "aquamarine2" "aquamarine3" "aquamarine4"
## [13] "azure" "azure1" "azure2"
## [16] "azure3" "azure4" "beige"
## [19] "bisque" "bisque1" "bisque2"
## [22] "bisque3" "bisque4" "black"
## [25] "blanchedalmond" "blue" "blue1"
## [28] "blue2" "blue3" "blue4"
## [31] "blueviolet" "brown" "brown1"
## [34] "brown2" "brown3" "brown4"
## [37] "burlywood" "burlywood1" "burlywood2"
## [40] "burlywood3" "burlywood4" "cadetblue"
## [43] "cadetblue1" "cadetblue2" "cadetblue3"
## [46] "cadetblue4" "chartreuse" "chartreuse1"
## [49] "chartreuse2" "chartreuse3" "chartreuse4"
## [52] "chocolate" "chocolate1" "chocolate2"
## [55] "chocolate3" "chocolate4" "coral"
## [58] "coral1" "coral2" "coral3"
## [61] "coral4" "cornflowerblue" "cornsilk"
## [64] "cornsilk1" "cornsilk2" "cornsilk3"
## [67] "cornsilk4" "cyan" "cyan1"
## [70] "cyan2" "cyan3" "cyan4"
## [73] "darkblue" "darkcyan" "darkgoldenrod"
## [76] "darkgoldenrod1" "darkgoldenrod2" "darkgoldenrod3"
## [79] "darkgoldenrod4" "darkgray" "darkgreen"
## [82] "darkgrey" "darkkhaki" "darkmagenta"
## [85] "darkolivegreen" "darkolivegreen1" "darkolivegreen2"
## [88] "darkolivegreen3" "darkolivegreen4" "darkorange"
## [91] "darkorange1" "darkorange2" "darkorange3"
## [94] "darkorange4" "darkorchid" "darkorchid1"
## [97] "darkorchid2" "darkorchid3" "darkorchid4"
## [100] "darkred" "darksalmon" "darkseagreen"
## [103] "darkseagreen1" "darkseagreen2" "darkseagreen3"
## [106] "darkseagreen4" "darkslateblue" "darkslategray"
## [109] "darkslategray1" "darkslategray2" "darkslategray3"
## [112] "darkslategray4" "darkslategrey" "darkturquoise"
## [115] "darkviolet" "deeppink" "deeppink1"
## [118] "deeppink2" "deeppink3" "deeppink4"
## [121] "deepskyblue" "deepskyblue1" "deepskyblue2"
## [124] "deepskyblue3" "deepskyblue4" "dimgray"
## [127] "dimgrey" "dodgerblue" "dodgerblue1"
## [130] "dodgerblue2" "dodgerblue3" "dodgerblue4"
## [133] "firebrick" "firebrick1" "firebrick2"
## [136] "firebrick3" "firebrick4" "floralwhite"
## [139] "forestgreen" "gainsboro" "ghostwhite"
## [142] "gold" "gold1" "gold2"
## [145] "gold3" "gold4" "goldenrod"
## [148] "goldenrod1" "goldenrod2" "goldenrod3"
## [151] "goldenrod4" "gray" "gray0"
## [154] "gray1" "gray2" "gray3"
## [157] "gray4" "gray5" "gray6"
## [160] "gray7" "gray8" "gray9"
## [163] "gray10" "gray11" "gray12"
## [166] "gray13" "gray14" "gray15"
## [169] "gray16" "gray17" "gray18"
## [172] "gray19" "gray20" "gray21"
## [175] "gray22" "gray23" "gray24"
## [178] "gray25" "gray26" "gray27"
## [181] "gray28" "gray29" "gray30"
## [184] "gray31" "gray32" "gray33"
## [187] "gray34" "gray35" "gray36"
## [190] "gray37" "gray38" "gray39"
## [193] "gray40" "gray41" "gray42"
## [196] "gray43" "gray44" "gray45"
## [199] "gray46" "gray47" "gray48"
## [202] "gray49" "gray50" "gray51"
## [205] "gray52" "gray53" "gray54"
## [208] "gray55" "gray56" "gray57"
## [211] "gray58" "gray59" "gray60"
## [214] "gray61" "gray62" "gray63"
## [217] "gray64" "gray65" "gray66"
## [220] "gray67" "gray68" "gray69"
## [223] "gray70" "gray71" "gray72"
## [226] "gray73" "gray74" "gray75"
## [229] "gray76" "gray77" "gray78"
## [232] "gray79" "gray80" "gray81"
## [235] "gray82" "gray83" "gray84"
## [238] "gray85" "gray86" "gray87"
## [241] "gray88" "gray89" "gray90"
## [244] "gray91" "gray92" "gray93"
## [247] "gray94" "gray95" "gray96"
## [250] "gray97" "gray98" "gray99"
## [253] "gray100" "green" "green1"
## [256] "green2" "green3" "green4"
## [259] "greenyellow" "grey" "grey0"
## [262] "grey1" "grey2" "grey3"
## [265] "grey4" "grey5" "grey6"
## [268] "grey7" "grey8" "grey9"
## [271] "grey10" "grey11" "grey12"
## [274] "grey13" "grey14" "grey15"
## [277] "grey16" "grey17" "grey18"
## [280] "grey19" "grey20" "grey21"
## [283] "grey22" "grey23" "grey24"
## [286] "grey25" "grey26" "grey27"
## [289] "grey28" "grey29" "grey30"
## [292] "grey31" "grey32" "grey33"
## [295] "grey34" "grey35" "grey36"
## [298] "grey37" "grey38" "grey39"
## [301] "grey40" "grey41" "grey42"
## [304] "grey43" "grey44" "grey45"
## [307] "grey46" "grey47" "grey48"
## [310] "grey49" "grey50" "grey51"
## [313] "grey52" "grey53" "grey54"
## [316] "grey55" "grey56" "grey57"
## [319] "grey58" "grey59" "grey60"
## [322] "grey61" "grey62" "grey63"
## [325] "grey64" "grey65" "grey66"
## [328] "grey67" "grey68" "grey69"
## [331] "grey70" "grey71" "grey72"
## [334] "grey73" "grey74" "grey75"
## [337] "grey76" "grey77" "grey78"
## [340] "grey79" "grey80" "grey81"
## [343] "grey82" "grey83" "grey84"
## [346] "grey85" "grey86" "grey87"
## [349] "grey88" "grey89" "grey90"
## [352] "grey91" "grey92" "grey93"
## [355] "grey94" "grey95" "grey96"
## [358] "grey97" "grey98" "grey99"
## [361] "grey100" "honeydew" "honeydew1"
## [364] "honeydew2" "honeydew3" "honeydew4"
## [367] "hotpink" "hotpink1" "hotpink2"
## [370] "hotpink3" "hotpink4" "indianred"
## [373] "indianred1" "indianred2" "indianred3"
## [376] "indianred4" "ivory" "ivory1"
## [379] "ivory2" "ivory3" "ivory4"
## [382] "khaki" "khaki1" "khaki2"
## [385] "khaki3" "khaki4" "lavender"
## [388] "lavenderblush" "lavenderblush1" "lavenderblush2"
## [391] "lavenderblush3" "lavenderblush4" "lawngreen"
## [394] "lemonchiffon" "lemonchiffon1" "lemonchiffon2"
## [397] "lemonchiffon3" "lemonchiffon4" "lightblue"
## [400] "lightblue1" "lightblue2" "lightblue3"
## [403] "lightblue4" "lightcoral" "lightcyan"
## [406] "lightcyan1" "lightcyan2" "lightcyan3"
## [409] "lightcyan4" "lightgoldenrod" "lightgoldenrod1"
## [412] "lightgoldenrod2" "lightgoldenrod3" "lightgoldenrod4"
## [415] "lightgoldenrodyellow" "lightgray" "lightgreen"
## [418] "lightgrey" "lightpink" "lightpink1"
## [421] "lightpink2" "lightpink3" "lightpink4"
## [424] "lightsalmon" "lightsalmon1" "lightsalmon2"
## [427] "lightsalmon3" "lightsalmon4" "lightseagreen"
## [430] "lightskyblue" "lightskyblue1" "lightskyblue2"
## [433] "lightskyblue3" "lightskyblue4" "lightslateblue"
## [436] "lightslategray" "lightslategrey" "lightsteelblue"
## [439] "lightsteelblue1" "lightsteelblue2" "lightsteelblue3"
## [442] "lightsteelblue4" "lightyellow" "lightyellow1"
## [445] "lightyellow2" "lightyellow3" "lightyellow4"
## [448] "limegreen" "linen" "magenta"
## [451] "magenta1" "magenta2" "magenta3"
## [454] "magenta4" "maroon" "maroon1"
## [457] "maroon2" "maroon3" "maroon4"
## [460] "mediumaquamarine" "mediumblue" "mediumorchid"
## [463] "mediumorchid1" "mediumorchid2" "mediumorchid3"
## [466] "mediumorchid4" "mediumpurple" "mediumpurple1"
## [469] "mediumpurple2" "mediumpurple3" "mediumpurple4"
## [472] "mediumseagreen" "mediumslateblue" "mediumspringgreen"
## [475] "mediumturquoise" "mediumvioletred" "midnightblue"
## [478] "mintcream" "mistyrose" "mistyrose1"
## [481] "mistyrose2" "mistyrose3" "mistyrose4"
## [484] "moccasin" "navajowhite" "navajowhite1"
## [487] "navajowhite2" "navajowhite3" "navajowhite4"
## [490] "navy" "navyblue" "oldlace"
## [493] "olivedrab" "olivedrab1" "olivedrab2"
## [496] "olivedrab3" "olivedrab4" "orange"
## [499] "orange1" "orange2" "orange3"
## [502] "orange4" "orangered" "orangered1"
## [505] "orangered2" "orangered3" "orangered4"
## [508] "orchid" "orchid1" "orchid2"
## [511] "orchid3" "orchid4" "palegoldenrod"
## [514] "palegreen" "palegreen1" "palegreen2"
## [517] "palegreen3" "palegreen4" "paleturquoise"
## [520] "paleturquoise1" "paleturquoise2" "paleturquoise3"
## [523] "paleturquoise4" "palevioletred" "palevioletred1"
## [526] "palevioletred2" "palevioletred3" "palevioletred4"
## [529] "papayawhip" "peachpuff" "peachpuff1"
## [532] "peachpuff2" "peachpuff3" "peachpuff4"
## [535] "peru" "pink" "pink1"
## [538] "pink2" "pink3" "pink4"
## [541] "plum" "plum1" "plum2"
## [544] "plum3" "plum4" "powderblue"
## [547] "purple" "purple1" "purple2"
## [550] "purple3" "purple4" "red"
## [553] "red1" "red2" "red3"
## [556] "red4" "rosybrown" "rosybrown1"
## [559] "rosybrown2" "rosybrown3" "rosybrown4"
## [562] "royalblue" "royalblue1" "royalblue2"
## [565] "royalblue3" "royalblue4" "saddlebrown"
## [568] "salmon" "salmon1" "salmon2"
## [571] "salmon3" "salmon4" "sandybrown"
## [574] "seagreen" "seagreen1" "seagreen2"
## [577] "seagreen3" "seagreen4" "seashell"
## [580] "seashell1" "seashell2" "seashell3"
## [583] "seashell4" "sienna" "sienna1"
## [586] "sienna2" "sienna3" "sienna4"
## [589] "skyblue" "skyblue1" "skyblue2"
## [592] "skyblue3" "skyblue4" "slateblue"
## [595] "slateblue1" "slateblue2" "slateblue3"
## [598] "slateblue4" "slategray" "slategray1"
## [601] "slategray2" "slategray3" "slategray4"
## [604] "slategrey" "snow" "snow1"
## [607] "snow2" "snow3" "snow4"
## [610] "springgreen" "springgreen1" "springgreen2"
## [613] "springgreen3" "springgreen4" "steelblue"
## [616] "steelblue1" "steelblue2" "steelblue3"
## [619] "steelblue4" "tan" "tan1"
## [622] "tan2" "tan3" "tan4"
## [625] "thistle" "thistle1" "thistle2"
## [628] "thistle3" "thistle4" "tomato"
## [631] "tomato1" "tomato2" "tomato3"
## [634] "tomato4" "turquoise" "turquoise1"
## [637] "turquoise2" "turquoise3" "turquoise4"
## [640] "violet" "violetred" "violetred1"
## [643] "violetred2" "violetred3" "violetred4"
## [646] "wheat" "wheat1" "wheat2"
## [649] "wheat3" "wheat4" "whitesmoke"
## [652] "yellow" "yellow1" "yellow2"
## [655] "yellow3" "yellow4" "yellowgreen"
############################ geom_boxplot()
ggplot(asp) + aes(y = d, x = Kons) + geom_boxplot()
# Default colors
# filled with different colors
ggplot(asp) + aes(y = d, x = Kons, fill = Kons) + geom_boxplot()
# different line colors
ggplot(asp) + aes(y = d, x = Kons, col = Kons) + geom_boxplot()
# or choose your own colors
farben = c("green", "red")
# filled
ggplot(asp) + aes(y = d, x = Kons) + geom_boxplot(fill = farben)
# line colors
ggplot(asp) + aes(y = d, x = Kons) + geom_boxplot(col = farben)
############################ geom_bar()
##########
p1 = ggplot(coronal) + aes(x = Region, fill = Fr) + geom_bar()
p1
# choose your own colors
farben = c("yellow", "green")
p2 = scale_fill_manual(values = farben)
p1 + p2
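With an unnamed vector, the colors are assigned to the factor levels in level order. If each color should be tied to a specific level, scale_fill_manual() also accepts a named vector. The sketch below assumes ggplot2 is installed and uses invented data that only mimics the Region and Fr columns.

```r
library(ggplot2)

# Invented data; the level names mirror the 'Fr' column used above
toy_fr <- data.frame(
  Region = c("R1", "R1", "R2", "R2", "R2", "R3"),
  Fr     = c("s", "sh", "s", "s", "sh", "sh")
)

# The names tie each color to a level, independent of the level order
farben_named <- c(sh = "green", s = "yellow")

p_named <- ggplot(toy_fr) +
  aes(x = Region, fill = Fr) +
  geom_bar() +
  scale_fill_manual(values = farben_named)
p_named
```

This keeps the color of a category stable across plots, even if a category is missing from one of the data sets.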
(see http://www.endmemo.com/program/R/pchsymbols.php)
##########
ggplot(int.df, aes(x = dB, y = Dauer)) + geom_point() + geom_line()
# col: color.
# pch: plotting character.
# cex: character expansion: cex = 2 means 2 * standard size
ggplot(int.df, aes(x = dB, y = Dauer)) + geom_point(col="purple", pch=0, cex=2) + geom_line(col = "pink")
# lwd: line width
ggplot(int.df, aes(x = dB, y = Dauer)) + geom_point(col="purple", pch=0, cex=2) + geom_line(col = "pink", lwd=2)
# default text size is 11 (legend: 10 (??))
p1 = ggplot(asp) + aes(y = d, x = Kons) + geom_boxplot() + xlab("Place of articulation") + ylab("Duration (ms)") + ggtitle("Boxplot data")
p1
# size 16
p16 = theme(text = element_text(size=16))
p1 + p16
# change only on axes
q24 = theme(axis.text = element_text(size=24))
p1 + q24
# Different values on axes labels and title
p30 = theme(text = element_text(size=30))
p1 + q24 + p30
#create one boxplot per stress pattern (Bet: levels "be" and "un")
pf = facet_grid(~Bet)
p1 + pf
# or add col to aes():
pc = ggplot(asp) + aes(y = d, x = Kons, col = Bet) + geom_boxplot() + xlab("Place of articulation") + ylab("Duration (ms)") + ggtitle("Boxplot data")
pc
You can, of course, combine facets and colors and thereby plot the influence of up to three independent variables.
# if necessary, install.packages("gridExtra")
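A combination of x position, color and facets might look like the following sketch. It assumes ggplot2 is installed and uses the built-in mtcars data set as a stand-in for phonetic data; the three categorical variables are chosen only for illustration.

```r
library(ggplot2)

# Built-in 'mtcars'; three of its columns are turned into factors
cars_df <- transform(mtcars,
                     cyl = factor(cyl),
                     am  = factor(am, labels = c("automatic", "manual")),
                     vs  = factor(vs, labels = c("V-shaped", "straight")))

p_three <- ggplot(cars_df) +
  aes(y = mpg, x = cyl, col = am) +  # variables 1 and 2: x position and color
  geom_boxplot() +
  facet_grid(~vs)                    # variable 3: one panel per level
p_three
```

Which variable goes where is a design choice: comparisons are easiest between boxes that sit next to each other within one panel.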
library(gridExtra)
p1 = ggplot(asp, aes(y = d, x = Kons)) + geom_boxplot()
p2 = ggplot(coronal) + aes(x = Region, fill = Fr) + geom_bar()
p3 = ggplot(int.df, aes(dB, Dauer)) + geom_line() + geom_point()
grid.arrange(p1, p2, p3, ncol=3, nrow =1)
theme
# see
help(theme)
p1 = ggplot(int.df, aes(dB, Dauer)) + geom_point()
int.lm = geom_smooth(method="lm",se=FALSE)
p1 + int.lm
#by default, geom_smooth shows the standard error:
int.lmse = geom_smooth(method="lm")
p1 + int.lmse
# you can calculate this stat (here lm() ) for each facet (e.g. for each subject (Vpn)) separately
p1 + int.lmse + facet_grid(~Vpn)
Instead of geom_smooth(), you could also add lines with geom_abline(intercept = ..., slope = ...), and horizontal and vertical lines with geom_hline() and geom_vline().
geom_smooth() can be used with several smoothing methods: lm, but also glm (for fitting sigmoidal curves to binary perceptual data) and some others (e.g., loess for local regression curves). One example of the method glm (for which you have to add the information that the data are binomial) would be:
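The following sketch shows these three geoms together. It assumes ggplot2 is installed and uses R's built-in cars data set for illustration; the regression coefficients are taken from a separate lm() fit and then handed to geom_abline().

```r
library(ggplot2)

# Built-in 'cars' data set (speed vs. stopping distance), for illustration only
fit <- lm(dist ~ speed, data = cars)
b0  <- unname(coef(fit)[1])   # intercept
b1  <- unname(coef(fit)[2])   # slope

p_lines <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_abline(intercept = b0, slope = b1, col = "blue") +   # regression line
  geom_hline(yintercept = mean(cars$dist), linetype = 2) +  # mean of y
  geom_vline(xintercept = mean(cars$speed), linetype = 2)   # mean of x
p_lines
```

The two dashed lines cross on the regression line, because a least-squares line with an intercept always passes through the point of means.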
bat.df = read.table("Rgraphics/dataSets/bat.df.txt")
bat.plot = ggplot(bat.df) + aes(y = p, x = steps) + geom_point(col = "red") + facet_wrap(~participant) + ggtitle("bat")
#add listener-specific sigmoids
bat.plot + geom_smooth(method = "glm",se=FALSE,method.args = list(family=binomial))
In phonetics, we often draw ellipses around two-dimensional data points, representing F2 and F1 values of vowels. We can add an ellipse by stat_ellipse().
ell = stat_ellipse()
p1 + ell
By default, this adds an ellipse representing the 95% confidence region (under the assumption of a multivariate t-distribution). While it is not particularly useful with the given data, it is useful for separating vowel categories. However, be careful: with a small number of tokens, one or two outliers can produce somewhat “silly” ellipses:
td_mid = read.table("Rgraphics/dataSets/td_mid.txt")
p1 = ggplot(td_mid, aes(y = T1, x = T2, col = labels, label=labels))
#add data.points as text labels, defined by their value
p2 = geom_text()
p1 + p2
p3 = stat_ellipse()
p4 = scale_y_reverse()
p5 = scale_x_reverse()
p6 =labs(x = "F2(Hz)", y = "F1(Hz)")
p7 = theme(legend.position="none")
p1 + p2 + p3 + p4 + p5 + p6 + p7
# only ellipses (do NOT plot data.points)
p1 + p3 + p4 + p5 + p6 + p7
#plot the label-specific means of F1 and F2 (here: T1 and T2)
p2_centroid = geom_text(data = aggregate(cbind(T1,T2)~labels,data=td_mid,FUN=mean))
p1 + p2_centroid + p3 + p4 + p5 + p6 + p7
#btw, we could also vary the linetype
p1_alt = ggplot(td_mid, aes(y = T1, x = T2, col = labels, label=labels,linetype=labels))
p1_alt + p2_centroid + p3 + p4 + p5 + p6
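The distributional assumption and the coverage of the ellipses can be changed with the type and level arguments of stat_ellipse(). The sketch below assumes ggplot2 is installed and uses the built-in iris data as a stand-in for F1/F2 measurements.

```r
library(ggplot2)

# Built-in 'iris' stands in for F1/F2 data
p_ell <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  # the default settings, spelled out: multivariate t, 95 %
  stat_ellipse(type = "t", level = 0.95) +
  # a narrower ellipse based on a multivariate normal distribution
  stat_ellipse(type = "norm", level = 0.68, linetype = 2)
p_ell
```

Whatever settings are used, they should be reported together with the figure, since ellipses with different levels are not comparable.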
The dplot shown at the beginning of this document is also very easy to replace.
ggplot(vowels_fm_new) +
aes(x=times_rel,y=T2,col=labels,group=sl_rowIdx) +
geom_line() +
labs(x = "vowel duration (ms)", y = "F2 (Hz)")
However, it is much more difficult to produce the time-normalized version averaged by vowel. For this we need the function normalizeLength() (which will be available with the next release of emuR). For now, we can use a prepared, length-normalized emuRtrackdata object that contains normalized times:
td_norm = read.table("Rgraphics/dataSets/td_norm.txt")
ggplot(aggregate(T2~times_norm+labels, data = td_norm,FUN=mean)) +
aes(x=times_norm,y=T2,col=labels) +
geom_line() +
labs(x = "vowel duration (normalized)", y = "F2 (Hz)")
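To make clear what length normalization involves, here is a rough base R sketch under simplifying assumptions. It is not the emuR implementation: the toy data are invented, the helper resample_segment() is hypothetical, and plain linear interpolation with approx() is used to resample every segment to the same number of points.

```r
# Toy flat track data (invented values); column names follow vowels_fm_new
td_raw <- data.frame(
  sl_rowIdx = c(1, 1, 1, 2, 2, 2, 2, 2),
  labels    = rep("E", 8),
  times_rel = c(0, 5, 10, 0, 5, 10, 15, 20),
  T2        = c(1600, 1700, 1600, 1800, 1850, 1900, 1850, 1800)
)

# Hypothetical helper: resample one segment to n equally spaced points on [0, 1]
resample_segment <- function(seg, n = 5) {
  t_norm <- seg$times_rel / max(seg$times_rel)   # 0 = onset, 1 = offset
  res <- approx(x = t_norm, y = seg$T2, n = n)   # linear interpolation
  data.frame(sl_rowIdx = seg$sl_rowIdx[1], labels = seg$labels[1],
             times_norm = res$x, T2 = res$y)
}

td_norm_toy <- do.call(rbind,
                       lapply(split(td_raw, td_raw$sl_rowIdx), resample_segment))

# Mean F2 trajectory per label at each normalized time point
mean_traj <- aggregate(T2 ~ times_norm + labels, data = td_norm_toy, FUN = mean)
mean_traj
```

After resampling, all segments share the same normalized time points, which is what makes averaging with aggregate() possible. The sketch requires at least two samples per segment; shorter segments would have to be excluded beforehand.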
This chapter gave a very short introduction to the package ggplot2. More information can be found at, e.g., http://r-statistics.co/Complete-Ggplot2-Tutorial-Part1-With-R-Code.html or https://opr.princeton.edu/workshops/Downloads/2015Jan_ggplot2Koffman.pdf, or on any other website you may find (there are numerous introductions to ggplot2).