Agent-Based Modelling of /s/-retraction in Australian English -

Initial data and simulation code


by


Mary Stevens, Jonathan Harrington & Florian Schiel

2018-10-08

-----------------------------------------------------------------


Introduction

This package contains the original R code and the initial speech
data needed to replicate the experiment described in [2]. Note that a
byte-by-byte replication is inherently impossible because the experiment
is stochastic: for instance, which agent talks to which other agent,
and about what, is determined by a stochastic model.
The code therefore consists of two parts:
1. a single ABM simulation that demonstrates the basic principle and can 
be used to observe online modifications within the agents' memories 
during the interaction (called 'single run' in the code);
2. a loop (called 'multiple runs' in the code) over a large number of 
repetitions (parameter 'multipleABMRuns') of the single ABM run, which logs 
all agents' memory contents during the interactions. These logs were 
analysed statistically in [2] to observe general trends in the 
development of agents' memories that were present in the majority 
of the simulated ABM runs.
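
The role of chance can be illustrated with a minimal R sketch. It is not taken
from the package: the function name 'draw_interaction', the word labels and
the uniform sampling are assumptions made for illustration only; the actual
stochastic model is the one implemented in the code under 'Rcmd'.

```r
# Hypothetical sketch: draw one interaction (speaker, listener, word).
# Uniform sampling is an assumption; the package may choose differently.
draw_interaction <- function(n_agents, words) {
  pair <- sample.int(n_agents, size = 2)   # two distinct agents, random order
  list(speaker = pair[1], listener = pair[2], word = sample(words, size = 1))
}

# Repeating a run with different seeds gives different interaction
# sequences, which is why a single run cannot be replicated exactly.
words <- c("street", "seem", "sheet")      # made-up word labels
runs <- lapply(1:3, function(seed) {
  set.seed(seed)
  replicate(5, draw_interaction(19, words)$speaker)
})
print(runs)
```

Each element of 'runs' holds the speakers drawn in the first five interactions
of one toy run; only trends that recur across many such runs are informative.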


How to run the experiment

1. Unzip this package to some location on a computer with R installed,
for instance '/home/schiel/expRoot'.

2. Make sure that the following R packages are installed on the computer:
MASS, lattice, latticeExtra, emuR, mvtnorm
(hint: if the R command 'library(MASS)' issues an error message, the package
MASS is not installed on your computer; use the command
'install.packages("MASS")' to install it.)

3. Edit the file /home/schiel/expRoot/Rcmd/master.R to change the path defined in the 
parameter 'pfad2' to the directory where you unpacked this package.
For instance:

############################################################
# CHANGE THIS IF YOU MOVE THE EXPERIMENT TO ANOTHER COMPUTER
# to the installation dir of the package (where you see 'Rcmd')
pfad2 = "/home/schiel/expRoot"
############################################################

If you like, look through the parameter definitions in the header
part of 'Rcmd/master.R' and possibly change some values.
If you don't change anything, the script will ask you whether to
run a single ABM or multiple ABMs under exactly the conditions 
that were used in the experiment described in [2].

If you choose single ABM, the script will plot figures and animations
while performing a single ABM with 60,000 interactions. This is useful
to learn about the effect of different conditions or just to 
demonstrate the ABM development; it is not suitable for testing a
hypothesis, because the simulation is stochastic and each single ABM run
will result in slightly different memory contents.

If you choose multiple ABM, the script will run the ABM 100 times 
(this will probably take several hours or days) without any plotting
and will store all logging information and the animations (which will 
be very large, about 6.5 GB) for later analysis in LogDir (see details 
about logging below). A multiple run of exactly this kind was the
empirical basis for the analysis and tests in [2].

Don't forget to save the changes you made in the file 'Rcmd/master.R'
before you continue.
 
4. Start R and source (execute) the main script of the experiment:

> source("/home/schiel/expRoot/Rcmd/master.R")

5. Follow the instructions printed on the command line.
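
Before sourcing the main script, the prerequisites from steps 2 and 3 can be
checked with a short sketch like the following. The helper names
'missing_packages' and 'has_master_script' are hypothetical and not part of
the package; the path is the example path used above.

```r
# Packages listed in step 2 of this README.
required <- c("MASS", "lattice", "latticeExtra", "emuR", "mvtnorm")

# Hypothetical helper: names of the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: is the main script present below the given directory?
has_master_script <- function(root) {
  file.exists(file.path(root, "Rcmd", "master.R"))
}

pfad2 <- "/home/schiel/expRoot"   # example path from step 3; adapt as needed
to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
if (!has_master_script(pfad2)) {
  message("Rcmd/master.R not found below ", pfad2)
}
```

If neither message appears, the packages are available and 'pfad2' points to
a directory that contains 'Rcmd/master.R'.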


Initial memory data of agents

As described in section 2 of [2], the agents' memories are initialized 
from real recordings of 19 speakers from a small town in NSW, Australia,
i.e. with their own productions at a certain point in time.
These initial data take the form of one DCT triplet per memorized word 
token, encoding the speaker- and time-normalized trajectory of the 
spectral weight of the /s/ and /esch/ sibilants. The DCT triplets 
are stored as lines in a table in expRoot/data/str.df.
Columns in this table encode (among other information) the DCT triplet
(P1,P2,P3), the word label (Word), the agent's number (Speaker) and the 
type of sibilant (Initial: 'str' = /s/ in 'str' context, 's' = /s/ in 
prevocalic context, and 'S' = prevocalic /esch/).
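
To make the notion of a DCT triplet concrete, here is a minimal base-R sketch.
It is an illustration under assumptions, not the procedure used to create
str.df: the function 'dct_triplet', its scaling, the trajectories and the
table values are all invented, and the package's own coefficients may be
scaled differently.

```r
# First three DCT-II coefficients of a trajectory (scaling is an assumption).
dct_triplet <- function(x) {
  N <- length(x)
  n <- seq_len(N) - 1
  vapply(0:2, function(m) sum(x * cos(pi * m * (2 * n + 1) / (2 * N))) / N,
         numeric(1))
}

# Made-up trajectories: a flat one and a rising one, 11 time points each.
flat   <- rep(2, 11)
rising <- seq(1, 3, length.out = 11)
coefs  <- rbind(dct_triplet(flat), dct_triplet(rising))

# Toy table with the column names described above (all values invented).
toy <- data.frame(P1 = coefs[, 1], P2 = coefs[, 2], P3 = coefs[, 3],
                  Word = c("seem", "street"), Speaker = c(1, 1),
                  Initial = c("s", "str"),
                  stringsAsFactors = FALSE)
print(toy)
```

With this scaling, P1 equals the mean of the trajectory, P2 reflects its
overall slope (zero for the flat trajectory, negative for the rising one) and
P3 its curvature (close to zero for both toy trajectories).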


Some remarks about the R Code

The code presented in this package is based on the code used in [1], where it
was applied to /u/-fronting in Southern English. It was adapted and extended
for [2] in several respects; the most prominent features are:

- the original decision rule for memorizing an incoming word token based
on the max a-posteriori probability (experiment 1 in [1]) is also used here.

- instead of removing the most outlying memorized token in the feature space
(when a new token is memorized into a phonological category), this experiment
uses the 'time-delay-based' approach as described in experiment 2 of [1]:
the oldest memorized token stored in the phonological category is
removed.

- agents can split a phonological category into two new categories or merge 
two existing categories into one (referred to as 'split&merge' in the code).

- the concept of 'phonological equivalence class' was introduced; this
was necessary after the implementation of split&merge, because the
labels of phonological categories in the individual agents' memories cannot 
string-match across memories of different agents (see section 4.2 in [2] 
for a detailed explanation); in the R code a fixed color scheme is used to 
signal the 7 different possible equivalence classes; the scheme is usually 
explained in the plots generated by the R code.

- logging was extended to various figure plots and animation frames. All 
logging information for an ABM run (single or multiple) is stored in a single
directory named expRoot/LogDir/YYYYMMDDhhmmss.
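
The first two features above (memorization by maximum a-posteriori
probability and removal of the oldest token) can be sketched in base R as
follows. This is a simplified illustration, not the package code: the names
'log_dmvn' and 'memorize_if_map', the memory layout, the Gaussian category
model with priors proportional to category size, and the toy data are all
assumptions.

```r
# Log density of a multivariate normal distribution (base R only).
log_dmvn <- function(x, mu, sigma) {
  logdet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -0.5 * (length(mu) * log(2 * pi) + logdet + mahalanobis(x, mu, sigma))
}

# Memorize 'token' under 'label' only if 'label' has the highest posterior;
# if so, the oldest token of that category is removed first.
memorize_if_map <- function(memory, token, label, now) {
  cats <- split(memory[, c("P1", "P2", "P3")], memory$label)
  post <- sapply(cats, function(m) {
    log(nrow(m) / nrow(memory)) + log_dmvn(token, colMeans(m), cov(m))
  })
  if (names(which.max(post)) != label) return(memory)   # token rejected
  idx <- which(memory$label == label)
  oldest <- idx[which.min(memory$time[idx])]
  new_row <- data.frame(P1 = token[[1]], P2 = token[[2]], P3 = token[[3]],
                        label = label, time = now, stringsAsFactors = FALSE)
  rbind(memory[-oldest, ], new_row)
}

# Toy memory of one agent: two well-separated categories, 20 tokens each.
set.seed(1)
memory <- data.frame(P1 = c(rnorm(20, 0), rnorm(20, 5)),
                     P2 = c(rnorm(20, 0), rnorm(20, 5)),
                     P3 = c(rnorm(20, 0), rnorm(20, 5)),
                     label = rep(c("s", "S"), each = 20),
                     time = rep(1:20, times = 2),
                     stringsAsFactors = FALSE)

accepted <- memorize_if_map(memory, c(0.1, -0.2, 0.3), "s", now = 21)
rejected <- memorize_if_map(memory, c(5, 5, 5), "s", now = 21)
```

In this toy setting the first token lies inside the 's' cloud and replaces the
oldest 's' token, so the memory size stays constant; the second token is more
probable under 'S' than under its intended label 's' and leaves the memory
unchanged.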

All of the features described above (and many more) can be controlled in the 
R code by modifying parameter values in the header of the main R script
'Rcmd/master.R'; for instance, if the ABM simulation should run over only
25,000 interactions instead of the default 60,000, simply set the parameter
'simGroups' to 25 (the size of each simulation group is defined in
'simGroupSize', typically 1000).
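
The arithmetic behind this example is sketched below, assuming that the total
number of interactions is simply 'simGroups' times 'simGroupSize'; the default
value of 60 for 'simGroups' is inferred from the figures above rather than
read from the script.

```r
# Parameter names as in Rcmd/master.R; values follow the text above.
simGroupSize <- 1000
simGroups <- 60                     # inferred default: 60,000 / 1000
total_default <- simGroups * simGroupSize

simGroups <- 25                     # shorter run
total_short <- simGroups * simGroupSize

print(c(default = total_default, short = total_short))
```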


References

[1] Harrington, J. & Schiel, F. (2017) /u/-fronting and agent-based modeling: The
relationship between the origin and spread of sound change. Language, 93.2,
414-445. 
[2] Stevens, M., Harrington, J. & Schiel, F. (forthcoming) Associating the 
origin and spread of sound change using agent-based modelling applied to 
/s/-retraction in English. Glossa.

