Data handling, analysis, and simulation files for working with interruption networks.
This code implements interruption networks as described in Maclaren et al. (2021), turn-based networks as described in Sauer & Kauffeld (2013), and networks of emergent leader votes (also called leadership networks, see Carter et al. (2015)). This code was developed to work with a particular data set which can be downloaded from Binghamton University's Open Repository @ Binghamton.
These files expect to be together in a working directory with a ./data/ subdirectory containing the files and directories from the ORB repository. InterruptionAnalysis.py is a module that contains functions used throughout this project (e.g., import InterruptionAnalysis as ia).
With the above files and directory structure, run collect-ts-data.py to produce ./data/timeseries.csv, which is the base data structure for building and analyzing interruption networks. Each row describes one speaking event: the group ID, the individual ID (pID), the start and stop times of the speaking event, its duration, and its "latency" (how long since the last speaking event for that pID).
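As a rough illustration of the row layout described above, the sketch below derives duration and latency from start and stop times with pandas. The column names (gID, pID, begin, end, dur, lat) and the toy data are assumptions made for this example and may not match the actual timeseries.csv; latency is assumed here to be measured from the end of the same pID's previous speaking event.

```python
import pandas as pd

# Toy speaking events. Column names are hypothetical, not taken from timeseries.csv.
events = pd.DataFrame({
    "gID": ["G1", "G1", "G1", "G1"],
    "pID": ["a", "b", "a", "b"],
    "begin": [0.0, 1.5, 4.0, 6.0],
    "end": [2.0, 3.0, 5.0, 7.5],
})


def add_duration_and_latency(df):
    """Return a copy of df with 'dur' and 'lat' columns added.

    'lat' is the gap between the start of an event and the end of the
    same person's previous event (an assumption; NaN for a first event).
    """
    out = df.sort_values(["gID", "pID", "begin"]).copy()
    out["dur"] = out["end"] - out["begin"]
    prev_end = out.groupby(["gID", "pID"])["end"].shift(1)
    out["lat"] = out["begin"] - prev_end
    # Restore chronological order within each group.
    return out.sort_values(["gID", "begin"]).reset_index(drop=True)


ts = add_duration_and_latency(events)
print(ts)
```

Each person's first event has no previous event to measure from, so its latency is left as NaN in this sketch.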
Once you have the diarizations stored in timeseries.csv, you can run generate-networks.py to create each type of network referred to in Maclaren et al. (2021) for each group whose data is in the ORB repository. Other files conduct specific analyses or create specific figures (or parts of them).
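To give a feel for how speaking events can become a directed network, here is a minimal sketch using NetworkX. It is not the procedure in generate-networks.py and does not reproduce the definitions in Maclaren et al. (2021); it assumes a deliberately simple rule in which person i "interrupts" person j whenever i starts speaking strictly inside one of j's speaking events. The function name and the toy events are hypothetical.

```python
import networkx as nx

# Toy speaking events for one group: (pID, begin, end). Hypothetical data.
events = [
    ("a", 0.0, 2.0),
    ("b", 1.5, 3.0),
    ("a", 4.0, 5.0),
    ("c", 4.5, 6.0),
    ("b", 7.0, 8.0),
]


def overlap_network(events):
    """Build a weighted DiGraph with an edge i -> j for each time
    person i begins speaking while person j is still speaking.
    This overlap rule is an assumption made for illustration only."""
    G = nx.DiGraph()
    G.add_nodes_from(p for p, _, _ in events)
    for p_i, b_i, _ in events:
        for p_j, b_j, e_j in events:
            if p_i != p_j and b_j < b_i < e_j:
                if G.has_edge(p_i, p_j):
                    G[p_i][p_j]["weight"] += 1
                else:
                    G.add_edge(p_i, p_j, weight=1)
    return G


G = overlap_network(events)
print(sorted(G.edges(data="weight")))
```

The nested loop is quadratic in the number of events, which is acceptable for a toy example but would need a smarter approach for long recordings.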
- The simulations in edge-direction-sim.py are very slow, and load even slower in edge-direction-plot.py. Any suggestions for speeding these steps up would be greatly appreciated.
- NetworkX and igraph appear to disagree on how missing values should be represented in the GML format. There is one missing value for age. generate-networks.py is set to save a string value of "NA" in XSP.gml. This will work for NetworkX and used to work for igraph. If it doesn't, you can edit XSP.gml directly to read NAN instead of "NA", or change which lines are commented in generate-networks.py. I have an open question on Stack Overflow if you know how to fix this.
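The "NA" placeholder approach mentioned in the list above can be sketched on the NetworkX side as follows. This is an illustrative example only, not the code in generate-networks.py: the helper name fill_missing, the node names, and the toy graph are hypothetical, and the round trip is done in memory rather than through XSP.gml. It says nothing about how igraph will read the result.

```python
import networkx as nx

# Toy graph with one missing age value, represented here as None.
G = nx.DiGraph()
G.add_node("p1", age=23)
G.add_node("p2", age=None)
G.add_edge("p1", "p2")


def fill_missing(G, attr, placeholder="NA"):
    """Return a copy of G in which missing values of a node attribute
    are replaced by a string placeholder before the graph is written."""
    H = G.copy()
    for _, data in H.nodes(data=True):
        if data.get(attr) is None:
            data[attr] = placeholder
    return H


H = fill_missing(G, "age")

# Serialize to GML text and parse it back, without touching the disk.
gml_text = "\n".join(nx.generate_gml(H))
G2 = nx.parse_gml(gml_text)
print(G2.nodes(data=True))
```

One consequence of this choice is that the age attribute ends up with mixed types (integers plus one string), so any downstream numeric analysis has to filter out the placeholder first.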
The original diarizations were done using ELAN.
Analysis and simulations are done in Python 3, relying on the following packages: numpy, pandas, matplotlib, statsmodels, and networkx. ERGM analysis uses R 4.0.3 and the statnet package families, as well as igraph and intergraph. Stata analysis was conducted in Stata 17.