Research compendium package for d’Alpoim Guedes and Bocinsky (2018)

When using the code included in this research compendium, please cite all of the following:

d’Alpoim Guedes, Jade and R. Kyle Bocinsky. Climate change stimulated agricultural innovation and exchange across Asia, 2018. Science Advances 4:eaar4491. http://doi.org/10.1126/sciadv.aar4491

d’Alpoim Guedes, Jade and R. Kyle Bocinsky. Research compendium for: Climate change stimulated agricultural innovation and exchange across Asia, 2018. Version 1.0.0. Zenodo. https://doi.org/10.5281/zenodo.1239106

d’Alpoim Guedes, Jade and R. Kyle Bocinsky. Data output for: Climate change stimulated agricultural innovation and exchange across Asia, 2018. Version 1.0.0. Zenodo. http://doi.org/10.5281/zenodo.788601

Compendium DOI: https://doi.org/10.5281/zenodo.1239106

The files at the URL above will generate the results as found in the publication. The files hosted at https://github.com/bocinsky/guedesbocinsky2018 are the development versions and may have changed since this compendium was released.

Authors of this repository:

R. Kyle Bocinsky (bocinsky@gmail.com)
Jade d’Alpoim Guedes (jadeguedes@gmail.com)

Overview of contents

This repository is a research compendium package for d’Alpoim Guedes and Bocinsky (2018). The compendium contains all code associated with the analyses described and presented in the publication, as well as a Docker environment (described in the Dockerfile) for running the code.

This compendium is an R package, meaning that by installing it you are also installing most required dependencies. See below for hints on installing some of the command-line tools necessary in this analysis on macOS and Linux. This compendium takes a lot of its cues from Ben Marwick’s rrtools package for performing reproducible research.

The analyses presented in d’Alpoim Guedes and Bocinsky (2018) are performed in an RMarkdown vignette (guedesbocinsky2018.Rmd) located in the vignettes directory.

Downloading the compendium package

This compendium package may be downloaded as Data file S3 from d’Alpoim Guedes and Bocinsky (2018), directly from GitHub as an archive, or cloned with git.

Downloading directly from GitHub

You can download the compendium package using the following link:

https://github.com/bocinsky/guedesbocinsky2018/archive/1.0.0.zip

Cloning via `git`

To download this research compendium as you see it on GitHub, for offline browsing, install git on your computer and use these lines at a Bash prompt (“Terminal” on macOS and Unix-alikes, “Command Prompt” on Windows):

# Clone into the repository
git clone https://github.com/bocinsky/guedesbocinsky2018.git

# Change directories into the local repository
cd guedesbocinsky2018

# Checkout the publication tag
git checkout tags/1.0.0

System requirements

The system dependencies for this package include GDAL, FFMPEG, Ghostscript, V8 v3.15, and protobuf. These packages (and their respective dependencies) must be installed in order to run the analyses. Additionally, Cairo must be among the capabilities of your particular R installation (as it probably is if you installed from a pre-compiled binary download available on CRAN), and a recent version of Pandoc is required for building the README.md file.

macOS

We strongly suggest using Homebrew to install the system dependencies. Homebrew commands might be:

brew install gdal --with-complete --with-unsupported
brew install ffmpeg
brew install ghostscript
brew install protobuf
brew install v8@3.15
brew install pandoc
brew install pandoc-citeproc
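
After installing, it can be useful to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of the compendium: missing_tools is a hypothetical helper, and the executable names in the example call (gdalinfo for GDAL, gs for Ghostscript, protoc for protobuf) are assumptions about what each package installs. V8 is a library rather than a command, so it is not covered here.

```shell
# Hypothetical helper: print each named command that cannot be found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example call (executable names are assumptions, see above):
# missing_tools gdalinfo ffmpeg gs protoc pandoc
```

Any name printed by the example call points to a package that still needs installing or a PATH that needs adjusting.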

Linux

Please refer to the dockerfiles for rocker/geospatial and bocinsky/bocin_base.

Windows

This software has not been tested on Windows. We recommend using the Docker build to run this on Windows (see below).

R “vector memory exhausted” error

Some installations of R—particularly R >= 3.5.0 running on macOS—will throw a “vector memory exhausted” error when running the analysis. This occurs when R allocates larger vectors than allowed by default; see the R NEWS file for 3.5.0 for details. If you get this error, increasing the R_MAX_VSIZE environment variable might solve the issue. Run these lines in the terminal:

cd ~
touch .Renviron
open .Renviron

Then, add this to the first line of .Renviron:

R_MAX_VSIZE=100Gb
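
If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch under stated assumptions: set_r_max_vsize is a hypothetical helper, and it appends the setting to the end of the file rather than the first line, on the assumption that the position of the line within .Renviron does not matter. It leaves the file alone if R_MAX_VSIZE is already set.

```shell
# Hypothetical helper: add R_MAX_VSIZE to an .Renviron file unless it is already set.
# $1: value (for example 100Gb); $2: path to the file (defaults to ~/.Renviron).
set_r_max_vsize() {
  value=$1
  renviron=${2:-"$HOME/.Renviron"}
  touch "$renviron"
  if grep -q '^R_MAX_VSIZE=' "$renviron"; then
    echo "R_MAX_VSIZE is already set in $renviron; leaving it unchanged."
    return 0
  fi
  # If the file is non-empty and does not end in a newline, add one first
  # so the new setting starts on its own line.
  if [ -s "$renviron" ] && [ -n "$(tail -c 1 "$renviron")" ]; then
    echo >> "$renviron"
  fi
  printf 'R_MAX_VSIZE=%s\n' "$value" >> "$renviron"
}

# Example call:
# set_r_max_vsize 100Gb
```

Restart R afterwards so that the new value is read.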

Authentication for the Google Elevation API and tDAR

This analysis requires the user to have a Google Elevation API key and a tDAR user name and password, supplied either as environment variables or passed to the guedesbocinsky2018.Rmd RMarkdown vignette as parameters. Please see the Running the analysis section below for guidance on setting these parameters.

Archaeological site data from tDAR

Archaeological site location data are sensitive information due to the possibility of looting, and archaeological ethics require that we restrict access to those data. Accordingly, an essential component of this analysis is not shipped in this open GitHub repository or archived with Zenodo. We have instead archived the site location data necessary to run this analysis with the Digital Archaeological Record (tDAR) under restricted access. Users who want to run this analysis need to request access through tDAR, which we will provide to any researcher with a reasonable affiliation (academic or otherwise). The main purpose is to track to whom we provide access.

The data are available through tDAR at the following DOI: 10.6067/XCV8MK6G05. Please go to the site and select “Request Access, Submit Correction, Comment” under the downloads section in the right panel. You will have to create a tDAR user account and agree to the tDAR user agreement.

This analysis uses the tDAR application programming interface (API) to authenticate a user into tDAR and access and download the archaeological site data. This requires a user to be part of the tDAR API Group. Please contact Digital Antiquity staff (info@digitalantiquity.org) to be added to this group.

Running the analysis

There are three ways to run the analysis:

Running from within R — Do this if your goal is to explore how we developed the model, or to change parameters.
Running from the terminal — Do this if your goal is just to reproduce the output on your local machine/environment.
Running from a Docker container — Do this if your goal is to reproduce our results precisely, using a custom-built and pre-tested environment.

A note on run time

This analysis has been designed to take advantage of modern multi-core or multi-CPU computer architectures. By default, it will run on two cores (i.e., sections of the code will run in parallel approximately twice as fast as on a single core). The analysis also consumes quite a bit of memory. On two (relatively high-speed) cores, run-time of the entire analysis is approximately 12 hours. This can be shortened dramatically by running with more cores/processors and more memory, if available.
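
To decide how many cores to request, you may want to check how many your machine has. The sketch below is an illustration only: available_cores is a hypothetical helper, and it assumes that getconf _NPROCESSORS_ONLN is supported on your system (it is on common Linux and macOS installations); it falls back to 1 otherwise.

```shell
# Hypothetical helper: print the number of online processor cores.
# Falls back to 1 if the count cannot be determined.
available_cores() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=
  case $n in
    ''|*[!0-9]*) n=1 ;;
  esac
  printf '%s\n' "$n"
}

# Example call:
# available_cores
```

Leaving a core or two free for the rest of the system is a reasonable choice for a run of this length, and memory use is likely to grow with the number of cores.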

Running from within R

This is what most users will want to run if their goal is to explore how we developed the model, or to change parameters. Be sure that you have a working version of R installed (>= 3.4.4) and the RStudio development environment.

Download the compendium package
Un-zip the archive and navigate into the guedesbocinsky2018-1.0.0 directory.
Launch the guedesbocinsky2018.Rproj file (should open up RStudio).
Install the package with:

## Install the devtools package, if not previously installed
# install.packages("devtools")
devtools::install_cran("remotes", upgrade_dependencies = FALSE)
devtools::install(".", dependencies = TRUE, upgrade_dependencies = FALSE)
remotes::install_local(".")

Go to the vignettes/ directory.
Open guedesbocinsky2018.Rmd.
Set environment variables (in the header at the top of the document). Replace each section that starts with !r (through the end of the line) with your Google Maps Elevation API key, tDAR user name, and tDAR password (each in single quotes).
Press “Knit” at the top of the screen to run the analysis.

Running from the terminal

This is what you want to run to reproduce our results from the terminal. We strongly encourage you to run the analysis from R and RStudio if your goal is to explore how we developed the model, or to change parameters.

To run this analysis from the terminal, first you must ensure you have downloaded the compendium package and installed all system requirements. We’ve included a convenient script for running the entire analysis, including installing the compendium package.

First, set your environment variables in the terminal. On Unix-alike systems (including Linux and macOS), you can set environmental variables in the terminal like so:

export google_maps_elevation_api_key=YOUR_API_KEY
export tdar_un=YOUR_TDAR_USER_NAME
export tdar_pw=YOUR_TDAR_PASSWORD

Then, from within the guedesbocinsky2018 directory in the terminal:

bash inst/guedesbocinsky2018_BASH.sh

Output will appear in the vignettes/ directory.
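
Because the run is long, it may be worth confirming that the three environment variables are set before launching the script. This is a minimal sketch: unset_vars is a hypothetical helper, not something shipped with the compendium, and it only checks that each variable is non-empty, not that the credentials are valid.

```shell
# Hypothetical helper: print the name of each listed environment variable
# that is unset or empty. Returns non-zero if any are missing.
unset_vars() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Example call:
# unset_vars google_maps_elevation_api_key tdar_un tdar_pw
```

If the example call prints nothing, all three variables are set in the current shell.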

Running from a Docker container

This is what you want to run to reproduce our results precisely. We strongly encourage you to run the analysis from R and RStudio if your goal is to explore how we developed the model, or to change parameters.

Docker is a virtual computing environment that facilitates reproducible research—it allows for research results to be produced independent of the machine on which they are computed. Docker users describe computing environments in a text format called a “Dockerfile”, which when read by the Docker software builds a virtual machine, or “container”. Other users can then load the container on their own computers. Users can upload container images to Docker Hub, and the image for this research (without the analyses run) is available at https://hub.docker.com/r/bocinsky/guedesbocinsky2018/.

We have included a Dockerfile which builds a Docker container for running the analyses described in the paper. It uses rocker/geospatial:3.4.4 as its base image, which provides R, RStudio Server, and the tidyverse of R packages, and adds several geospatial software packages (GDAL, GEOS, and proj.4). The Docker image (1) adds ffmpeg, (2) updates the R packages, and (3) installs the R software packages required by this package.

Downloading and running the Docker container image

The commands below demonstrate three ways to run the docker container. See this Docker cheat sheet for other arguments. Using the “:1.0.0” tag will ensure you are running the version of the code that generates the d’Alpoim Guedes and Bocinsky (2018) results—the first time you run the Docker image, it will download it from the Docker Hub.

Setting your environment variables

Set your environment variables in the terminal. On Unix-alike systems (including Linux and macOS), you can set environmental variables in the terminal like so:

export google_maps_elevation_api_key=YOUR_API_KEY
export tdar_un=YOUR_TDAR_USER_NAME
export tdar_pw=YOUR_TDAR_PASSWORD

Run the analysis directly

To run the analyses directly, render the guedesbocinsky2018.Rmd RMarkdown vignette at the end of the run command like so (in the terminal):

docker run bocinsky/guedesbocinsky2018:1.0.0 r -e "rmarkdown::render('/guedesbocinsky2018/vignettes/guedesbocinsky2018.Rmd', \
                                                                              params = list(cores = 1, \
                                                                              clean = FALSE, \
                                                                              google_maps_elevation_api_key = '$google_maps_elevation_api_key', \
                                                                              tdar_un = '$tdar_un', \
                                                                              tdar_pw = '$tdar_pw'))"
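
The command above fixes the number of cores at 1. If you want to vary it, one option is to assemble the R expression with a small helper. This is a sketch only: render_expr is a hypothetical function, it reuses the parameter names and vignette path shown above, and it assumes that none of your credentials contains a single quote (which would break the quoting of the R string).

```shell
# Hypothetical helper: print the rmarkdown::render() call used above,
# with the number of cores taken from the first argument (default 1).
# Credentials are read from the environment variables set earlier.
render_expr() {
  cores=${1:-1}
  printf "rmarkdown::render('/guedesbocinsky2018/vignettes/guedesbocinsky2018.Rmd', params = list(cores = %s, clean = FALSE, google_maps_elevation_api_key = '%s', tdar_un = '%s', tdar_pw = '%s'))\n" \
    "$cores" "${google_maps_elevation_api_key:-}" "${tdar_un:-}" "${tdar_pw:-}"
}

# Example call (starts a container and runs the full analysis on 4 cores):
# docker run bocinsky/guedesbocinsky2018:1.0.0 r -e "$(render_expr 4)"
```

The expression is printed rather than executed, so you can inspect it before starting a run.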

Run the analysis interactively from the terminal

Alternatively, you can run the container in interactive mode and load the script yourself like so (in the terminal):

docker run -it bocinsky/guedesbocinsky2018:1.0.0 bash

You can use the exit command to stop the container.

Run the analysis from within a Dockerized RStudio IDE

Finally, you can host RStudio Server locally to use the RStudio browser-based IDE. Run like so (in the terminal):

docker run -p 8787:8787 bocinsky/guedesbocinsky2018:1.0.0

Then, open a browser (we find Chrome works best) and navigate to “localhost:8787”, or run docker-machine ip default in the shell to find the correct IP address, and log in with rstudio/rstudio as the user name and password. In the explorer (lower right pane in RStudio), navigate to the guedesbocinsky2018 directory, and click guedesbocinsky2018.Rproj to open the project.

Building the Docker container from scratch

If you wish to build the Docker container locally for this project from scratch, simply cd into this guedesbocinsky2018/ directory and run like so (in the terminal):

docker build -t bocinsky/guedesbocinsky2018 .

The -t argument gives the resulting container image a name. You can then run the container as described above, except without the tag.

Run in Docker using the convenience script

We have also included a bash script that builds the Docker container, executes the analysis, and moves the results onto your local machine. To use it, follow the two steps below in the terminal.

First, set your environment variables in the terminal. On Unix-alike systems (including Linux and macOS), you can set environmental variables in the terminal like so:

export google_maps_elevation_api_key=YOUR_API_KEY
export tdar_un=YOUR_TDAR_USER_NAME
export tdar_pw=YOUR_TDAR_PASSWORD

Then, change into the guedesbocinsky2018/ directory, and run the convenience script:

bash inst/guedesbocinsky2018_DOCKER.sh

The entire analysis will appear in a docker_out/ directory when the analysis finishes.

Output

The GitHub repository for this project does not contain the output generated by the script—3.2 GB of compressed data. All output data is available as a separate Zenodo archive at: http://doi.org/10.5281/zenodo.788601

The vignettes/ directory contains all data generated by the guedesbocinsky2018.Rmd RMarkdown vignette:

data/raw_data/ contains data downloaded from web sources for this analysis
data/derived_data/ contains tables of the raw site chronometric data without locational information, and the modeled chronometric probability and niche information for each site.
data/derived_data/models/ contains R data objects describing the Kriging interpolation models across the study area
data/derived_data/recons/ contains NetCDF format raster bricks of the model output (i.e., the reconstructed crop niches)
figures/ contains all figures output by the script, including videos of how each crop niche changes over time
figures/site_densities/ contains figures of the estimated chronometric probability density for each site in our database
submission/ contains all of the figures, tables, movies, and supplemental datasets included with d’Alpoim Guedes and Bocinsky (2018)

Licenses

Code: MIT
Year: 2018
Copyright holders: R. Kyle Bocinsky and Jade d’Alpoim Guedes

Contact

R. Kyle Bocinsky, PhD, RPA
Montana Climate Office, University of Montana
Division of Earth and Ecosystem Sciences, Desert Research Institute
The Research Institute at Crow Canyon Archaeological Center
770.362.6659 – Mobile
bocinsky@gmail.com – Email
bocinsky.io – Web
