Computational environment setup for QUIPP-pipeline

To provide a consistent computational environment for this project, we use the repo2docker tool to construct a Docker image that contains the programming languages and libraries specified by the configuration files in this folder.

repo2docker is the tool that BinderHub uses under the hood to build a Docker image of a repository that a user wishes to launch on a Binder service such as mybinder.org. We use the same tool here so that the resulting Docker image is readily compatible with BinderHub.

The resulting Docker images are available from Docker Hub: turinginst/quipp-env.

Running the QUIPP pipeline using a container

TODO: Similar to Running the pipeline header on main README

Configuration files

There are four configuration files present in this directory. Together, they contain the information needed for repo2docker to create a Docker image that contains all the dependencies specified in the files.

runtime.txt: This file specifies the language runtime for the environment. It contains the line r-2020-03-01, which tells repo2docker to install the R language and to download packages from the snapshot of CRAN that was collected by MRAN on 1 March 2020. Pinning the snapshot date means that new builds of the Docker image should obtain consistent versions of the R packages.

install.R: This file lists the R packages required by the methods used in the QUIPP-pipeline repository.

requirements.txt: This file specifies the Python dependencies. repo2docker installs Python itself automatically once it detects the presence of this file.

postBuild: This script contains commands that are run after the repository has been built. We use postBuild to remove the files that are copied into the image, to yield a "blank slate" to which code can be added later.

See #22 for a more detailed discussion of why this particular set of configuration files was used.
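Before building, it can be useful to confirm that all four configuration files are present. The following is a minimal sketch rather than part of the project: the function name missing_config_files is hypothetical, and the default directory name env-configuration is taken from the --subdir argument used in the build command in this README.

```shell
# Hypothetical helper (not part of QUIPP-pipeline): print the name of each
# of the four configuration files that is missing from the given directory.
# Prints nothing when all four files are present.
missing_config_files() {
    dir="${1:-env-configuration}"
    for f in runtime.txt install.R requirements.txt postBuild; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}
```

Running missing_config_files from the root of the repository lists any file that still needs to be created. Empty output means repo2docker should find everything described above.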

Updating the Docker image

The Docker image will be updated periodically as the QUIPP-pipeline project adopts additional libraries or newer versions of existing ones. For now, this will be done manually; see #?? for notes on the move to continuous integration.

To add a new R or Python library, add its name to the install.R or requirements.txt file as appropriate.
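For the Python case, the edit can be scripted so that a package is never listed twice. This is a sketch under assumptions: add_requirement is a hypothetical helper, the package name in the usage note is purely illustrative, and the file is assumed to end with a newline.

```shell
# Hypothetical helper (not part of QUIPP-pipeline): append a package name to
# a requirements file unless an identical line is already present.
add_requirement() {
    file="$1"
    pkg="$2"
    touch "$file"    # create the file if it does not exist yet
    # -x matches whole lines only, -F treats the name as a fixed string
    grep -qxF -- "$pkg" "$file" || printf '%s\n' "$pkg" >> "$file"
}
```

For example, add_requirement env-configuration/requirements.txt pandas appends the line pandas on the first call and leaves the file unchanged on later calls. The comparison is on the whole line, so a pinned entry such as pandas==1.0.0 is treated as different from pandas.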

For other languages, see the guides in the BinderHub documentation (generally, this will involve using postBuild).

To create the Docker image, we must first set up repo2docker. Here, we'll install repo2docker into a Python virtual environment.

cd <path to QUIPP-pipeline>
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements-dev.txt

With that set up, we can create the Docker image by calling repo2docker with the arguments described below:

$ jupyter-repo2docker --subdir env-configuration --image-name quipp-env-0.0.1 --user-name jovyan --user-id 1000 --env SGFROOT=/home/jovyan/sgf/bin .

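--subdir: The subdirectory of the repository that contains the configuration files. We build the image based on the contents of this directory (env-configuration).

--image-name: The name of the Docker image generated by repo2docker.

--user-name: The user name of the primary user of the image. We follow BinderHub and use the name "jovyan".

--user-id: The user ID of the primary user. We again follow BinderHub and use 1000.

--env: An environment variable to define in the container. Here, SGFROOT is set to /home/jovyan/sgf/bin.

The final argument, ., is the path of the repository to build; here, the current directory (the root of QUIPP-pipeline).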

Running this command will also launch a Jupyter server, to which you can connect via the URL given in the terminal output. This is a good time to check that everything has been configured correctly, but if you would rather not start the server, use the flag --no-run.
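As a sketch of a build-only invocation, the function below wraps the command shown above with --no-run added and the version number as a parameter. The function name build_quipp_env and the DRY_RUN variable are hypothetical conveniences, not part of repo2docker or of this project; the flags themselves are the ones used in this README.

```shell
# Hypothetical wrapper (not part of QUIPP-pipeline): build the image for a
# given version without starting the Jupyter server.
# Set DRY_RUN=1 to print the command instead of running it.
build_quipp_env() {
    version="$1"
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty
    ${DRY_RUN:+echo} jupyter-repo2docker --no-run \
        --subdir env-configuration \
        --image-name "quipp-env-${version}" \
        --user-name jovyan \
        --user-id 1000 \
        --env SGFROOT=/home/jovyan/sgf/bin \
        .
}
```

Calling DRY_RUN=1 build_quipp_env 0.0.1 from the root of the repository prints the full command for inspection. Calling build_quipp_env 0.0.1 runs the build, provided the virtual environment containing repo2docker is active.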

Once the build has completed, we can get the ID of the new image from the output of the following command...

$ docker image ls

...and can then push the image to Docker Hub using the following commands, replacing 341baa92e34a with the ID of the newly built image:

$ docker login docker.io
$ docker tag 341baa92e34a turinginst/quipp-env:0.0.1
$ docker push turinginst/quipp-env:0.0.1
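Because docker tag accepts an image name as well as an ID, the tag and push steps can also be scripted without copying the ID by hand. The following is a minimal sketch: publish_quipp_env and DRY_RUN are hypothetical names, and it assumes that the image was built with --image-name quipp-env-<version> as above and that docker login docker.io has already succeeded.

```shell
# Hypothetical helper (not part of QUIPP-pipeline): tag the locally built
# image for Docker Hub and push it. Assumes a prior "docker login docker.io".
# Set DRY_RUN=1 to print the commands instead of running them.
publish_quipp_env() {
    version="$1"
    local_image="quipp-env-${version}"              # name given to --image-name
    remote_image="turinginst/quipp-env:${version}"  # Docker Hub repository and tag
    ${DRY_RUN:+echo} docker tag "$local_image" "$remote_image" &&
        ${DRY_RUN:+echo} docker push "$remote_image"
}
```

Running DRY_RUN=1 publish_quipp_env 0.0.1 first shows exactly which image would be tagged and pushed, which is a cheap check before publishing under the turinginst account.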