Overview of the Docker container system

Sheffield R meetup - 6th March 2018 (Updated for SITraN journal club - 2nd April 2019)

Longer version of materials prepared for CRUK Cambridge available here

Mark Dunning (@DrMarkDunning) Bioinformatics Core Director

Sheffield Bioinformatics Core

web : sbc.shef.ac.uk
twitter: @SheffBioinfCore
email: [email protected]

Basics

https://docs.docker.com/engine/docker-overview

Docker is an open platform for developers to build and ship applications, whether on laptops, servers in a data center, or the cloud.

  • Or, it is a (relatively) painless way for you to install and try out Bioinformatics software.
  • You can think of it as an isolated environment inside your existing operating system, where you can install and run software without messing with the main OS
    • potentially a good way for beginners to learn command-line tools?
  • Really useful for testing software
  • Clear benefits for working reproducibly
    • instead of just distributing the code used for a paper, you can effectively share the computer you did the analysis on
  • For those of you who have used Virtual Machines, it is a similar concept
  • However, containers are more lightweight and easier to distribute
  • Images are built up from layers, so images that share a base (such as ubuntu) can share those layers on disk

Installing Docker

Mac

Windows

(may require some messing around with virtualisation or Hyper-V)

Once you have installed Docker using the instructions above, you can open a terminal (Mac) or command prompt (Windows) and run the following to download an image for the Ubuntu operating system from Docker Hub:

docker pull ubuntu

N.B. on Linux, you may need to run docker with sudo, unless you apply this fix (adding your user to the docker group)

To run a command inside this new environment, we can do:

docker run ubuntu echo "Hello World"

🎉🎉

  • run the docker container for the Ubuntu operating system
  • run the echo command within this operating system
  • exit

To use the container interactively, we specify the -it arguments (-i keeps standard input open and -t allocates a terminal). The container then doesn't exit straight away; instead it runs its default command, which for the ubuntu image is bash, giving us a terminal prompt.

docker run -it --rm ubuntu
  • the --rm means that the container is deleted on exit, otherwise your disk could get clogged up with lots of exited containers

  • if no command is specified, you get a shell prompt

  • the ubuntu image (or centos) is often used as a base image upon which other more complicated images are based

  • when you want another image built on the same base, you only have to download the layers that differ, i.e. you don't need to download ubuntu again

  • more compact images, easier to distribute

  • compare this to a virtual machine, where each image contains a complete operating system

Volumes in Docker

You'll notice that when you launch a container, you don't automatically have access to the files on your own machine. In Docker, we can mount volumes using the -v argument to make files accessible, e.g. -v /PATH/TO/YOUR/data:/data makes the host directory /PATH/TO/YOUR/data available as /data inside the container.

## should say that no file or directory exists
docker run --rm ubuntu ls /data

## On Windows, use a Windows-style path (with backslashes) for the host directory
docker run --rm -v c:\work:/data ubuntu ls /data

## On Unix it would be something more sensible, like

docker run --rm -v /home/USER/work:/data ubuntu ls /data
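On Mac or Linux, a common pattern is to mount whatever directory you are currently in, so the container can read and write the files you are working on. A minimal sketch, assuming a POSIX shell; run_in_ubuntu is a hypothetical helper name, while -v and -w (set the working directory inside the container) are standard docker run options:

```shell
#!/bin/sh
# Hypothetical helper: run a command in the ubuntu image with the current
# directory mounted at /data and used as the working directory.
run_in_ubuntu() {
  # "$PWD" is quoted so a path containing spaces stays one argument
  docker run --rm -v "$PWD":/data -w /data ubuntu "$@"
}
```

For example, run_in_ubuntu ls lists the files in your current directory as seen from inside the container.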

Running R (and RStudio) through Docker

The latest version of R and R devel are provided by the rocker project https://github.com/rocker-org/rocker

docker run --rm -it r-base R
  • pull the latest r-base image, if you don't have it
  • run interactively (-it)
  • run the r-base docker image
  • run the R executable

For the latest development version of R (in the rocker/r-devel image, R-devel is started with RD):

docker run --rm -it rocker/r-devel RD

You can also get previous versions of R by adding a version tag to the image name

  • good if you need to re-run code that was written on a previous R version
  • the same code can then also be tested on the latest version of R

RStudio is also supported. See https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image

  • this time we do something slightly different
docker run -p 8787:8787 -e PASSWORD=rstudio rocker/rstudio
  • the -p argument maps port 8787 inside the container to port 8787 on your machine
  • open a web browser and enter the address http://localhost:8787
  • log in with username rstudio and the password set with -e PASSWORD (here rstudio)
  • you have a version of RStudio working in your web browser!

You can install whatever R packages you need in this container and analyse your data

N.B. Python fans needn't feel left out; there are docker containers for jupyter too.

  • you can mount a volume with -v.
    • or there is an Upload option in the file viewer
  • data (and scripts etc) can be exported with the Export menu item
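Putting these options together, a launch command might mount the current directory into the RStudio user's home folder and set the password explicitly with -e PASSWORD, as in the Bioconductor example further down. This is a sketch under assumptions: start_rstudio is a hypothetical helper, RSTUDIO_PASSWORD is a hypothetical variable you set yourself, and the mount point /home/rstudio/data is a choice made here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: start RStudio Server from the rocker/rstudio image.
# RSTUDIO_PASSWORD is a hypothetical variable you set yourself beforehand.
start_rstudio() {
  # Refuse to start without a password rather than guess a default
  if [ -z "${RSTUDIO_PASSWORD:-}" ]; then
    echo "Set RSTUDIO_PASSWORD first" >&2
    return 1
  fi
  # -p maps port 8787 on this machine to port 8787 in the container;
  # -v makes the current directory appear as ~/data inside RStudio.
  # No --rm, so the container can be restarted or committed after it exits.
  docker run -p 8787:8787 \
    -e PASSWORD="$RSTUDIO_PASSWORD" \
    -v "$PWD":/home/rstudio/data \
    rocker/rstudio
}
```

Leaving out --rm is deliberate here, so that the start, attach and commit steps that follow still have a container to work with.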

Once a docker container has quit, you can jump back in with docker start and docker attach

docker ps -a ## note the name of the container that just quit
docker start <name-of-container-that-just-exited>
docker attach <name-of-container-that-just-exited>
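When the container you want back is simply the most recently created one, the lookup and the two commands can be combined: docker ps -l -q prints only the ID of the latest container (in any state), and docker start -a -i starts it and attaches interactively in one step. A minimal sketch; resume_latest is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: restart the most recently created container and
# attach to it interactively.
resume_latest() {
  # -l: latest created container (any state); -q: print only its ID
  id=$(docker ps -l -q)
  if [ -z "$id" ]; then
    echo "No containers found" >&2
    return 1
  fi
  # -a attaches to the container's output, -i keeps standard input open
  docker start -a -i "$id"
}
```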

You can also save the container's current state as a new image

docker commit <name-of-container-that-just-exited> <new-image-name>

There may already be a Docker image for popular sets of tools, e.g. Bioconductor:

docker run --rm -p 8787:8787 -e PASSWORD=PASSWORD bioconductor/release_core2

The Dockerfile

The creation of Docker images is specified by a Dockerfile. This is a text file containing the sequence of instructions required to re-create your image from some starting point, which could be the standard Ubuntu image. Essentially, we list step by step the commands needed to install all the required software. If you already have a shell script to install your software, then translating it to a Dockerfile is relatively painless.

A useful reference is the official Docker documentation on Dockerfiles, which goes into far more detail than we will here.

The example below shows a Dockerfile that creates an image based on Ubuntu, uses git to clone a repository and installs some R packages

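FROM ubuntu
LABEL maintainer="YOUR NAME <[email protected]>"
# Avoid interactive prompts (e.g. time zone) while apt installs packages
ARG DEBIAN_FRONTEND=noninteractive
# Update and install in one RUN so the package lists are never stale;
# r-base is included so that R is available for the last step
RUN apt-get update && apt-get install -y wget build-essential git r-base
RUN git clone.....
RUN R -e 'install.packages(....)'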

The docker build command will build a new image from a Dockerfile. With docker push you can distribute it on Docker Hub once you have a user name and have logged in.

docker build -t my_username/my_new_image .
docker login
docker push my_username/my_new_image
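Putting the Dockerfile and build steps together, the sketch below writes a small, complete Dockerfile and builds it. It is an illustration under assumptions, not the image used in this talk: write_dockerfile and build_image are hypothetical helper names, the image simply installs R from Ubuntu's r-base package instead of cloning a repository, and my_username/my_new_image is the same placeholder tag as above.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Dockerfile into directory $1.
write_dockerfile() {
  mkdir -p "$1"
  cat > "$1/Dockerfile" <<'EOF'
FROM ubuntu
# Stop apt asking interactive questions (e.g. time zone) during the build
ARG DEBIAN_FRONTEND=noninteractive
# Update and install in a single RUN so the package list is never stale
RUN apt-get update && \
    apt-get install -y --no-install-recommends r-base && \
    rm -rf /var/lib/apt/lists/*
# Start R when the container is run without a command
CMD ["R"]
EOF
}

# Hypothetical helper: build the image from directory $1 and tag it as $2.
build_image() {
  docker build -t "$2" "$1"
}
```

For example, write_dockerfile my_new_image followed by build_image my_new_image my_username/my_new_image builds the image; docker login and docker push my_username/my_new_image then publish it.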

Use Case 1:- Distributing software for a training course

Several headaches can emerge when preparing the materials for a training course

  • if the course venue has no desktop machines, participants will need to bring their own machines
    • so they will need to install software beforehand
    • challenging for beginners
  • docker presents a potential solution
    • (however, they will still need to install docker - which could still be a barrier for some)
  • distributing materials to other participants who can't make the class, or who want to attend remotely
  • when developing materials in a team, need to agree on common software versions etc
  • could pre-install the container on an AWS instance
docker pull markdunning/rnaseq-toolkit
docker run --rm -p 6080:80 markdunning/rnaseq-toolkit

Use Case 2:- Distributing supplementary data for a publication

  • Stephen Eglen of the Department of Applied Mathematics and Theoretical Physics, University of Cambridge made the data and code for his paper available on github 👍
  • furthermore, the scripts and data are available, with the appropriate version of R, as a docker container 👍👍
docker run -d -p 8787:8787 sje30/waverepo

Use case 3:- Reproducible analysis

Issue: we may be doing several analyses at the same time, some of which may require the latest version of R etc. How can we ensure that previous analyses still run? Within each project, create a Dockerfile to build a container for the analysis.

FROM bioconductor/release_base2:R3.5.3_Bioc3.8
LABEL maintainer="Mark Dunning <[email protected]>"
RUN R -e 'install.packages("BiocManager")'
RUN R -e 'BiocManager::install("tidyverse")'
RUN R -e 'BiocManager::install("DEXSeq")'
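Each project's image can then be built once and used to run that project's scripts, so upgrading R elsewhere does not affect it. A sketch under assumptions: the Dockerfile above sits in the project directory, the image is tagged markdunning/dexseq-analysis as in the Singularity example below, build_project_image and run_analysis are hypothetical helpers, and /analysis is an arbitrary mount point chosen for illustration.

```shell
#!/bin/sh
# Tag for this project's image (name taken from the Singularity example)
IMAGE=markdunning/dexseq-analysis

# Build the image from the Dockerfile in the current project directory
build_project_image() {
  docker build -t "$IMAGE" .
}

# Run an R script from the project directory inside the project's image.
# /analysis is an arbitrary mount point chosen for this sketch.
run_analysis() {
  docker run --rm -v "$PWD":/analysis -w /analysis "$IMAGE" R -f "$1"
}
```

For example, build_project_image followed by run_analysis dexseq_analysis.R runs the analysis script with the R and Bioconductor versions pinned in the Dockerfile.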

Useful docker commands

To see what containers you have run recently:

docker ps -a

If you find your disk filling up with docker images, there are convenient one-liners for removing all containers and images.

Don’t run this now, unless you want everything you’ve been working on to be deleted!

docker rm $(docker ps -a -q)
docker rmi $(docker images -q)
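A less drastic alternative is Docker's prune commands, which only remove things that are not in use: docker container prune deletes all stopped containers and docker image prune deletes dangling (untagged) images, with -f skipping the confirmation prompt. Be aware that stopped containers you meant to restart with docker start are removed too. A minimal sketch; docker_tidy is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: remove stopped containers and dangling images,
# leaving running containers and tagged images alone.
docker_tidy() {
  docker container prune -f
  docker image prune -f
}
```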

You can go back into the environment of a container that has exited. First, make sure the container is running by using docker start:

docker start <container_ID>

We can then use docker attach. Note that you will have to press ENTER twice in order to get a new command prompt within the container.

docker attach <container_ID>

The elephant in the room...

Sounds great so far! But...

  • when you run a Docker container you have super-user (root) access rights inside that container. Administrators who manage shared HPC systems don't like this.

There is an alternative....

Singularity

You can build a Singularity image from a Docker image

  • Use the docker image on dockerhub at markdunning/dexseq-analysis (built from the Dockerfile above) to build an image. You need sudo access to do this.
  • Copy the image to sharc
  • Use singularity exec to run R with an analysis script
### Run where you have sudo access
sudo singularity build singularity/dexseq docker://markdunning/dexseq-analysis
## Copy the image to sharc /shared/bioinformatics_core1/Shared/software/singularity/dexseq
## On sharc
singularity exec /shared/bioinformatics_core1/Shared/software/singularity/dexseq R -f dexseq_analysis.R

More documentation is available for using singularity on sharc