Merge pull request #3 from dmlond/master
Actual
Showing 14 changed files with 417 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,4 +5,5 @@ packer_cache/ | |
*sai* | ||
*fasta* | ||
*fastq* | ||
*sam* | ||
.*sam* | ||
*samtools* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
#!/bin/bash | ||
|
||
#make sure this script runs from the ROOT of the project | ||
cd `dirname $0`/.. | ||
|
||
# this is a simple pipeline that maps FASTQ reads to a reference genome (in FASTA format). | ||
|
||
# Here we define where our data reside. Perhaps we may need to modify this depending on | ||
# how we run the pipeline. | ||
DATA=data | ||
|
||
# Here we define the number of cores we will use for the calculations. Perhaps we may need | ||
# to modify this depending on the configuration of our VM | ||
CORES=2 | ||
|
||
# The location of the reference genome in relation to the data folder | ||
REFERENCE=$DATA/Pf3D7_v2.1.5.fasta | ||
|
||
# The location of the reads in relation to the data folder | ||
READS_1=$DATA/ERR022523_1.fastq.gz | ||
READS_2=$DATA/ERR022523_2.fastq.gz | ||
FASTQS="$READS_1 $READS_2" | ||
|
||
# recreate BWA index if not exists | ||
if [ ! -e $REFERENCE.bwt ]; then | ||
echo "going to index $REFERENCE" | ||
|
||
# Warning: "-a bwtsw" does not work for short genomes, | ||
# while "-a is" and "-a div" do not work for long | ||
# genomes. Please choose "-a" according to the length | ||
# of the genome. | ||
docker-compose run bwa index -a bwtsw $REFERENCE | ||
else | ||
echo "$REFERENCE already indexed" | ||
fi | ||
|
||
# lists of produced files. These will be assigned values as we run the pipeline | ||
SAIS="" | ||
SAM="" | ||
|
||
# iterate over FASTQ files | ||
for FASTQ in $FASTQS; do | ||
|
||
# create new names from the stem of the FASTA and FASTQ files | ||
LOCALFASTA=`echo $REFERENCE | sed -e 's/.*\///'` | ||
LOCALFASTQ=`echo $FASTQ | sed -e 's/.*\///'` | ||
OUTFILE=$DATA/$LOCALFASTQ-$LOCALFASTA.sai | ||
|
||
# grow the list of *.sai files | ||
SAIS="$SAIS $OUTFILE" | ||
|
||
# create a name for the SAM file | ||
SAM=`echo $OUTFILE | sed -e "s/_.*/-$LOCALFASTA.sam/"` | ||
|
||
# note: we don't do basic QC here, because that might mean | ||
# that the mate pairs in the FASTQ files go out of order, | ||
# which will result in the bwa sampe step taking an inordinate | ||
# amount of time | ||
|
||
# do bwa aln if needed | ||
if [ ! -e $OUTFILE ]; then | ||
echo "going to align $FASTQ against $REFERENCE" | ||
|
||
# use $CORES threads | ||
docker-compose run bwa aln -t $CORES $REFERENCE $FASTQ -f $OUTFILE | ||
else | ||
echo "alignment $OUTFILE already created" | ||
fi | ||
done | ||
|
||
# do bwa sampe if needed | ||
if [ ! -e $SAM ]; then | ||
|
||
# create paired-end SAM file | ||
echo "going to run bwa sampe $REFERENCE $SAIS $FASTQS -f $SAM" | ||
docker-compose run bwa sampe $REFERENCE $SAIS $FASTQS -f $SAM | ||
else | ||
echo "sam file $SAM already created" | ||
fi | ||
|
||
# do samtools filter if needed | ||
if [ ! -e $SAM.filtered ]; then | ||
# -bS = input is SAM, output is BAM | ||
# -F 4 = remove unmapped reads | ||
# -q 50 = remove reads with mapping qual < 50 | ||
echo "going to run samtools view -bS -F 4 -q 50 -o $SAM.filtered $SAM" | ||
docker-compose run samtools view -bS -F 4 -q 50 -o $SAM.filtered $SAM | ||
docker-compose run gzip -9 $SAM | ||
else | ||
echo "sam file $SAM.filtered already created" | ||
fi | ||
|
||
# do samtools sorting if needed | ||
if [ ! -e $SAM.sorted.bam ]; then | ||
|
||
# sorting is needed for indexing | ||
echo "going to run samtools sort $SAM.filtered $SAM.sorted" | ||
docker-compose run samtools sort $SAM.filtered $SAM.sorted | ||
else | ||
echo "BAM file $SAM.sorted.bam already created" | ||
fi | ||
|
||
# create index for BAM file if needed | ||
if [ ! -e $SAM.sorted.bam.bai ]; then | ||
|
||
# this should result in faster processing | ||
echo "going to run samtools index $SAM.sorted.bam" | ||
docker-compose run samtools index $SAM.sorted.bam | ||
else | ||
echo "BAM file index $SAM.sorted.bam.bai already created" | ||
fi |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
FROM ubuntu:trusty | ||
MAINTAINER Darin London <[email protected]> | ||
|
||
RUN apt-get update \ | ||
&& apt-get install -y wget \ | ||
&& apt-get install -y bzip2 \ | ||
&& apt-get install -y tar \ | ||
&& apt-get install -y build-essential \ | ||
&& apt-get install -y zlib1g-dev | ||
ADD install_bwa.sh install_bwa.sh | ||
# this downloads the bwa source, makes it, moves it into place, then removes | ||
# the downloads in one transaction to make sure downloads do not remain | ||
# in the image | ||
RUN ./install_bwa.sh | ||
|
||
# this creates a default command that gets | ||
# run when the container is run without arguments | ||
# it will print the usage + version of bwa and exit | ||
CMD bwa |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
#!/bin/bash | ||
|
||
# download and extract bwa source | ||
wget -O bwa-0.7.12.tar.bz2 http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.12.tar.bz2/download | ||
tar jxf bwa-0.7.12.tar.bz2 | ||
# build bwa and move it into /usr/local/bin | ||
cd bwa-0.7.12 | ||
make | ||
mv bwa /usr/local/bin | ||
# clean up to minimize the size of the resulting image | ||
cd .. | ||
rm -rf bwa-0.7.12* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
FROM centos:latest | ||
RUN ["/usr/sbin/useradd", "bwa_user"] | ||
RUN ["/usr/bin/yum", "install", "-y", "wget"] | ||
RUN ["mkdir", "-p", "/home/bwa_user/data"] | ||
RUN ["mkdir","-p","/home/bwa_user/data"] | ||
RUN ["chown","bwa_user","/home/bwa_user/data"] | ||
RUN ["chgrp","bwa_user","/home/bwa_user/data"] | ||
RUN ["chmod","777","/home/bwa_user/data"] | ||
ADD download_plasmodium_raw.sh /usr/local/bin/download_plasmodium_raw.sh | ||
VOLUME ["/home/bwa_user/data"] | ||
WORKDIR /home/bwa_user/data | ||
USER bwa_user | ||
CMD "/usr/local/bin/download_plasmodium_raw.sh" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
#!/bin/bash | ||
|
||
wget -O /home/bwa_user/data/ERR022523_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR022/ERR022523/ERR022523_1.fastq.gz | ||
wget -O /home/bwa_user/data/ERR022523_2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR022/ERR022523/ERR022523_2.fastq.gz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
FROM ubuntu:trusty | ||
MAINTAINER Darin London <[email protected]> | ||
|
||
RUN apt-get update \ | ||
&& apt-get install -y wget \ | ||
&& apt-get install -y bzip2 \ | ||
&& apt-get install -y gzip \ | ||
&& apt-get install -y tar \ | ||
&& apt-get install -y build-essential \ | ||
&& apt-get install -y zlib1g-dev \ | ||
&& apt-get install -y ncurses-dev | ||
ADD install_samtools.sh install_samtools.sh | ||
# this downloads the samtools source, makes it, moves it into place, then removes | ||
# the downloads in one transaction to make sure downloads do not remain | ||
# in the image | ||
RUN ./install_samtools.sh | ||
|
||
# this creates a default command that gets | ||
# run when the container is run without arguments | ||
# it will print the usage + version of samtools and exit | ||
CMD samtools |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
#!/bin/bash | ||
wget -O samtools-1.2.tar.bz2 http://sourceforge.net/projects/samtools/files/samtools/1.2/samtools-1.2.tar.bz2/download | ||
tar jxf samtools-1.2.tar.bz2 | ||
# build samtools and move it into /usr/local/bin | ||
cd samtools-1.2 | ||
make | ||
mv samtools /usr/local/bin | ||
# clean up to minimize the size of the resulting image | ||
cd .. | ||
rm -rf samtools-1.2* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
bwa: | ||
build: conf/docker/bwa | ||
volumes: | ||
- ./:/wdir | ||
working_dir: /wdir | ||
entrypoint: bwa | ||
command: '' | ||
samtools: | ||
build: conf/docker/samtools | ||
volumes: | ||
- ./:/wdir | ||
working_dir: /wdir | ||
entrypoint: samtools | ||
gzip: | ||
build: conf/docker/samtools | ||
volumes: | ||
- ./:/wdir | ||
working_dir: /wdir | ||
entrypoint: gzip |
File renamed without changes.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
![GTPB](http://gtpb.igc.gulbenkian.pt/bicourses/images/GTPB2015logo.png "GTPB") | ||
|
||
Introducing Docker | ||
================== | ||
|
||
[Docker](https://www.docker.com) has some similarities with virtualization technologies: | ||
|
||
- both involve the creation of reusable images | ||
- both involve running one or more instances of an image on a Host machine | ||
- images can be transported from one Host to another and run successfully | ||
so long as the hosting software is installed | ||
|
||
Docker images differ from Virtualization images in many important ways. | ||
|
||
- They are 5-10 times smaller | ||
- They share the host's Linux kernel and use much more of the host's resources | ||
- They are less isolated from the host, and therefore less secure | ||
- Instances are called Containers | ||
- Containers can be instantiated and run within seconds | ||
- Containers can be plugged in to the Host tty, STDIN, STDOUT, and STDERR | ||
|
||
The primary difference between a Docker image and a VM image is tied to | ||
a philosophical difference. | ||
|
||
VM images are created to host an entire machine architecture which is run as if it were its own machine, completely oblivious to its host. | ||
|
||
Docker images are designed to host a single application and its dependencies. They are designed to run on the host as if natively installed. To compose a pipeline, you use or create docker images for each application required, and run containers from the host more or less hooked in to the host, similar to the way you would run a natively installed application. | ||
|
||
Docker Ecosystem | ||
---------------- | ||
|
||
**Docker Machine** | ||
|
||
Host systems must install and run the Docker daemon. The daemon can only run on a modern Linux kernel (a version released within the last 2 years). Almost all flavors of Linux (Fedora, Red Hat, Ubuntu, Debian) can host the daemon natively. Some flavors of \*nix (Mac OS X in particular) do not use the Linux kernel. They must run the Docker daemon inside a virtual machine built on one of the Linux flavors with a modern kernel. This can introduce a bit more complexity, but it also introduces the powerful concept of using external Docker hosts 'in the cloud'. | ||
|
||
The docker daemon runs a web service in the background and listens on dedicated ports for requests to manage docker images and containers. It provides a REST API that can be used by any client. Typically, the connection is encrypted with TLS, a standard protocol used by many client-server systems. TLS requires that each client hold a certificate (not the same as the one used with GitHub) and present it when communicating with the service. The primary client of the REST API is the docker commandline interface. | ||
|
||
The [docker-machine](https://docs.docker.com/machine) command automates the process of getting a docker host running on any computer with a supported Virtualization system (Virtualbox and VMware are supported). It makes it much easier to get Docker up and running if you do not have Systems Administration expertise. It does this by: | ||
- downloading a special VM image for a specified VM management system preconfigured to host and run the docker daemon | ||
- generating TLS certificates | ||
- starting and stopping the VM | ||
- providing an easy way to configure the environment needed by the docker commandline interface (see below) | ||
The docker-machine command can also be used to create docker machines on many cloud [hosting systems](https://docs.docker.com/machine/#using-docker-machine-with-a-cloud-provider), which may be attractive to those wanting to purchase more powerful compute environments than are provided by their own machine, or institution. | ||
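
The steps that docker-machine automates can be sketched as a small shell helper. This is a minimal sketch, assuming the VirtualBox driver and a machine named `dev` (both placeholders); the function names `run` and `machine_up` and the `DRY_RUN` switch are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Sketch only: the machine name "dev" and the VirtualBox driver are assumptions.

# run: hypothetical wrapper that prints a command when DRY_RUN=1, else executes it
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# machine_up NAME: create a docker host VM, then show its client settings
machine_up() {
  local name="${1:-dev}"
  # downloads the VM image, generates TLS certificates, and starts the VM
  run docker-machine create --driver virtualbox "$name"
  # prints the export statements that the docker client needs
  run docker-machine env "$name"
}
```

Calling `DRY_RUN=1 machine_up dev` only prints the two commands. On a real host, `eval "$(docker-machine env dev)"` applies the printed settings to the current shell.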
|
||
**Docker** | ||
|
||
The [docker commandline interface](https://docs.docker.com/reference/commandline/cli/) is written in the Go programming language. There are versions available for Linux, Mac OS X, and Windows. It is designed to talk to the Docker daemon over the network using its REST interface. Because the docker client is separate from the docker host, the same docker command can manage a docker host running anywhere on the network. | ||
|
||
The client must run in the context of a special set of Environment variables: | ||
* DOCKER_TLS_VERIFY (1 if using TLS, default) | ||
* DOCKER_CERT_PATH (path to TLS certificate if using TLS) | ||
* DOCKER_HOST (url and port to the Docker Host daemon service) | ||
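
Before running the client, it can help to confirm that these three variables are present. The following is a minimal sketch; the function name `check_docker_env` is hypothetical, and in practice the variables are usually exported with `eval "$(docker-machine env NAME)"` rather than by hand.

```shell
#!/bin/bash
# check_docker_env: report which of the three client variables are missing
# from the environment. Returns 0 when all are exported, 1 otherwise.
check_docker_env() {
  local missing=0 var
  for var in DOCKER_TLS_VERIFY DOCKER_CERT_PATH DOCKER_HOST; do
    # printenv exits non-zero when the variable is not in the environment
    if ! printenv "$var" >/dev/null; then
      echo "$var is not set" >&2
      missing=1
    fi
  done
  return $missing
}
```

A script could call `check_docker_env || exit 1` before its first docker command, so that a missing setting is reported up front instead of surfacing as a connection error.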
|
||
The docker commandline interface provides the full set of tools needed to create and manage docker images and image container instances. | ||
|
||
* pull images from a Docker Registry (it knows about the Official Docker Registry by default) | ||
* push images to a Docker Registry (requires login) | ||
* list images | ||
* build images from a build context (more about this tomorrow) | ||
* remove images | ||
* tag images (acts like an alias) | ||
* run container instances of images | ||
* list containers | ||
* start and stop existing container instances (background only) | ||
* pause/unpause existing containers (foreground and background) | ||
* kill a running container (stop is preferred but kill can be used to stop a runaway container process) | ||
* rm stopped/killed container instances | ||
* inspect container instances (running or stopped) | ||
* dump the log (STDOUT) from a running container | ||
* save and load a tar file of an image (can be used instead of a registry to move docker images from one machine to another) | ||
* exec a command in a running container (allows you to interact with, and change the state of a running container) | ||
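
Several of the operations listed above can be strung together into one container lifecycle. This is a sketch under stated assumptions: the function name `lifecycle_demo` is hypothetical, and the image and container names are supplied by the caller.

```shell
#!/bin/bash
# lifecycle_demo IMAGE NAME: walk one container through its whole life.
# Hypothetical helper for illustration; stops at the first failing step.
lifecycle_demo() {
  local image="$1" name="$2"
  docker pull "$image"                  || return 1  # fetch from the registry
  docker run -d --name "$name" "$image" || return 1  # start in the background
  docker ps                             || return 1  # list running containers
  docker logs "$name"                   || return 1  # dump the container log
  docker inspect "$name"                || return 1  # show container details
  docker stop "$name"                   || return 1  # stop it gracefully
  docker rm "$name"                     || return 1  # remove the stopped container
  docker rmi "$image"                   || return 1  # remove the image itself
}
```

For example, `lifecycle_demo tutum/hello-world demo` would exercise the image listed under Resources, assuming a docker host is reachable.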
|
||
There are many arguments that you can provide to the [Run](https://docs.docker.com/reference/run/) command: | ||
* container naming (docker gives every container a default name, sometimes humorous, but you can name a container explicitly at run time) | ||
* interactivity mode (interactive or daemon mode) | ||
* attach the host tty (we will demonstrate this) to an interactive container | ||
* mount local directories to the container file system | ||
* connect one container to another container to make a private network between them | ||
* mount volumes from other, special containers, called volume containers, to the container file system | ||
* set the user, group, working directory to be used inside the container | ||
* set environment variables | ||
* override the default entrypoint or command (more on this tomorrow) | ||
* connect host and container STDIN, STDOUT, and STDERR | ||
* expose container ports to the host | ||
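
A few of these arguments combined in one call might look like the sketch below. The function name `run_in_container` and the container name prefix `pipeline-step` are hypothetical; the `/wdir` mount point and the `CORES` variable are borrowed from the compose file and pipeline script in this commit.

```shell
#!/bin/bash
# run_in_container IMAGE CMD...: run CMD in a throwaway container of IMAGE.
# Hypothetical helper for illustration only.
run_in_container() {
  local image="$1"
  shift
  # --rm    remove the container when the command exits
  # -i      keep STDIN connected to the host
  # --name  name the container explicitly ($$ is the PID of this shell)
  # -v      mount the current host directory at /wdir
  # -w      use /wdir as the working directory inside the container
  # -u      run as the calling user and group, not root
  # -e      pass an environment variable into the container
  docker run --rm -i \
    --name "pipeline-step-$$" \
    -v "$PWD":/wdir \
    -w /wdir \
    -u "$(id -u):$(id -g)" \
    -e CORES="${CORES:-2}" \
    "$image" "$@"
}
```

Adding `-t` attaches the host tty for interactive sessions, and `-p 8080:80` would expose container port 80 on host port 8080; both are left out here so the helper also works in non-interactive scripts.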
|
||
**Docker Registry** | ||
|
||
Docker hosts a worldwide [Registry](https://registry.hub.docker.com/) of Docker images. Anyone with docker can share their own images with the world. Images shared on the Docker Registry cannot be private. It is possible to [host your own registry](http://docs.docker.com/registry/deploying/). | ||
|
||
The Docker commandline tool is preconfigured to know about and use the official | ||
Docker Registry. | ||
|
||
- `docker pull i` will pull the image `i` down onto your host | ||
- `docker run i` will pull the image `i` down if it is not present, and then run a container of `i` | ||
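
The "pull only if not present" behaviour can also be made explicit in a script, in the same spirit as the existence checks in the pipeline script. A minimal sketch, assuming a hypothetical helper named `ensure_image`:

```shell
#!/bin/bash
# ensure_image IMAGE: pull IMAGE only when the host has no local copy.
# Hypothetical helper for illustration only.
ensure_image() {
  local image="$1"
  # "docker images -q" prints the IDs of matching local images, or nothing
  if [ -z "$(docker images -q "$image")" ]; then
    docker pull "$image"
  else
    echo "$image already present"
  fi
}
```

Running `ensure_image tutum/hello-world` twice should pull the image the first time and report it as present the second time.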
|
||
Lesson Plan | ||
----------- | ||
|
||
- install docker-machine and docker | ||
- explore the Docker Registry | ||
- run some docker images | ||
- with and without docker pull | ||
- with and without local storage | ||
- with exposed ports | ||
- connected to other container systems/services | ||
- inspect information about containers | ||
- inspect the log from running containers | ||
- remove images | ||
- remove containers (with volumes) | ||
|
||
Resources | ||
--------- | ||
- https://www.docker.com/ | ||
- https://docs.docker.com/machine/ | ||
- https://docs.docker.com/compose/ | ||
- https://docs.docker.com/userguide/ | ||
- https://docs.docker.com/reference/commandline/cli/ | ||
- https://registry.hub.docker.com | ||
- https://registry.hub.docker.com/u/tutum/hello-world/ |