Installation

Douglas Slotta edited this page Nov 30, 2018 · 16 revisions

Prerequisites

Prerequisites include:

  • python
  • docker
  • python packages
      • wheel
      • setuptools
      • PyYAML
      • cwlref-runner
      • cwltool

You will need to install the prerequisites if they're not already installed on your system.
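
As a quick way to see which command-line prerequisites are already present, the following sketch reports any that are not on your PATH. The function name missing_tools and the command names checked (python3, pip, virtualenv, docker) are assumptions for illustration; some systems use other names, such as pip3. It does not check the Python packages themselves.

```shell
#!/bin/sh
# Print the name of every command given as an argument that is NOT on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# The command names below are assumptions based on the prerequisites list;
# adjust them if your system uses different names (for example, pip3).
missing=$(missing_tools python3 pip virtualenv docker)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All command-line prerequisites found."
fi
```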

The instructions that follow use pip and virtualenv, which are included with most Python installations. To check whether they are available:

$ pip --version
$ virtualenv --version

If pip is not installed, see https://pip.pypa.io/en/stable/installing/ for installation instructions.

Virtualenv can be easily installed with pip:

$ pip install virtualenv

To create a virtualenv for your installation of CWL and PGAP:

$ virtualenv --python=python3 cwl

Installing CWL

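Activate the virtualenv, upgrade wheel and setuptools, and then install the CWL packages. The quotes around cwltool[deps] keep shells such as zsh from treating the brackets as a glob pattern.

$ source cwl/bin/activate
(cwl) $ pip install -U wheel setuptools
(cwl) $ pip install -U 'cwltool[deps]' PyYAML cwlref-runner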

Installing Docker

Detailed instructions are available on the Docker website (Docker Install). Install the latest version of Docker; it is usually newer than the one packaged with your distribution. Installation requires root access, and the user who will run the software must be in the docker group.

The required Docker images are downloaded automatically the first time the pipeline runs. They are then cached, so subsequent runs execute much faster.

To confirm that Docker is running and that your user has permission to use it, run:

(cwl) $ docker run hello-world

You should see a message that starts with:


Hello from Docker!
This message shows that your installation appears to be working correctly.
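
If the hello-world test fails with a permission error, one possible cause is that your user is not in the docker group. The sketch below checks group membership with the standard id command. The helper name in_group is hypothetical, and the check assumes a Linux setup where Docker access is granted through a group named docker. A newly added group membership only takes effect after you log in again.

```shell
#!/bin/sh
# Return success if the current user belongs to the named group.
in_group() {
    # id -nG lists the user's group names on one line; split them so that
    # grep -x can match the requested name exactly.
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if in_group docker; then
    echo "Current user is in the docker group."
else
    echo "Current user is not in the docker group."
fi
```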

Retrieving the CWL code

The CWL software is available from GitHub at https://github.com/ncbi/pgap. Download the source code package for the latest release from https://github.com/ncbi/pgap/releases and extract it. For example, for release 2018-09-18.build3190:

(cwl) $ wget -qO- https://github.com/ncbi/pgap/archive/2018-09-18.build3190.tar.gz | tar xvz
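
The archive URL in the command above follows the pattern of the release tag plus .tar.gz. To fetch a different release, a small helper like the sketch below can build the URL from the tag. The function name pgap_archive_url and the variable PGAP_TAG are hypothetical, and the URL pattern is inferred from the single example above rather than from any documented guarantee.

```shell
#!/bin/sh
# Build the GitHub source-archive URL for a given PGAP release tag.
pgap_archive_url() {
    printf 'https://github.com/ncbi/pgap/archive/%s.tar.gz\n' "$1"
}

# Example tag taken from the wget command above; substitute the tag of the
# release you want from the releases page.
PGAP_TAG="2018-09-18.build3190"
pgap_archive_url "$PGAP_TAG"
```

The result can be passed to the same download command, for example: wget -qO- "$(pgap_archive_url "$PGAP_TAG")" | tar xvz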

Download the Supplemental Data

The supplemental data is stored on S3. It is versioned and must match the versions of the CWL code and the Docker images. The CWL source tree includes a script that downloads the matching version and extracts it to the input subdirectory. Run it from the top level of the extracted source tree:

(cwl) $ ./scripts/fetch_supplemental_data.sh

Running the pipeline on a test genome

The input.yaml file provides most of the required input parameters for the data in the input subdirectory. The other parameters are specific to the genome being annotated and must be provided by the user. An example MG37 genome is provided with the CWL source. To run the example, concatenate the two parameter files and pass the combined file to the workflow:

(cwl) $ cat input.yaml MG37/input.yaml > mg37_input.yaml
(cwl) $ ./wf_pgap_simple.cwl mg37_input.yaml
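
The cat step above silently produces an incomplete file if one of the inputs is missing, for instance when the command is run from the wrong directory. The sketch below wraps the concatenation in a check. The function name combine_inputs is hypothetical and not part of the PGAP source. Like the original command, it assumes each file ends with a newline and that the files do not define the same key twice.

```shell
#!/bin/sh
# Concatenate YAML input files into one output file, refusing to proceed
# if any input is missing. Usage: combine_inputs OUT IN1 [IN2 ...]
combine_inputs() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Example use with the file names from this page:
#   combine_inputs mg37_input.yaml input.yaml MG37/input.yaml \
#       && ./wf_pgap_simple.cwl mg37_input.yaml
```

Chaining the workflow call with && means the pipeline starts only when the combined file was written.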