Installation

Douglas Slotta edited this page Oct 1, 2018 · 16 revisions

Prerequisites

The following are required:

  • Python
  • Docker
  • Python packages:
      • wheel
      • setuptools
      • PyYAML
      • cwlref-runner
      • cwltool

Install any prerequisites that are not already on your system.

The instructions that follow use pip and virtualenv, which are included with most Python installations. To check whether they are available:

$ pip --version
$ virtualenv --version
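
The two checks above can also be folded into one small helper. This is a minimal sketch, not part of the PGAP distribution; the function name `check_tools` is made up for illustration, and it only tests whether each command is on the PATH, not which version is installed.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are present, 1 if at least one is missing.
# (check_tools is a hypothetical helper name, not a PGAP script.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools pip virtualenv || echo "Install the missing tools before continuing."
```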

If pip is not installed, see https://pip.pypa.io/en/stable/installing/ for installation instructions.

Virtualenv can be easily installed with pip:

$ pip install virtualenv

To create a virtualenv for your installation of CWL and PGAP:

$ virtualenv --python=python3 cwl

Installing CWL

$ source cwl/bin/activate
(cwl) $ pip install -U wheel setuptools
(cwl) $ pip install -U cwltool[deps] PyYAML cwlref-runner
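
To confirm that the packages landed in the virtualenv rather than in the system Python, a quick check along these lines may help. It is only a sketch: it relies on the `activate` script exporting `VIRTUAL_ENV`, and the helper name `venv_active` is an assumption introduced here.

```shell
# Succeeds if a virtualenv is active in this shell; 'activate' exports
# VIRTUAL_ENV with the environment's path.
# (venv_active is a hypothetical helper name.)
venv_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if venv_active; then
    echo "virtualenv active: $VIRTUAL_ENV"
    # Only query cwltool if it is actually on PATH.
    if command -v cwltool >/dev/null 2>&1; then
        cwltool --version
    else
        echo "cwltool not found on PATH"
    fi
else
    echo "no virtualenv active; run 'source cwl/bin/activate' first"
fi
```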

Installing Docker

Detailed instructions are available on the Docker website (Docker Install). Please install the latest version of Docker, which is usually newer than the one that comes with your distribution. Installation requires root access, and the user who will be running the software must be in the docker group. The required Docker container images are downloaded automatically the first time the pipeline runs. After that they are cached, so subsequent runs will execute much faster.

Make sure that Docker is running and that you belong to the group that has Docker permissions by running:

(cwl) $ docker run hello-world

You should see a message that starts with:


Hello from Docker!
This message shows that your installation appears to be working correctly.
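
If `docker run hello-world` fails with a permission error instead, group membership is a likely cause on Linux hosts. The sketch below checks it with `id -nG`; the helper name `in_group` is an assumption made here, and a group added during the current login session normally shows up only after logging out and back in.

```shell
# Succeeds if the current user belongs to the named group.
# (in_group is a hypothetical helper name.)
in_group() {
    for g in $(id -nG); do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

if in_group docker; then
    echo "user is in the docker group"
else
    echo "user is not in the docker group; 'docker run' may fail with a permission error"
fi
```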

Retrieving the CWL code

The CWL software is available on GitHub at https://github.com/ncbi/pgap. Download the source code package for the latest release from https://github.com/ncbi/pgap/releases and extract it. For example:

(cwl) $ wget -qO- https://github.com/ncbi/pgap/archive/2018-09-18.build3030-beta.tar.gz | tar xvz
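
The commands in the following sections use relative paths, so they are presumably meant to be run from the root of the extracted tree. One way to find that directory without guessing its name is to save the tarball to a file and list it. This is a sketch under assumptions: the local file name `pgap.tar.gz` and the helper `tarball_top_dir` are made up for illustration, and some tar implementations may also list a pax header entry.

```shell
# Print the top-level directory name(s) contained in a gzipped tarball.
# (tarball_top_dir is a hypothetical helper name.)
tarball_top_dir() {
    tar tzf "$1" | cut -d/ -f1 | sort -u
}

# Possible usage, with pgap.tar.gz as an assumed local file name:
#   wget -qO pgap.tar.gz https://github.com/ncbi/pgap/archive/2018-09-18.build3030-beta.tar.gz
#   tar xzf pgap.tar.gz
#   cd "$(tarball_top_dir pgap.tar.gz)"
```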

Download the Supplemental Data

The supplemental data is stored on S3. It is versioned and must match the CWL and Docker versions. The CWL source tree includes a script that downloads the matching version and extracts it to the input subdirectory. Run it from the top of the source tree:

(cwl) $ ./scripts/fetch_supplemental_data.sh

Running the pipeline on a test genome

The input.yaml file provides most of the required input parameters for the data in the input subdirectory. The other parameters are specific to the genome being annotated and must be provided by the user. An example MG37 genome is provided with the CWL source and can be run as follows:

(cwl) $ cat input.yaml MG37/input.yaml > mg37_input.yaml
(cwl) $ ./wf_pgap_simple.cwl mg37_input.yaml
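
For other genomes the same concatenation step applies, with the genome-specific file swapped in. A small wrapper like the one below could guard against a mistyped path, since plain `cat` still writes the output file when one input is missing. This is a sketch only; `merge_inputs` is a hypothetical name, and it assumes each file ends with a newline so that lines from the two files do not run together.

```shell
# Concatenate a shared parameter file and a genome-specific one into a
# single input file, refusing to continue if either source is missing.
# (merge_inputs is a hypothetical helper name, not a PGAP script.)
merge_inputs() {
    base=$1
    specific=$2
    out=$3
    for f in "$base" "$specific"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    cat "$base" "$specific" > "$out"
}

# Usage mirroring the MG37 example:
#   merge_inputs input.yaml MG37/input.yaml mg37_input.yaml
```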