Quick Start
This guide is intended to get you up and running with PGAP quickly. If you have questions, please read the FAQs, watch this webinar, or look over the rest of the documentation. If all else fails, submit questions to the Issues queue.
To run the PGAP pipeline you will need:
- Python (version 3.5 or higher),
- the ability to run Docker (see https://docs.docker.com/install/ if it is not already installed),
- about 100GB of storage for the supplemental data and working space,
- and 2GB-4GB of memory available per CPU used by your container.
Note that Debian 10 is temporarily not supported.
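The requirements above can be checked before anything is downloaded. The following is a minimal sketch and not part of PGAP: the names `check_prerequisites` and `memory_range_gb` are hypothetical, and the thresholds are simply the figures from the list (Python 3.5 or higher, Docker, about 100GB of storage, 2GB-4GB of memory per CPU). Finding `docker` on the PATH only suggests that Docker is installed; it does not prove that your account is allowed to run containers.

```python
import shutil
import sys

MIN_PYTHON = (3, 5)   # minimum Python version from the requirements list
MIN_FREE_GB = 100     # approximate storage figure from the requirements list


def memory_range_gb(cpus):
    """Return the (low, high) memory estimate in GB for the given CPU count."""
    return (2 * cpus, 4 * cpus)


def check_prerequisites(path=".", python_version=None, which=shutil.which,
                        disk_usage=shutil.disk_usage):
    """Return a list of problems found; an empty list means every check passed.

    The `which` and `disk_usage` parameters exist only so the checks can be
    exercised without touching the real system (an assumption of this sketch).
    """
    problems = []
    version = tuple(python_version or sys.version_info[:2])
    if version < MIN_PYTHON:
        problems.append("Python 3.5 or higher is required")
    if which("docker") is None:
        problems.append("docker was not found on the PATH")
    free_gb = disk_usage(path).free / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append("only %.0f GB free, about %d GB needed"
                        % (free_gb, MIN_FREE_GB))
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
    print("Memory estimate for 4 CPUs: %d-%d GB" % memory_range_gb(4))
```

Run the script from the directory where you plan to keep the supplemental data, since the free-space check applies to the file system containing `path`.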
Download the pgap.py script using either
$ curl -OL https://github.com/ncbi/pgap/raw/prod/scripts/pgap.py
or
$ wget https://github.com/ncbi/pgap/raw/prod/scripts/pgap.py
depending upon which utility your system has installed. If one does not work, try the other.
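If the download is being scripted, the choice between the two utilities can be made automatically. The sketch below is an illustration only: `download_command` is a hypothetical helper, it picks a tool by checking which one is installed, and it does not retry with the other tool if the first one fails.

```python
import shutil

# URL taken from the download commands in this guide.
PGAP_URL = "https://github.com/ncbi/pgap/raw/prod/scripts/pgap.py"


def download_command(url=PGAP_URL, which=shutil.which):
    """Return the argv list for whichever downloader is installed, preferring curl."""
    if which("curl"):
        return ["curl", "-OL", url]   # same flags as the curl command above
    if which("wget"):
        return ["wget", url]
    raise RuntimeError("neither curl nor wget was found on the PATH")
```

Passing the returned list to `subprocess.run(..., check=True)` performs the download and raises an error if the tool exits with a failure status.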
Then try running the pipeline on the Mycoplasma genitalium genome provided with the installation:
$ chmod +x pgap.py
$ ./pgap.py --update # required files are downloaded and extracted
$ ./pgap.py -r -o mg37_results test_genomes/MG37/input.yaml # watch the progress reports and wait for some time.
Output will be located in the mg37_results subdirectory as specified by the -o flag. If that directory exists, then a new directory will be created, with a version number appended.
To run this pipeline using your own genomes, you will need three input files, all in the same directory. Instructions for preparing your data are in the Input Files section.
- A FASTA file.
- A YAML file containing metadata (usually called submol.yaml).
- A YAML file that describes the pipeline inputs, including the above two files.
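Because all three files must sit in the same directory, a quick check before launching the pipeline can save a failed run. This is a minimal sketch under stated assumptions: `check_input_files` is a hypothetical helper, it only verifies that the files exist and share a directory, and it does not parse or validate the YAML or FASTA contents.

```python
import os


def check_input_files(input_yaml, fasta, submol, isfile=os.path.isfile):
    """Return a list of problems with the three input files; empty means none found.

    The `isfile` parameter is only there so the check can be tried out
    without real files (an assumption of this sketch).
    """
    paths = [("input YAML", input_yaml), ("FASTA", fasta), ("metadata YAML", submol)]
    problems = ["%s file not found: %s" % (label, path)
                for label, path in paths if not isfile(path)]
    # Compare the absolute parent directories of the three paths.
    directories = {os.path.dirname(os.path.abspath(path)) for _, path in paths}
    if len(directories) > 1:
        problems.append("input files are not all in the same directory")
    return problems
```

For example, `check_input_files("my_genome/input.yaml", "my_genome/genome.fasta", "my_genome/submol.yaml")` returns an empty list when all three files exist; the directory and FASTA file names here are placeholders.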
To get a complete list of options, use the -h flag. However, here are some notable options.
Command | Description |
---|---|
-r, --report-usage-true | Report to NCBI whenever the pipeline is run. |
-n, --report-usage-false | Do not report to NCBI. |
-o path, --output path | Output directory to be created, which may include a full path. |
--ignore-all-errors | Ignore errors from quality control analysis, in order to obtain a draft annotation. |
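When pgap.py is launched from another script, the options in the table can be assembled programmatically. The following is a sketch rather than an official interface: `build_pgap_command` is a hypothetical helper that uses only the flags listed above and assumes pgap.py is in the current directory and executable.

```python
def build_pgap_command(input_yaml, output=None, report_usage=True,
                       ignore_all_errors=False, script="./pgap.py"):
    """Assemble an argv list for pgap.py from the options listed in the table."""
    # Exactly one of -r / -n is passed to state the usage-reporting choice.
    command = [script, "-r" if report_usage else "-n"]
    if output is not None:
        command += ["-o", output]
    if ignore_all_errors:
        command.append("--ignore-all-errors")
    command.append(input_yaml)
    return command


if __name__ == "__main__":
    # Reproduces the sample run from this guide as a printable command line.
    print(" ".join(build_pgap_command("test_genomes/MG37/input.yaml",
                                      output="mg37_results")))
```

Building a list instead of a single string keeps paths with spaces intact when the command is handed to `subprocess.run`.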