Home
The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).
Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.
NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. PGAP was originally developed in 2001 and is regularly upgraded to improve structural and functional annotation (see Haft et al. 2018 and Tatusova et al. 2016). Recent improvements include the use of curated protein profile hidden Markov models (HMMs) and curated complex domain architectures for the functional annotation of proteins.
This repository provides a stand-alone version of PGAP. It can run on your machine, a compute farm, or the cloud, and can be used to annotate any public or privately owned genome. The pipeline is written in the Common Workflow Language (CWL) and is packaged with the necessary binaries and cwltool, the CWL reference implementation. Datasets curated at NCBI and used for prokaryotic annotation are also distributed with the tool.
Provide some basic information and the FASTA file(s) for your genome of interest, and voila! PGAP will produce an annotation matching what NCBI's internal pipeline would generate.
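As a rough illustration of what "basic information and the FASTA files" can look like in practice, the sketch below writes a minimal pair of YAML input files. It assumes the layout described in PGAP's input-file documentation at the time of writing: an `input.yaml` that points to the genome FASTA and to a `submol.yaml` as CWL `File` objects, and a `submol.yaml` that carries the organism name. The helper `write_pgap_inputs` and its parameters are hypothetical, and the field names should be checked against the current wiki before use.

```python
from pathlib import Path


def write_pgap_inputs(outdir, fasta_name, genus_species, strain=None):
    """Hypothetical helper: write a minimal input.yaml / submol.yaml pair.

    Assumption: field names (fasta, submol, organism, genus_species, strain)
    follow PGAP's documented input format; verify against the current docs.
    Values are written in single quotes, so names containing a single quote
    would need escaping.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    # submol.yaml: the "basic information" about the organism being annotated.
    submol_lines = ["organism:", f"    genus_species: '{genus_species}'"]
    if strain:
        submol_lines.append(f"    strain: '{strain}'")
    submol_path = outdir / "submol.yaml"
    submol_path.write_text("\n".join(submol_lines) + "\n")

    # input.yaml: points the CWL workflow at the FASTA and submol files,
    # each expressed as a CWL File object (class + location).
    input_text = (
        "fasta:\n"
        "    class: File\n"
        f"    location: {fasta_name}\n"
        "submol:\n"
        "    class: File\n"
        f"    location: {submol_path.name}\n"
    )
    input_path = outdir / "input.yaml"
    input_path.write_text(input_text)
    return input_path, submol_path
```

For example, `write_pgap_inputs("my_genome", "genome.fasta", "Escherichia coli", strain="K-12")` creates `my_genome/input.yaml` and `my_genome/submol.yaml`; the FASTA file itself is expected to sit next to them. Writing plain text avoids a YAML library dependency, at the cost of no automatic quoting.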