cbioportal-vcf2maf-pipeline

A pipeline, adapted from the Genome Nexus annotation tools pipeline, that converts VCF files containing somatic mutations into a single MAF for import into cBioPortal.

Note that the pipeline assumes each input VCF file has only one column of sample data, and that this column contains somatic mutations only.
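Since the single-sample assumption is easy to violate by accident, it may be worth checking inputs before running the pipeline. The following is a minimal sketch, not part of the pipeline itself: `count_vcf_samples` is a hypothetical helper that counts the columns after the nine fixed VCF columns (`#CHROM` through `FORMAT`) on the header line. It assumes uncompressed, tab-separated VCF files, and the demo VCF is toy data.

```shell
# Hypothetical helper (not part of the pipeline): print the number of sample
# columns in an uncompressed VCF. Columns 1-9 of the "#CHROM" header line are
# fixed (#CHROM .. FORMAT); every column after that is one sample.
count_vcf_samples() {
    awk -F '\t' '
        /^#CHROM/ {
            n = NF - 9
            if (n < 0) n = 0
            print n
            found = 1
            exit
        }
        END { if (!found) print 0 }' "$1"
}

# Toy single-sample VCF used only to demonstrate the check.
demo_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\n'
    printf '1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0/1\n'
} > "$demo_vcf"

n_samples=$(count_vcf_samples "$demo_vcf")
if [ "$n_samples" -ne 1 ]; then
    echo "expected exactly one sample column, found $n_samples" >&2
fi
rm -f "$demo_vcf"
```

To check a whole input directory, the same function could be called in a loop over `*.vcf` files.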

The Genome Nexus annotation tools repo has its own pipeline script, but it doesn't seem to work well with our data (the annotator fails to annotate more mutations than it should).

The pipeline script performs the following steps:

1. Converts the VCFs to MAFs with annotation-tools/vcf2maf.py.
2. Merges the MAFs into a single MAF with annotation-tools/merge_mafs.py.
3. Reduces the merged MAF to a minimal MAF using cut (selecting columns as per the instructions in the cBioPortal documentation).
4. Annotates the resulting minimal MAF with the Genome Nexus annotation pipeline to obtain the final MAF.
   - The final MAF will meet the requirements for cBioPortal as per the cBioPortal documentation.
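The reduction step with cut can be sketched as follows. This is an illustration under stated assumptions, not the code in pipeline.sh: `maf_column_indices` is a hypothetical helper, the list of columns in `wanted` is an assumed minimal set (check the cBioPortal documentation for the columns you actually need), and the MAF content is toy data. Looking up positions by header name avoids hard-coding column numbers.

```shell
# Hypothetical helper: print the comma-separated 1-based positions of the
# named columns, read from the first non-comment line (the header) of a
# tab-separated MAF. Positions come out in file order, which is also the
# order in which cut emits columns.
maf_column_indices() {
    # $1 = MAF path, $2 = comma-separated column names
    awk -F '\t' -v wanted="$2" '
        BEGIN {
            n = split(wanted, names, ",")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        /^#/ { next }   # skip comment lines such as "#version 2.4"
        {
            out = ""
            for (i = 1; i <= NF; i++)
                if ($i in want) out = (out == "" ? i : out "," i)
            print out
            exit
        }' "$1"
}

# Toy merged MAF (made-up values) standing in for the output of merge_mafs.py.
merged=$(mktemp)
minimal=$(mktemp)
{
    printf '#version 2.4\n'
    printf 'Hugo_Symbol\tCenter\tChromosome\tStart_Position\tEnd_Position\tVariant_Type\tReference_Allele\tTumor_Seq_Allele1\tTumor_Seq_Allele2\tTumor_Sample_Barcode\n'
    printf 'GENE_A\tMY_CENTER\t1\t100\t100\tSNP\tG\tG\tA\tSAMPLE_1\n'
} > "$merged"

# Assumed minimal column set, for illustration only.
wanted='Chromosome,Start_Position,End_Position,Reference_Allele,Tumor_Seq_Allele2,Tumor_Sample_Barcode'
cols=$(maf_column_indices "$merged" "$wanted")

# Drop comment lines, then keep only the selected columns.
grep -v '^#' "$merged" | cut -f "$cols" > "$minimal"
cat "$minimal"
```

Note that cut cannot reorder columns, so the minimal MAF keeps the column order of the merged file.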

Setup

Prerequisites: Python 3.6, Maven, Java
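A quick way to confirm the prerequisites are on your PATH is sketched below. `missing_tools` is a hypothetical helper, and the executable names `python3`, `mvn` and `java` are assumptions about how these tools are installed on your system; the sketch does not check versions.

```shell
# Hypothetical helper: print the name of each given command that cannot be
# found on PATH, one per line. Prints nothing if all are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to match your installation.
missing=$(missing_tools python3 mvn java)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing >&2
fi
```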

Get the repo

git clone --recursive https://github.com/WEHI-ResearchComputing/cbioportal-vcf2maf-pipeline.git
cd cbioportal-vcf2maf-pipeline

Install Python prerequisites

pip install -r annotation-tools/requirements.txt

Build pipeline jar

Copy and modify the example config files.

cd genome-nexus-annotation-pipeline
cp annotationPipeline/src/main/resources/application.properties.EXAMPLE annotationPipeline/src/main/resources/application.properties
cp annotationPipeline/src/main/resources/log4j.properties.EXAMPLE annotationPipeline/src/main/resources/log4j.properties

You will want to edit the log4j.properties file to control where logs are written (edit the log4j.appender.a.File field).

If you wish to use the Genome Nexus GRCh38 (apparently experimental) database, change the genomenexus.base field from https://www.genomenexus.org to https://grch38.genomenexus.org.
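If you prefer to script these edits rather than make them by hand, something like the following may work. It is a sketch under assumptions: `set_property` is a hypothetical helper, it assumes the properties files use plain `key=value` lines with no spaces around `=`, and it only rewrites keys that already exist. The demo runs against a throwaway stand-in file rather than the real application.properties.

```shell
# Hypothetical helper: set key $2 to value $3 in properties file $1.
# Only lines that start with "key=" are rewritten; other lines are kept as-is.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    awk -v key="$key" -v value="$value" '
        index($0, key "=") == 1 { print key "=" value; next }
        { print }' "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # overwrite in place, keeping file permissions
    rm -f "$tmp"
}

# Demo on a stand-in file (assumed format of the relevant line).
props=$(mktemp)
printf 'genomenexus.base=https://www.genomenexus.org\n' > "$props"
set_property "$props" genomenexus.base https://grch38.genomenexus.org
cat "$props"
rm -f "$props"

# Against the real files, run from genome-nexus-annotation-pipeline
# (the log path below is a placeholder):
#   set_property annotationPipeline/src/main/resources/application.properties \
#       genomenexus.base https://grch38.genomenexus.org
#   set_property annotationPipeline/src/main/resources/log4j.properties \
#       log4j.appender.a.File /path/to/annotation.log
```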

Build the jar file

mvn clean install

This will create the jar file in genome-nexus-annotation-pipeline/annotationPipeline.
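The exact jar file name is not given here, so one way to locate it after the build is to search the module directory. The sketch below uses a hypothetical helper, `find_jars`; Maven normally writes build output to a `target` subdirectory, so that is where the jar is likely to appear. The demo uses a throwaway directory with an empty placeholder file and a made-up name, not a real build.

```shell
# Hypothetical helper: list every .jar file under the given directory, sorted.
find_jars() {
    find "$1" -type f -name '*.jar' | sort
}

# Demo layout standing in for a finished build (file name is made up).
build_dir=$(mktemp -d)
mkdir -p "$build_dir/target"
: > "$build_dir/target/example-pipeline.jar"

found=$(find_jars "$build_dir")
echo "$found"

# Against the real build, run from the repository root:
#   find_jars genome-nexus-annotation-pipeline/annotationPipeline
```

The path printed for the real build is what you would pass to the `-j` option of pipeline.sh.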

Usage

./pipeline.sh -i=<dir> -o=<dir> -p=<dir> -j=<jar> -c=center [-t=<dir>] [-e=error.log]
        -i | --input-directory               input data directory for processing somatic mutation data files [REQUIRED]
        -o | --output-directory              output directory to write processed and annotated MAF to [REQUIRED]
        -p | --annotation-scripts-home       path to the annotation suite scripts directory [REQUIRED]
        -j | --annotation-pipeline-jar       path to the annotation pipeline jar [REQUIRED]
        -c | --center-name                   center name to be used in Center MAF field [REQUIRED]
        -c | --isoform-override              Isoform Overrides - mskcc or uniprot [REQUIRED]
        -t | --intermediate-files-directory  path to store intermediary files. Default is directory created with mktemp
        -e | --annotation-error-log          path to store annotation pipeline error log. Default is ./error.log
