Structural variation analysis pipeline for WGS of cancer genomes

Description

This workflow will run the SvABA structural variation (SV) analysis for a set of tumor-normal pairs, starting from the aligned BAM files. The analysis includes structural variant prediction and and determine SV classes. SV classes are predicted by combining SV results with copy number results previously generated by TitanCNA.

Contact

Gavin Ha
Fred Hutchinson Cancer Research Center
contact: [email protected] or [email protected]
Date: May 15, 2019

Requirements

Software packages or libraries

R-3.4 -optparse
-data.table
-GenomicRanges
-GenomeInfoDb
-VariantAnnotation
-plyr
-ggplot2
-reshape2
-stringr
Python 3.4
- snakemake-5.4
samtools-1.3.1

Tumour-Normal sample list

The list of tumour-normal paired samples should be defined in a YAML file. See config/samples.yaml for an example. Both fields samples and pairings must to be provided. pairings key must match the tumour sample while the value must match the normal sample.

samples:
  tumor_sample_1:  /path/to/bam/tumor.bam
  normal_sample_1:  /path/to/bam/normal.bam


pairings:
  tumor_sample_1:  normal_sample_1

Running the analysis

1. Invoking the snakemake workflow for SvABA on a local machine

The first snakemake file svaba.snakefile will a. Run the full SvABA analysis

# show commands and workflow
snakemake -s svaba.snakefile -np
# run the workflow locally using 5 cores
snakemake -s svaba.snakefile --cores 5

2. Integrating SV and copy number results (TitanCNA)

The second snakemake file combineSvabaTitan.snakefile will a. Combine SvABA and TitanCNA results to assign copy number to the breakpoints and rearrangement classes to the SV events. b. Plot copy number and SV for each chromosome.

snakemake -s combineSvabaTitan.snakefile --cores 5

The third snakemake file plotSVandCNAzoom.snakefile generates custom plots zoomed in for a region of interest. Users can specify the coordinates in the configPlotZoom.yaml file. See the description of the configuration below.

snakemake -s plotSVandCNAzoom.snakefile --cores 5

3. Invoking the snakemake workflow for SvABA on a cluster

If you are using a cluster, then these are resource settings for memory and runtime limits, and parallel environments.
These are the default settings for all tasks that do not have rule-specific resources.
There are two cluster configurations provided: qsub and slurm

i. `qsub`

There are 2 separate files in use for qsub, which are provided as a template: config/cluster_qsub.sh - This file contains other qsub parameters. Note that these settings are used for the Broad's UGER cluster so users will need to modify this for their own clusters.
config/cluster_qsub.yaml - This file contains the memory, runtime, and number of cores for certain tasks.

To invoke the snakemake pipeline for qsub:

snakemake -s svaba.snakefile --jobscript config/cluster_qsub.sh --cluster-config config/cluster_qsub.yaml --cluster-sync "qsub -l h_vmem={cluster.h_vmem},h_rt={cluster.h_rt} -pe {cluster.pe} -binding {cluster.binding}" -j 100

Here, the h_vmem (max memory), h_rt (max runtime) are used. For runSvaba task, the default setting is to use 4 cores and this can be set with -pe and -binding. Your SGE settings may be different and users should adjust accordingly.

ii. `slurm`

There is only one file in use for slurm: config/cluster_slurm.yaml - This file contains the memory, runtime, and number of cores for certain tasks. To invoke the snakemake pipeline for slurm:

snakemake -s svaba.snakefile --cluster-config config/cluster_slurm.yaml --cluster "sbatch -p {cluster.partition} --mem={cluster.mem} -t {cluster.time} -c {cluster.ncpus} -n {cluster.ntasks} -o {cluster.output}" -j 50

Configuration and settings

All settings for the workflow are contained in config/config.yaml. The settings are organized by paths to scripts and reference files and then by each step in the workflow.

1. Path to tools

svaba_exe is the compiled SvABA executable

svaba_exe:  /path/to/svaba
samTools:  /path/to/samtools

2. Path to scripts

These are provided in this repo under code/.

svaba_funcs:  code/svaba_utils.R
plot_funcs:  code/plotting.R
combineSVCN_script:  code/combineSVABAandTITAN.R
plotSVCN_script:  code/plotTitanSvaba.R

3. Path to R package files

Specify the directory in which TitanCNA are installed.
Set these if the R files in these libraries have been modified or updated but not yet installed or updated in R.

titan_libdir:  /path/to/TitanCNA/

4. Path to TitanCNA snakemake results

Specifies the TitanCNA snakemake location containing result files to be merged with SvABA results in combineSvabaTitan.snakefile.

titan_results:  /path/to/TitanCNA/snakemake_results/

5. Reference files and settings

Global reference files used by the snakefiles and scripts.

refGenome specify the reference genome used for aligning the sequenced reads
genomeStyle specifies the chromosome naming convention to used for output files. Input files can be any convention as long as it is the same genome build. Only use UCSC (e.g. chr1) or NCBI (e.g. 1).
cytobandFile is used for plotting the chromosome idiograms and only needs to specify data/cytoBand_hg38.txt if using hg38.
chrs specifies the chromosomes to analyze; users do not need to be concerned about chromosome naming convention here as the code will handle it based on the genomeStyle.

refGenome:  /path/to/ref/genome.fasta
genomeBuild:  hg38
genomeStyle:  UCSC
cytobandFile:  data/cytoBand_hg38.txt # only need if hg38
chrs:  c(1:22, \"X\")

6. svaba.snakefile settings: SvABA

svaba_dbSNPindelVCF specifies a VCF file containing known indels to use for germline filtering Homo_sapiens_assembly38.known_indels.vcf.gz

svaba_dbSNPindelVCF:  /path/to/dbsnp_indel.vcf
svaba_numThreads:  4  # should match cluster settings for number of cores

7. combineSvabaTitan.snakefile settings: Plotting

Settings used for plotting copy number and SV results for all chromosomes. Note that this differs from the custom user zoom plots using plotSVandCNAzoom.snakefile.

plot_geneFile is a text file listing the regions to annotate in the plot. 4 column file: name, chr, start, stop.

plot_zoom:  FALSE
plot_chrs:  c(1:22, \"X\") # "None" will also plot all chromosomes
plot_startPos:  None
plot_endPos:  None
plot_geneFile:  data/AR_coord.txt ## include list of genes to annotate on plot
plot_ylim:  c(-2,6)
plot_size:  c(8,4)
plot_type:  titan ## titan - will include haplotype fraction
plot_format:  png

8. plotSVandCNAzoom.snakefile settings in configPlotZoom.yaml

plot_id to use for naming the output directory containing the zoomed plots for each sample.
plot_zoom indicates the plot should be focused on a specific region smaller than a whole chromosome. If set to TRUE, then plot_chrs (should only be a single chr), plot_startPos, plot_endPos should be set.
'plot_typeis set toichorfor total copy number only (black dots). Set it totitan` to also include the allelic fraction panel as a second track.

plot_id: AR_enhancer_zoom
plot_zoom:  TRUE
plot_chrs:  c(\"X\") 
plot_startPos:  66500000
plot_endPos:  67900000
plot_geneFile:  data/AR_coord.txt ## include list of genes to annotate on plot
plot_ylim:  c(-2,6)
plot_size:  c(8,4)
plot_type:  ichor ## use "titan" to also plot the allelic fraction panel 
plot_format:  png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Structural variation analysis pipeline for WGS of cancer genomes

Description

Contact

Requirements

Software packages or libraries

Tumour-Normal sample list

Running the analysis

1. Invoking the snakemake workflow for SvABA on a local machine

2. Integrating SV and copy number results (TitanCNA)

3. Invoking the snakemake workflow for SvABA on a cluster

i. `qsub`

ii. `slurm`

Configuration and settings

1. Path to tools

2. Path to scripts

3. Path to R package files

4. Path to TitanCNA snakemake results

5. Reference files and settings

6. svaba.snakefile settings: SvABA

7. combineSvabaTitan.snakefile settings: Plotting

8. plotSVandCNAzoom.snakefile settings in configPlotZoom.yaml

About

Releases

Packages

Contributors 3

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
code		code
config		config
data		data
LICENSE		LICENSE
README.md		README.md
combineSvabaTitan.snakefile		combineSvabaTitan.snakefile
plotSVandCNAzoom.snakefile		plotSVandCNAzoom.snakefile
svaba.snakefile		svaba.snakefile

License

GavinHaLab/SV_WGS_analysis

Folders and files

Latest commit

History

Repository files navigation

Structural variation analysis pipeline for WGS of cancer genomes

Description

Contact

Requirements

Software packages or libraries

Tumour-Normal sample list

Running the analysis

1. Invoking the snakemake workflow for SvABA on a local machine

2. Integrating SV and copy number results (TitanCNA)

3. Invoking the snakemake workflow for SvABA on a cluster

i. qsub

ii. slurm

Configuration and settings

1. Path to tools

2. Path to scripts

3. Path to R package files

4. Path to TitanCNA snakemake results

5. Reference files and settings

6. svaba.snakefile settings: SvABA

7. combineSvabaTitan.snakefile settings: Plotting

8. plotSVandCNAzoom.snakefile settings in configPlotZoom.yaml

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

i. `qsub`

ii. `slurm`

Packages