BfArM-MVH/GRZ_QC_Workflow

Introduction

BfArM-MVH/GRZ_QC_Workflow performs extended quality control of GRZ submissions according to the defined thresholds. It runs the following steps:

  1. Read QC (FastQC and fastp)
  2. Alignment (BWA-MEM2)
  3. Coverage calculation (mosdepth)
  4. QC report for raw reads (MultiQC)

Setup

  • Install Nextflow (and its dependencies).
  • Make sure that conda, Docker or Singularity is available.
  • Clone the GitHub repository and set the output directory for the analysis:
git clone https://github.com/BfArM-MVH/GRZ_QC_Workflow.git
cd GRZ_QC_Workflow
output_basepath="path/to/analysis/dir"

Setting up reference files

This pipeline automatically downloads the necessary reference genomes and creates a BWA-MEM2 index from them. However, when the pipeline is run multiple times on different submissions, repeating the download and indexing steps creates unnecessary overhead.

To skip the download, you can fetch the reference genome FASTA files once and store them in a shared location:

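# Create the target directory first so that mv moves the files into it
mkdir -p "$shared_directory/references"
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
mv hg19.fa.gz "$shared_directory/references/"
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
mv hg38.fa.gz "$shared_directory/references/"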

Then you can update the file paths in conf/grzqc.conf:

params {
    [...]
    fasta_37 = "$shared_directory/references/hg19.fa.gz"
    fasta_38 = "$shared_directory/references/hg38.fa.gz"
}

by replacing $shared_directory with the absolute path to the shared directory.

After the first run, you can also copy the BWAMEM2 index to the shared directory:

mkdir -p "$shared_directory/references"
cp -r "${output_basepath}/grzqc_output/references/." "$shared_directory/references/"

and configure it in conf/grzqc.conf:

params {
    [...]
    bwa_index_37 = "$shared_directory/references/GRCh37/bwamem2"
    bwa_index_38 = "$shared_directory/references/GRCh38/bwamem2"
}

by replacing $shared_directory with the absolute path to the shared directory.
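
The substitution described above can also be scripted. The following is a minimal sketch, not part of the pipeline: the function name render_reference_params is hypothetical, and it assumes the shared directory uses the layout shown in the examples above (references/hg19.fa.gz, references/GRCh37/bwamem2, and so on). It only builds the text of the four entries; merging them into the existing params block in conf/grzqc.conf is left to you.

```python
from pathlib import Path


def render_reference_params(shared_directory: str) -> str:
    """Return the four reference entries with absolute paths filled in.

    Hypothetical helper: the directory layout is taken from the README
    examples and is an assumption about your shared location.
    """
    # The README asks for an absolute path, so resolve the input first.
    refs = Path(shared_directory).resolve() / "references"
    entries = {
        "fasta_37": refs / "hg19.fa.gz",
        "fasta_38": refs / "hg38.fa.gz",
        "bwa_index_37": refs / "GRCh37" / "bwamem2",
        "bwa_index_38": refs / "GRCh38" / "bwamem2",
    }
    lines = ["params {"]
    for key, path in entries.items():
        lines.append(f'    {key} = "{path}"')
    lines.append("}")
    return "\n".join(lines)
```

Calling print(render_reference_params("path/to/shared")) prints a params block that can be copied into your own config file.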

Usage

This pipeline needs a samplesheet, which is generated automatically from the metadata.json file included in the submission base directory. Please make sure that the submission base directory has the required folder structure. The script run_grzqc.sh parses the metadata.json file to create a Nextflow samplesheet:

python3 bin/metadata_to_samplesheet.py \
    "${submission_basepath}" \
    "${output_basepath}/grzqc_output/grzqc_samplesheet.csv"

Now, you can run the pipeline using:

nextflow run main.nf \
    -profile grzqc,conda \
    --outdir "${output_basepath}/grzqc_output/" \
    -work-dir "${output_basepath}/work/" \
    --input "${output_basepath}/grzqc_output/grzqc_samplesheet.csv" \
    -resume
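
The two steps above can be chained in a small driver script. This is a sketch under stated assumptions, not the project's run_grzqc.sh: the function names build_commands and run_all are hypothetical, the script is assumed to be started from the repository root, and creating the output directory beforehand is an added precaution rather than a documented requirement. All command-line arguments are the ones shown above.

```python
import subprocess
from pathlib import Path


def build_commands(submission_basepath, output_basepath, profile="grzqc,conda"):
    """Return the samplesheet command and the Nextflow command as argument lists."""
    out = Path(output_basepath)
    samplesheet = out / "grzqc_output" / "grzqc_samplesheet.csv"
    make_samplesheet = [
        "python3", "bin/metadata_to_samplesheet.py",
        str(submission_basepath),
        str(samplesheet),
    ]
    run_pipeline = [
        "nextflow", "run", "main.nf",
        "-profile", profile,
        "--outdir", str(out / "grzqc_output"),
        "-work-dir", str(out / "work"),
        "--input", str(samplesheet),
        "-resume",
    ]
    return [make_samplesheet, run_pipeline]


def run_all(submission_basepath, output_basepath, profile="grzqc,conda"):
    """Run both steps in order and stop at the first failure."""
    # Assumption: the output directory may not exist yet, so create it.
    (Path(output_basepath) / "grzqc_output").mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(submission_basepath, output_basepath, profile):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing a different profile string, for example "grzqc,docker", selects another of the supported software environments.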

For subsequent runs, you can use prebuilt references. To do so, prepare your own config file.

Pipeline output

Output columns:

| Column | Description |
| --- | --- |
| sampleId | Sample ID |
| labDataName | Lab data name |
| libraryType | Library type, e.g., wes for whole-exome sequencing |
| sequenceSubtype | Sequence subtype, e.g., somatic or germline |
| genomicStudySubtype | Genomic study subtype, e.g., tumor+germline |
| meanDepthOfCoverage | Mean depth of coverage |
| meanDepthOfCoverageRequired | Mean depth of coverage required to pass QC |
| fractionBasesAboveQualityThreshold | Fraction of bases passing the quality threshold |
| qualityThreshold | The quality threshold to pass |
| fractionBasesAboveQualityThresholdRequired | Fraction of bases above the quality threshold required to pass QC |
| targetedRegionsAboveMinCoverage | Fraction of targeted regions above minimum coverage |
| minCoverage | Minimum coverage for target regions |
| targetedRegionsAboveMinCoverageRequired | Fraction of targeted regions above minimum coverage required to pass QC |
| passedQC | true when QC passed, otherwise false |
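
The columns come in pairs of an observed value and a required value, which suggests how passedQC relates to the rest of the row. The sketch below illustrates that relationship and is not the workflow's actual implementation: the function name passes_qc is hypothetical, and it assumes that each observed value must be greater than or equal to its required counterpart. The workflow itself may use different comparison rules.

```python
# Pairs of (observed column, required column) taken from the table above.
QC_CHECKS = [
    ("meanDepthOfCoverage", "meanDepthOfCoverageRequired"),
    ("fractionBasesAboveQualityThreshold", "fractionBasesAboveQualityThresholdRequired"),
    ("targetedRegionsAboveMinCoverage", "targetedRegionsAboveMinCoverageRequired"),
]


def passes_qc(row: dict) -> bool:
    """Return True if every observed value reaches its required value.

    Assumption: "pass" means observed >= required for all three pairs.
    Values are converted with float() so that strings read from a CSV
    file work as well as numbers.
    """
    return all(
        float(row[observed]) >= float(row[required])
        for observed, required in QC_CHECKS
    )


# Hypothetical example row; the numbers are made up for illustration.
example_row = {
    "meanDepthOfCoverage": "112.4",
    "meanDepthOfCoverageRequired": "100",
    "fractionBasesAboveQualityThreshold": "0.91",
    "fractionBasesAboveQualityThresholdRequired": "0.85",
    "targetedRegionsAboveMinCoverage": "0.97",
    "targetedRegionsAboveMinCoverageRequired": "0.8",
}
```

Such a check can be used to cross-check the passedQC column of a report against its own threshold columns.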

Contributions and Support

BfArM-MVH/GRZ_QC_Workflow was originally written by Shounak Chakraborty, Yun Wang, Kübra Narci and Florian R. Hölzlwimmer.