GitHub - BfArM-MVH/GRZ_QC_Workflow

Introduction

BfArM-MVH/GRZ_QC_Workflow performs extended quality control of GRZ submissions according to the defined thresholds.

Read QC (FastQC and fastp)
Alignment (BWAMEM2)
Coverage calculation (Mosdepth)
Presentation of QC results for raw reads (MultiQC)

Setup

Install Nextflow (and its dependencies).
Make sure that conda, Docker, or Singularity is available.
Clone the GitHub repository and set the analysis output directory:

git clone https://github.com/BfArM-MVH/GRZ_QC_Workflow.git
output_basepath="path/to/analysis/dir"

Setting up reference files

This pipeline automatically downloads the necessary reference genomes and creates a BWA index from them. However, when the pipeline is run multiple times on different submissions, repeating the download and indexing steps creates unnecessary overhead.

To avoid repeated downloads, you can download the necessary reference genome FASTA files once to a shared location:

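# create the target directory first so that mv does not rename the file to "references"
mkdir -p "$shared_directory/references"
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
mv hg19.fa.gz "$shared_directory/references/"
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
mv hg38.fa.gz "$shared_directory/references/"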

Then you can update the file paths in conf/grzqc.conf:

params {
    [...]
    fasta_37 = "$shared_directory/references/hg19.fa.gz"
    fasta_38 = "$shared_directory/references/hg38.fa.gz"
}

by replacing $shared_directory with the absolute path to the shared directory.
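
The substitution can also be scripted. The following is a minimal sketch, not part of the workflow: `render_reference_params` is a hypothetical helper, and `/data/shared` is a placeholder path. Only the parameter names `fasta_37` and `fasta_38` and the file names are taken from the snippet above. The output covers just these two lines, so it would have to be merged into the existing `params` block by hand.

```python
from pathlib import Path


def render_reference_params(shared_directory):
    """Render the fasta_37/fasta_38 lines for conf/grzqc.conf.

    Hypothetical helper: only the parameter names and file names come from
    the README. Other params in the real config (the "[...]" part) are not
    reproduced here.
    """
    base = Path(shared_directory)
    # The README asks for an absolute path, so reject relative ones early.
    if not base.is_absolute():
        raise ValueError(f"expected an absolute path, got: {shared_directory}")
    references = base / "references"
    hg19 = references / "hg19.fa.gz"
    hg38 = references / "hg38.fa.gz"
    return (
        "params {\n"
        f'    fasta_37 = "{hg19}"\n'
        f'    fasta_38 = "{hg38}"\n'
        "}\n"
    )


# "/data/shared" is a placeholder for your own shared directory.
print(render_reference_params("/data/shared"))
```

Rejecting relative paths up front mirrors the requirement stated above and avoids a config that only works from one working directory.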

After the first run, you can also copy the BWAMEM2 index to the shared directory:

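# copy the contents of references/ (note the trailing "/.") so that no nested references/references/ directory is created
cp -r "${output_basepath}/grzqc_output/references/." "$shared_directory/references/"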

and configure it in conf/grzqc.conf:

params {
    [...]
    bwa_index_37 = "$shared_directory/references/GRCh37/bwamem2"
    bwa_index_38 = "$shared_directory/references/GRCh38/bwamem2"
}

by replacing $shared_directory with the absolute path to the shared directory.
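
Before pointing the config at the shared directory, it can help to confirm that the expected files are actually there. The sketch below is a hypothetical pre-flight check, not part of the workflow. The relative paths are the ones that appear in the config snippets in this section; the assumption that the copied index ends up at `GRCh37/bwamem2` and `GRCh38/bwamem2` below the shared `references` directory is taken from those snippets and should be checked against your own output.

```python
from pathlib import Path

# Relative paths as they appear in the README's config snippets.
# Adjust this tuple if your layout differs.
EXPECTED_REFERENCE_PATHS = (
    "hg19.fa.gz",
    "hg38.fa.gz",
    "GRCh37/bwamem2",
    "GRCh38/bwamem2",
)


def missing_references(references_dir):
    """Return the expected entries that do not exist below references_dir."""
    base = Path(references_dir)
    return [rel for rel in EXPECTED_REFERENCE_PATHS if not (base / rel).exists()]
```

Calling `missing_references` with the absolute path of the shared `references` directory returns an empty list when everything is in place, and otherwise the entries that still need to be downloaded or copied.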

Usage

This pipeline needs a samplesheet, which is generated automatically from the metadata.json file in the submission base directory. Make sure that the submission base directory has the required folder structure. The script run_grzqc.sh parses the metadata.json file to create a Nextflow samplesheet:

python3 bin/metadata_to_samplesheet.py \
    "${submission_basepath}" \
    "${output_basepath}/grzqc_output/grzqc_samplesheet.csv"

Now, you can run the pipeline using:

nextflow run main.nf \
    -profile grzqc,conda \
    --outdir "${output_basepath}/grzqc_output/" \
    -work-dir "${output_basepath}/work/" \
    --input "${output_basepath}/grzqc_output/grzqc_samplesheet.csv" \
    -resume
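
When the pipeline is launched from a script, the same command can be assembled as an argument list, which avoids shell quoting problems with paths that contain spaces. This is a minimal sketch: `build_nextflow_command` is a hypothetical helper and `/data/run1` is a placeholder, while the flags and file names are the ones shown above.

```python
import shlex
from pathlib import Path


def build_nextflow_command(output_basepath, profile="grzqc,conda"):
    """Assemble the README's nextflow invocation as an argument list."""
    base = Path(output_basepath)
    outdir = base / "grzqc_output"
    return [
        "nextflow", "run", "main.nf",
        "-profile", profile,
        "--outdir", str(outdir),
        "-work-dir", str(base / "work"),
        "--input", str(outdir / "grzqc_samplesheet.csv"),
        "-resume",
    ]


# Print the command for inspection; "/data/run1" is a placeholder path.
print(shlex.join(build_nextflow_command("/data/run1")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. The `profile` argument defaults to the value used above; whether other profile combinations are available depends on the workflow's configuration and is not confirmed here.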

For subsequent runs, you can use prebuilt references. To do so, prepare your own config file as described under "Setting up reference files".

Pipeline output

Output columns:

Column	Description
`sampleId`	Sample ID
`labDataName`	Lab data name
`libraryType`	Library type, e.g., `wes` for whole-exome sequencing
`sequenceSubtype`	Sequence subtype, e.g., `somatic` or `germline`
`genomicStudySubtype`	Genomic study subtype, e.g., `tumor+germline`
`meanDepthOfCoverage`	Mean depth of coverage
`meanDepthOfCoverageRequired`	Mean depth of coverage required to pass QC
`fractionBasesAboveQualityThreshold`	Fraction of bases passing the quality threshold
`qualityThreshold`	The quality threshold to pass
`fractionBasesAboveQualityThresholdRequired`	Fraction of bases above the quality threshold required to pass QC
`targetedRegionsAboveMinCoverage`	Fraction of targeted regions above minimum coverage
`minCoverage`	Minimum coverage for target regions
`targetedRegionsAboveMinCoverageRequired`	Fraction of targeted regions above minimum coverage required to pass QC
`passedQC`	`true` when QC passed, otherwise `false`
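
The relationship between the measured columns and their `...Required` counterparts can be illustrated with a short sketch. It rests on an assumption that the table above does not confirm: that QC passes when each of the three measured values is greater than or equal to its required value. The workflow's actual rule may differ, so treat this as a way to read the columns and not as the pipeline's implementation. The function name and the example values are made up.

```python
# Pairs of (measured column, required column) from the output table.
QC_CHECKS = (
    ("meanDepthOfCoverage", "meanDepthOfCoverageRequired"),
    ("fractionBasesAboveQualityThreshold", "fractionBasesAboveQualityThresholdRequired"),
    ("targetedRegionsAboveMinCoverage", "targetedRegionsAboveMinCoverageRequired"),
)


def recompute_passed_qc(row):
    """Return True if every measured value meets its required value.

    ASSUMPTION: ">=" on all three pairs; the real workflow may apply a
    different rule.
    """
    return all(float(row[measured]) >= float(row[required]) for measured, required in QC_CHECKS)


# Made-up example row; values are strings, as they would be when read from a text file.
example_row = {
    "meanDepthOfCoverage": "35.2",
    "meanDepthOfCoverageRequired": "30",
    "fractionBasesAboveQualityThreshold": "0.91",
    "fractionBasesAboveQualityThresholdRequired": "0.85",
    "targetedRegionsAboveMinCoverage": "0.97",
    "targetedRegionsAboveMinCoverageRequired": "0.8",
}
print(recompute_passed_qc(example_row))
```

Comparing the result with the reported `passedQC` value for a few samples is a quick way to find out whether the assumed rule matches what the workflow does.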

Contributions and Support

BfArM-MVH/GRZ_QC_Workflow was originally written by Shounak Chakraborty, Yun Wang, Kübra Narci and Florian R. Hölzlwimmer.
