
openvar/NextFlow_RD_Genomic_QR

This branch is 10 commits ahead of, 11 commits behind Peter-J-Freeman/NextFlow_RD_Genomic:main.

Latest commit: 66ff07c · May 10, 2024 (36 commits)

NextFlow_RD_Genomic

Tasks

Description

A simple base pipeline for rare disease and germline genomics, built to test the effects of down-sampling on variant calling.

Basic Overview

The following pipeline is run with the NextFlow workflow software.

Pipeline

Index genome > FastQC analysis > Align reads > Downsample BAM files > Sort BAM > Mark duplicates > Index BAM > Call variants > Hard filter
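
The down-sampling step keeps only a fraction of the aligned reads. The README does not say which tool or which fractions the pipeline uses, so the following is only a rough sketch: `downsample_fraction` is a hypothetical helper, and the `samtools view -s` call in the closing comment is an assumption about tooling, not the pipeline's confirmed method.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fraction of reads to keep so that roughly
# `target` reads remain out of `total`. Capped at 1 (no up-sampling).
downsample_fraction() {
    local target="$1" total="$2"
    awk -v t="$target" -v n="$total" 'BEGIN {
        if (n <= 0) { exit 1 }      # refuse an empty or invalid read count
        f = t / n
        if (f > 1) f = 1
        printf "%.4f\n", f
    }'
}

# Example: keep about 25 million of 100 million reads.
frac=$(downsample_fraction 25000000 100000000)
echo "$frac"

# Assumed usage with samtools, where the integer part of -s is the seed
# and the decimal part is the fraction (only valid for fractions below 1):
#   samtools view -b -s "42${frac#0}" in.bam > downsampled.bam
```

Computing the fraction separately from the subsampling call makes it easy to loop over several target depths and compare the resulting variant calls.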

Setup

To run the pipeline, we need to obtain:

  • A genome build (GRCh38), provided by the Broad Institute
$ cd data/genome
$ wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
  • FastQ sample (for workflow development)
$ cd ../samples
$ wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR151/003/SRR1518253/SRR1518253_1.fastq.gz && \
wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR151/003/SRR1518253/SRR1518253_2.fastq.gz && \
gunzip *.gz
  • FastQ samples for scaling up (NA12878 WES)
$ wget https://genomics.viapath.co.uk/benchmark/files/FASTQ/NA12878_WES.zip
  • Known-variant resource files and their indexes (Broad hg38 resources)
$ wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi
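
Large downloads sometimes fail silently, so it can help to confirm that the inputs are present and non-empty before launching the pipeline. The sketch below is one possible check: `check_inputs` is a hypothetical helper, and the directory and file names are taken from the download commands above (after `gunzip`).

```shell
#!/usr/bin/env bash
# Report any expected input that is missing or empty.
# Usage: check_inputs DIR FILE...
check_inputs() {
    local dir="$1" missing=0 f
    shift
    for f in "$@"; do
        # -s is true only when the file exists and has a size above zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Run from the repository root.
check_inputs data/genome Homo_sapiens_assembly38.fasta \
    || echo "genome download incomplete"
check_inputs data/samples SRR1518253_1.fastq SRR1518253_2.fastq \
    || echo "sample download incomplete"
```

The function returns non-zero when anything is missing, so it can also gate a `nextflow run` call in a wrapper script.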

Running the pipeline

# Using Docker
$ nextflow run -profile docker main.nf

# Using the Docker images via Singularity
$ nextflow run -profile singularity main.nf
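
On shared machines it is not always obvious which container runtime is installed. As a small sketch, the profile could be chosen by checking what is on `PATH`; `pick_profile` is a hypothetical helper, and only the profile names `docker` and `singularity` come from the commands above.

```shell
#!/usr/bin/env bash
# Print the name of the profile matching the first container runtime found.
pick_profile() {
    if command -v docker >/dev/null 2>&1; then
        echo docker
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo "neither docker nor singularity found on PATH" >&2
        return 1
    fi
}

# Show the command that would be run; swap the echo for the command
# itself once the chosen profile looks right.
if profile=$(pick_profile); then
    echo "would run: nextflow run -profile $profile main.nf"
fi
```

Docker is preferred here when both runtimes are present; reverse the order of the two checks if Singularity should win.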

Validating the pipeline

See https://genomics.viapath.co.uk/benchmark

DNAnexus applet setup (a local applet for basic testing)

$ pip install -r requirements.txt
  • Routine maintenance: periodically update dxpy
$ pip install --upgrade dxpy

DNAnexus tutorial

$ dx select <your-project-name>
$ dx build --nextflow

About

NextFlow pipeline with BQSR and VQSR applied


Languages

  • Nextflow 83.6%
  • Dockerfile 15.2%
  • Shell 1.2%