NextFlow_RD_Genomic

Tasks

Create the directory data/qsr_vcfs
Populate it with https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
Populate it with https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx
Make the modules/BSQR.nf module
Search main.nf for changes (note the channel creation and usage)
Update the nextflow.config
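
The first three tasks could be scripted roughly as follows. This is a sketch under assumptions, not part of the repository: it assumes the two dbSNP files are served from the storage.googleapis.com download host used elsewhere in this README, and the RUN variable is a hypothetical switch introduced here so that the script only prints the wget commands unless RUN=1 is set (the uncompressed dbSNP VCF is large).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed direct-download location of the Broad hg38 resource bundle.
BASE="https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0"
DEST="data/qsr_vcfs"

# Task 1: create the target directory (no error if it already exists).
mkdir -p "$DEST"

# Tasks 2 and 3: fetch the dbSNP VCF and its index into that directory.
for f in Homo_sapiens_assembly38.dbsnp138.vcf Homo_sapiens_assembly38.dbsnp138.vcf.idx; do
    cmd=(wget -c -P "$DEST" "$BASE/$f")
    if [ "${RUN:-0}" = "1" ]; then   # RUN is a hypothetical opt-in flag
        "${cmd[@]}"
    else
        echo "${cmd[*]}"             # dry run: show the command only
    fi
done
```

Run it once without RUN to inspect the commands, then again with RUN=1 to download. The -c option lets wget resume a partially downloaded file.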

Description

A simple base pipeline for rare disease and germline genomics, used to test the effects of down-sampling on variant calling.

Basic Overview

The NextFlow workflow software is used to run the following pipeline.

Pipeline

Index genome > Fastqc analysis > Align reads > Downsample bam files > Sort bam > Mark duplicates > Index bam > Call variants > Hard filter

Setup

To run the pipeline, we need to obtain:

A genome build (GRCh38), provided by the Broad Institute

$ cd data/genome
$ wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta

FastQ sample (for workflow development)

$ cd ../samples
$ wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR151/003/SRR1518253/SRR1518253_1.fastq.gz && \
wget https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR151/003/SRR1518253/SRR1518253_2.fastq.gz && \
gunzip *.gz

FastQ samples for scaling up

$ wget https://genomics.viapath.co.uk/benchmark/files/FASTQ/NA12878_WES.zip

Known-variant resource VCFs and their index files

$ wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi && \
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi
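
After downloading, it can help to confirm that every VCF has its index next to it. The helper below is a minimal sketch: the function name vcfs_without_index is hypothetical, and it assumes only the naming convention visible in the file names above (X.vcf.gz is indexed by X.vcf.gz.tbi, and X.vcf by X.vcf.idx).

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every VCF in a directory that has no index file.
# Assumed convention: X.vcf.gz -> X.vcf.gz.tbi, X.vcf -> X.vcf.idx
vcfs_without_index() {
    local dir="${1:-.}" vcf
    for vcf in "$dir"/*.vcf.gz "$dir"/*.vcf; do
        [ -e "$vcf" ] || continue              # skip glob patterns that matched nothing
        case "$vcf" in
            *.vcf.gz) [ -e "$vcf.tbi" ] || echo "$vcf" ;;
            *.vcf)    [ -e "$vcf.idx" ] || echo "$vcf" ;;
        esac
    done
}
```

Calling vcfs_without_index on the download directory prints one path per unindexed VCF and prints nothing when every file is paired with an index.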

Running the pipeline

# Using Docker
$ nextflow run -profile docker main.nf

# Using Docker in Singularity
$ nextflow run -profile singularity main.nf
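
Before launching, a quick pre-flight check can catch missing inputs early. The sketch below is not part of the repository: the function name missing_inputs is hypothetical, and the paths are assumptions derived from the Setup commands (the genome in data/genome, the unzipped FastQ pair in data/samples). The paths that main.nf actually reads may differ, so adjust them to match nextflow.config.

```shell
#!/usr/bin/env bash

# Hypothetical pre-flight helper: report expected input files that are absent or empty.
# Paths are assumed from the Setup section; they are not read from nextflow.config.
missing_inputs() {
    local root="${1:-.}" f status=0
    for f in \
        data/genome/Homo_sapiens_assembly38.fasta \
        data/samples/SRR1518253_1.fastq \
        data/samples/SRR1518253_2.fastq
    do
        if [ ! -s "$root/$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use (commented out so that sourcing this file has no side effects):
# missing_inputs . && nextflow run -profile docker main.nf
```

The function returns 0 only when all three files exist and are non-empty, so it can gate the nextflow command with &&.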

Validating the pipeline

See https://genomics.viapath.co.uk/benchmark

DNAnexus applet setup (a local applet for basic testing)

DNAnexus Python Bindings Documentation
Install the app

$ pip install -r requirements.txt

Routine maintenance: periodically update dxpy

$ pip install --upgrade dxpy

DNAnexus Tutorial

$ dx select <your-project-name>
$ dx build --nextflow

Repository: openvar/NextFlow_RD_Genomic_QR. Latest commit: "update" by Peter-J-Freeman, 66ff07c, May 10, 2024 (36 commits in history).
