diff --git a/.gitignore b/.gitignore index 5203cf5..8e641be 100755 --- a/.gitignore +++ b/.gitignore @@ -33,4 +33,6 @@ create_pgx_samplesheet.sh # Other files bin/report_template.txt bin/snakemake_report.py -bin/pdf.py \ No newline at end of file +bin/pdf.py + +subworkflows/local/pharmacoGenomics.nf.backup \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index a29e004..577e467 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,8 @@ +# v2.0.0 +- Major changes to the process flow +- Updated PharmCAT to v2.12.0 +- Updated README.md + # v1.1.1 - Updated QC text in the report diff --git a/README.md b/README.md index 6c9a396..d8b161c 100755 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@
-[![Nextflow DSL2](https://img.shields.io/badge/NextFlow_DSL2-23.04.0-23.svg)](https://www.nextflow.io/docs/latest/dsl2.html) [![Singularity Version](https://img.shields.io/badge/Singularity-%E2%89%A53.8.0-orange)](https://sylabs.io/docs/) [![Run with Singularity](https://img.shields.io/badge/Run%20with-Singularity-orange)](https://sylabs.io/docs/) +[![Nextflow DSL2](https://img.shields.io/badge/NextFlow_DSL2-23.04.0-23.svg)](https://www.nextflow.io/docs/latest/dsl2.html) [![Singularity Version](https://img.shields.io/badge/Singularity-%E2%89%A53.8.0-orange)](https://sylabs.io/docs/) [![PharmCAT Version](https://img.shields.io/badge/PharmCAT-2.12.0-green)](https://pharmcat.org/) [![Run with Singularity](https://img.shields.io/badge/Run%20with-Singularity-orange)](https://sylabs.io/docs/) [![PharmGKB](https://img.shields.io/badge/PharmGKB-blue)](https://www.pharmgkb.org/) [![CPIC](https://img.shields.io/badge/CPIC-green)](https://cpicpgx.org/) [![PharmVar](https://img.shields.io/badge/PharmVar-yellow)](https://www.pharmvar.org/) [![PharmCAT](https://img.shields.io/badge/Support_for-PharmCAT-orange)](https://pharmcat.org/) @@ -13,7 +13,8 @@ Welcome to PGxModule: Revolutionizing Genomic Medicine! -PGxModule is an advanced Nextflow DSL2 workflow, designed to seamlessly integrate into your genomics pipeline. It empowers you to generate sample-specific reports with clinical guidelines, leveraging state-of-the-art variant detection in Genomic Medicine Sweden sequencing panels. This workflow is inspired by JoelAAs. +PGxModule is an advanced Nextflow DSL2 workflow, designed to seamlessly integrate into your genomics pipeline. It empowers you to generate sample-specific reports with clinical guidelines, leveraging state-of-the-art variant detection in Genomic Medicine Sweden sequencing panels. This workflow is inspired by JoelAAs. In addition, we have integrated the [PharmCAT](https://pharmcat.org/) (Pharmacogenomics Clinical Annotation Tool) +report into this pipeline, providing recommendations for all detected haplotypes directly from CPIC. ### Key Features: @@ -24,7 +25,7 @@ PGxModule is an advanced Nextflow DSL2 workflow, designed to seamlessly integrat ## Pipeline Summary -The pipeline focuses on 19 SNPs from TPMT, DPYD, and NUDT15 genes, with plans to incorporate additional genes in future updates. The target selection is meticulously curated from reputable databases such as [PharmGKB](https://www.pharmgkb.org/) and [PharmVar](https://www.pharmvar.org/), guided by [CPIC](https://cpicpgx.org/) recommendations. As the pipeline evolves, it aims to broaden its scope, providing a more comprehensive analysis of pharmacogenomic variations to enhance clinical insights. +This pipeline branches into two analyses: the first focuses on 19 SNPs from TPMT, DPYD, and NUDT15 genes, with plans to incorporate additional genes in future updates. The second uses the external tool [PharmCAT](https://pharmcat.org/), developed by [PharmGKB](https://www.pharmgkb.org/), to call as many haplotypes as possible without subsetting the original BAM; these haplotypes are then annotated and reported along with the clinical recommendations. The target selection is meticulously curated from reputable databases such as [PharmGKB](https://www.pharmgkb.org/) and [PharmVar](https://www.pharmvar.org/), guided by [CPIC](https://cpicpgx.org/) recommendations. 
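The PharmCAT branch boils down to two calls inside the pharmcat_2.12.0.sif container, as wired up in the module and config changes later in this diff. A minimal command-level sketch, with sample and file names as illustrative placeholders:

```bash
# Preprocess the filtered haplotype VCF for PharmCAT (VCF v4.2 compliance,
# PGx-relevant positions, normalization); produces *.preprocessed.vcf.bgz
pharmcat_vcf_preprocessor.py -vcf Sample1.filtered.haplotypes.vcf.gz --base-filename Sample1.pharmcat
gunzip -c Sample1.pharmcat.preprocessed.vcf.bgz > Sample1.pharmcat.preprocessed.vcf

# Match haplotypes, call phenotypes, and write the CPIC-based report
# (reporter flags taken from the PHARMCAT ext.args in this diff)
pharmcat -vcf Sample1.pharmcat.preprocessed.vcf --base-filename Sample1.pharmcat \
    --reporter-sources CPIC --matcher-save-html --reporter-save-json -ma -re --reporter-title Sample1
```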
As the pipeline evolves, it aims to broaden its scope, providing a more comprehensive analysis of pharmacogenomic variations to enhance clinical insights. ## Pipeline Steps @@ -33,30 +34,35 @@ The PGxModule pipeline was executed with Nextflow version 23.04.2. The pipeline 1. **CSV Validation** The CSV Validation step ensures the correctness and integrity of the input CSV file. It checks for proper formatting, required fields, and data consistency, providing a foundation for accurate downstream processing in the PGxModule pipeline. -2. **Getting Ontarget Bam** -This step involves extracting the on-target BAM files from the analyzed samples. These BAM files specifically capture the sequencing data aligned to the regions of interest, enabling reduction in time and focused analysis on the genomic regions relevant to the pharmacogenomic study. -3. **Haplotype Calling** +2. **Haplotype Calling** Haplotype Calling is a crucial stage where the pipeline identifies and assembles haplotypes from the sequencing data. This process is fundamental in characterizing the genetic variations present in the samples, laying the groundwork for subsequent analyses and variant interpretation. -4. **Haplotype Annotation** -Haplotypes which are called are annotated with dbSNP ids. -5. **Haplotype Filtration** +3. **Haplotype Filtration** Haplotype Filtration focuses on refining the set of identified haplotypes, applying specific criteria to select variants of interest and discard noise. This process enhances the precision of the haplotype dataset, ensuring that downstream analyses are based on high-quality and clinically relevant variants. -6. **Coverage Analysis** -Coverage Analysis evaluates the sequencing depth across targeted regions, providing insights into the reliability of variant calls. By assessing coverage, this step identifies regions with insufficient data and informs the overall confidence in the accuracy of the genomic information obtained from the samples. -7. **Detection of variants** +4. **PharmCAT Preprocessing** +A script that preprocesses VCF files for PharmCAT, ensuring compliance with VCF v4.2, stripping positions irrelevant to PGx, normalizing variants, and optionally filtering sample data. +5. **PharmCAT** +This step matches the VCF positions against PharmCAT's PGx positions, calls the phenotypes, and finally generates the PharmCAT report with all the recommendations. +6. **Ontarget VCF** +This step extracts the on-target VCF positions from the analyzed samples. +7. **Haplotype Annotation** +Called haplotypes are annotated with dbSNP IDs. +8. **Detection of variants** Checking the variants of interest in the whole set of haplotypes and are used for futher analysis -8. **Clinial Recommendations** +9. **Clinical Recommendations** Identified haplotypes are annotated with Haplotype Ids, clincial reccomendations, interaction guidelines based on CPIC. -9. **Report** +10. **Getting Ontarget Bam** +This step involves extracting the on-target BAM files from the analyzed samples. These BAM files specifically capture the sequencing data aligned to the regions of interest, reducing runtime and focusing the analysis on the genomic regions relevant to the pharmacogenomic study. +11. **Coverage Analysis** +Coverage Analysis evaluates the sequencing depth across targeted regions, providing insights into the reliability of variant calls.
By assessing coverage, this step identifies regions with insufficient data and informs the overall confidence in the accuracy of the genomic information obtained from the samples. +12. **Report** The Report step consolidates the findings from the preceding analyses into a comprehensive report. This report includes detailed information on detected variants, clinical guidelines, interaction assessments, and other relevant pharmacogenomic insights. It serves as a valuable resource for clinicians and researchers, aiding in informed decision-making based on the genomic characteristics of the analyzed samples. ## Example Input CSV -| clarity_sample_id | id | type | assay | group | bam | bai | purity | -|-------------------|---------|------|-----------|---------|---------------------------------------|---------------------------------------|--------| -| CMD123456 | Sample1 | T | solid-pgx | Sample1 | Sample1.T.bwa.umi.sort.bam | Sample1.T.bwa.umi.sort.bam.bai | 0.30 | -| CMD987654 | Sample2 | T | solid-pgx | Sample2 | Sample2.T.bwa.umi.sort.bam | Sample2.T.bwa.umi.sort.bam.bai | 0.30 | - +| clarity_sample_id | id | type | assay | group | bam | bai | purity | +|-------------------|---------|------|-------------|---------|--------------------------------------|--------------------------------------|--------| +| XXX000001 | Sample1 | T | gmssolidpgx | Sample1 | Sample1.T.bwa.umi.sort.bam | Sample1.T.bwa.umi.sort.bam.bai | 0.30 | +| XXX000002 | Sample2 | T | gmssolidpgx | Sample2 | Sample2.T.bwa.umi.sort.bam | Sample2.T.bwa.umi.sort.bam.bai | 0.30 | ## Setup @@ -111,5 +117,5 @@ nextflow run main.nf --csv /path/to/csv/input.csv -profile "panel,hg38,solid" -- ## Workflow Image -Workflow Image +Workflow Image diff --git a/configs/modules/pharmacogenomics.config b/configs/modules/pharmacogenomics.config index 90cbdd9..37e6562 100644 --- a/configs/modules/pharmacogenomics.config +++ b/configs/modules/pharmacogenomics.config @@ -47,19 +47,6 @@ process { ext.args = { " --target_bed=${params.pgx_target_regions} --padding=${params.padding} --addchr=${params.addchr} "} } - withName: '.*PHARMACO_GENOMICS:ONTARGET_BAM' { - container = "${params.container_dir}/samtools.simg" - - publishDir = [ - path: "${params.outdir}/${params.subdir}/bam/", - mode: 'copy', - overwrite: true, - pattern: "*.bam*" - ] - - ext.args = { " -h -M " } - ext.prefix = { "${meta.group}" } - } withName: '.*PHARMACO_GENOMICS:GATK_HAPLOTYPING' { container = "${params.container_dir}/gatk4.simg" @@ -77,7 +64,7 @@ process { } withName: '.*PHARMACO_GENOMICS:SENTIEON_HAPLOTYPING' { - container = "${params.container_dir}/sentieon_202112.sif" + container = "${params.container_dir}/sentieon_202308.01.sif" publishDir = [ path: "${params.outdir}/${params.subdir}/vcf/sentieon", @@ -92,28 +79,14 @@ process { ext.when = { params.haplotype_caller == 'SENTIEON' } } - withName: '.*PHARMACO_GENOMICS:BCFTOOLS_ANNOTATION' { - container = "${params.container_dir}/bcftools.sif" - - publishDir = [ - path: "${params.outdir}/${params.subdir}/vcf/", - mode: 'copy', - overwrite: true, - pattern: "*.vcf" - ] - - ext.args = { "-a ${params.dbSNP} -c ID" } - ext.prefix = { "${meta.group}" } - } - withName: '.*PHARMACO_GENOMICS:VARIANT_FILTRATION' { - container = "${params.container_dir}/target_variants_python.simg" + container = "${params.container_dir}/target_variants_python.sif" publishDir = [ path: "${params.outdir}/${params.subdir}/vcf/", mode: 'copy', overwrite: true, - pattern: "*.vcf" + pattern: "*.filtered.haplotypes.vcf.gz*" ] ext.args = { " 
--read_ratio=${params.read_ratio} --depth=${params.read_depth} "} @@ -121,7 +94,7 @@ process { } withName: '.*PHARMACO_GENOMICS:PHARMCAT_PREPROCESSING' { - container = "${params.container_dir}/pharmcat_2.9.0.sif" + container = "${params.container_dir}/pharmcat_2.12.0.sif" containerOptions = ' --contain ' publishDir = [ @@ -136,7 +109,7 @@ process { } withName: '.*PHARMACO_GENOMICS:PHARMCAT' { - container = "${params.container_dir}/pharmcat_2.9.0.sif" + container = "${params.container_dir}/pharmcat_2.12.0.sif" containerOptions = ' --contain ' publishDir = [ @@ -156,8 +129,37 @@ process { ext.prefix = { "${meta.group}" } ext.when = params.pharmcat + ext.args = {" --reporter-sources CPIC --matcher-save-html --reporter-save-json -ma -re --reporter-title ${meta.group} "} + } + + withName: '.*PHARMACO_GENOMICS:ONTARGET_VCF' { + container = "${params.container_dir}/bcftools_1.20.sif" + + publishDir = [ + path: "${params.outdir}/${params.subdir}/vcf/", + mode: 'copy', + overwrite: true, + pattern: "*.ontarget.*.vcf.gz*" + ] + + ext.prefix = { "${meta.group}" } } + withName: '.*PHARMACO_GENOMICS:BCFTOOLS_ANNOTATION' { + container = "${params.container_dir}/bcftools_1.20.sif" + + publishDir = [ + path: "${params.outdir}/${params.subdir}/vcf/", + mode: 'copy', + overwrite: true, + pattern: "*.filtered.ontarget.haplotypes.anno.vcf" + ] + + ext.args = { "-a ${params.dbSNP} -c ID" } + ext.prefix = { "${meta.group}" } + } + + withName: '.*PHARMACO_GENOMICS:DETECTED_VARIANTS' { container = "${params.container_dir}/target_variants_python.simg" @@ -186,6 +188,20 @@ process { ext.prefix = { "${meta.group}" } } + withName: '.*PHARMACO_GENOMICS:ONTARGET_BAM' { + container = "${params.container_dir}/samtools.simg" + + publishDir = [ + path: "${params.outdir}/${params.subdir}/bam/", + mode: 'copy', + overwrite: true, + pattern: "*ontarget*.bam*" + ] + + ext.args = { " -h -M " } + ext.prefix = { "${meta.group}" } + } + withName: '.*PHARMACO_GENOMICS:DEPTH_OF_TARGETS' { container = "${params.container_dir}/gatk3.simg" diff --git a/envs/get_cointainers.sh b/envs/get_cointainers.sh index d34b056..3c17649 100755 --- a/envs/get_cointainers.sh +++ b/envs/get_cointainers.sh @@ -9,7 +9,7 @@ sudo singularity build jinja_report.sif recepies/jinja_report sudo singularity build samtools.sif recepies/samtools sudo singularity build gatk3.sif docker://broadinstitute/gatk3:3.8-1 sudo singularity build gatk4.sif docker://broadinstitute/gatk -sudo singularity build bcftools.sif docker://staphb/bcftools +sudo singularity build bcftools_1.20.sif recepies/bcftools sudo singularity build pharmcat.sif docker://pgkb/pharmcat diff --git a/envs/recepies/bcftools b/envs/recepies/bcftools new file mode 100644 index 0000000..d2f3491 --- /dev/null +++ b/envs/recepies/bcftools @@ -0,0 +1,8 @@ +Bootstrap: docker +From: staphb/bcftools:1.20 + +%post + export DEBIAN_FRONTEND=noninteractive + + apt-get -y update + apt-get -y install tabix \ No newline at end of file diff --git a/envs/recepies/get_target_variants b/envs/recepies/get_target_variants index 396620f..500b58c 100755 --- a/envs/recepies/get_target_variants +++ b/envs/recepies/get_target_variants @@ -10,7 +10,8 @@ From: ubuntu:18.04 apt-get -y install zlib1g-dev apt-get -y install libbz2-dev apt-get -y install liblzma-dev + apt-get -y install tabix + apt-get -y install python3-pysam pip3 install pandas==1.0.3 pip3 install argparse==1.4.0 - pip3 install cython - pip3 install pysam==0.15.4 + pip3 install cython \ No newline at end of file diff --git a/envs/recepies/samtools 
b/envs/recepies/samtools index a7d8ca4..179813f 100755 --- a/envs/recepies/samtools +++ b/envs/recepies/samtools @@ -5,3 +5,4 @@ From: ubuntu:18.04 apt-get -y update apt-get -y install samtools + apt-get -y install tabix diff --git a/modules/local/annotation/main.nf b/modules/local/annotation/main.nf index 316b98f..75901bd 100644 --- a/modules/local/annotation/main.nf +++ b/modules/local/annotation/main.nf @@ -17,7 +17,7 @@ process BCFTOOLS_ANNOTATION { def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.group}" """ - bcftools annotate --threads ${task.cpus} $args -o ${prefix}".haplotypes.anno.vcf" $vcf + bcftools annotate --threads ${task.cpus} $args -o ${prefix}".filtered.ontarget.haplotypes.anno.vcf" $vcf cat <<-END_VERSIONS > versions.yml "${task.process}": @@ -28,7 +28,7 @@ process BCFTOOLS_ANNOTATION { stub: def prefix = task.ext.prefix ?: "${meta.group}" """ - touch ${prefix}".haplotypes.anno.vcf" + touch ${prefix}".filtered.ontarget.haplotypes.anno.vcf" cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/local/filtration/main.nf b/modules/local/filtration/main.nf index 79bc9ff..63b3cc3 100755 --- a/modules/local/filtration/main.nf +++ b/modules/local/filtration/main.nf @@ -4,11 +4,11 @@ process VARIANT_FILTRATION { tag "$meta.group" input: - tuple val(group), val(meta), file(vcf) + tuple val(group), val(meta), file(vcf), file(tbi) output: - tuple val(group), val(meta), file("*.filtered.vcf"), emit: haplotypes_filtered - path "versions.yml", emit: versions + tuple val(group), val(meta), file("*.filtered.haplotypes.vcf.gz"), file("*.filtered.haplotypes.vcf.gz.tbi"), emit: haplotypes_filtered + path "versions.yml", emit: versions when: task.ext.when == null || task.ext.when @@ -17,25 +17,33 @@ process VARIANT_FILTRATION { def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.group}" """ + gunzip -c $vcf > ${prefix}.haplotypes.vcf variant_filtration.py \ - --input_vcf=$vcf \ + --input_vcf=${prefix}.haplotypes.vcf \ $args \ - --output_file=${prefix}.haplotypes.anno.filtered.vcf + --output_file=${prefix}.filtered.haplotypes.vcf + + bgzip -c ${prefix}.filtered.haplotypes.vcf > ${prefix}.filtered.haplotypes.vcf.gz + tabix -p vcf ${prefix}.filtered.haplotypes.vcf.gz cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python3 --version 2>&1 | sed -e 's/Python //g') + bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') END_VERSIONS """ stub: def prefix = task.ext.prefix ?: "${meta.group}" """ - touch ${prefix}.haplotypes.anno.filtered.vcf + touch ${prefix}.filtered.haplotypes.vcf.gz ${prefix}.filtered.haplotypes.vcf.gz.tbi cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python3 --version 2>&1 | sed -e 's/Python //g') + bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') END_VERSIONS """ diff --git a/modules/local/haplotyping/main.nf b/modules/local/haplotyping/main.nf new file mode 100644 index 0000000..1b39d53 --- /dev/null +++ b/modules/local/haplotyping/main.nf @@ -0,0 +1,99 @@ +process GATK_HAPLOTYPING { + label 'process_medium_cpus' + label 'process_medium_memory' + label 'stage' + tag "$meta.group" + + input: + tuple val(group), val(meta), file(bam), file(bai) + + output: + tuple val(group), val(meta), file("*.haplotypes.vcf.gz"), file("*.haplotypes.vcf.gz.tbi"), emit: haplotypes + path "versions.yml", emit: versions + + when: + task.ext.when == null || 
task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.group}.GATK" + def avail_mem = 12288 + if (!task.memory) { + log.info '[GATK CollectAllelicCounts] Available memory not known - defaulting to 12GB. Specify process memory requirements to change this.' + } else { + avail_mem = (task.memory.mega*0.8).intValue() + } + """ + gatk --java-options "-Xmx${avail_mem}M" HaplotypeCaller $args -I $bam -O ${prefix}.haplotypes.vcf + bgzip -c ${prefix}.haplotypes.vcf > ${prefix}.haplotypes.vcf.gz + tabix ${prefix}.haplotypes.vcf.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gatk4: \$(echo \$(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*\$//') + bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.group}.GATK" + """ + touch ${prefix}.haplotypes.vcf.gz ${prefix}.haplotypes.vcf.gz.tbi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gatk4: \$(echo \$(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*\$//') + bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ +} + + +process SENTIEON_HAPLOTYPING { + label 'process_medium_cpus' + label 'process_medium_memory' + label 'stage' + tag "$meta.group" + + input: + tuple val(group), val(meta), file(bam), file(bai) + + output: + tuple val(group), val(meta), file("*.haplotypes.vcf.gz"), file("*.haplotypes.vcf.gz.tbi"), emit: haplotypes + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.group}.sentieon" + """ + sentieon driver -t ${task.cpus} $args -i $bam --algo Haplotyper $args2 ${prefix}.haplotypes.vcf + bgzip -c ${prefix}.haplotypes.vcf > ${prefix}.haplotypes.vcf.gz + tabix ${prefix}.haplotypes.vcf.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sentieon: \$(echo \$(sentieon driver --version 2>&1) | sed -e "s/sentieon-genomics-//g") + bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.group}.sentieon" + """ + touch ${prefix}.haplotypes.vcf.gz ${prefix}.haplotypes.vcf.gz.tbi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sentieon: \$(echo \$(sentieon driver --version 2>&1) | sed -e "s/sentieon-genomics-//g") + bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/local/ontarget/main.nf b/modules/local/ontarget/main.nf index db204c5..a3031cc 100755 --- a/modules/local/ontarget/main.nf +++ b/modules/local/ontarget/main.nf @@ -39,101 +39,47 @@ process ONTARGET_BAM { """ } - -process GATK_HAPLOTYPING { - label 'process_high' +process ONTARGET_VCF { + label 'process_low' label 'stage' tag "$meta.group" input: - tuple val(group), val(meta), file(bam), file(bai) + tuple val(group), val(meta), file(vcf), file(tbi) + path pgx_ontarget_padded_bed output: - tuple val(group), val(meta), file("*.haplotypes.vcf.gz"), file("*.haplotypes.vcf.gz.tbi"), emit: haplotypes - path "versions.yml", emit: versions + tuple val(group), val(meta), file("*.ontarget.filtered.haplotypes.vcf.gz"), 
file("*.ontarget.filtered.haplotypes.vcf.gz.tbi"), emit: vcf_ontarget + path "versions.yml", emit: versions when: task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.group}.GATK" - def avail_mem = 12288 - if (!task.memory) { - log.info '[GATK CollectAllelicCounts] Available memory not known - defaulting to 12GB. Specify process memory requirements to change this.' - } else { - avail_mem = (task.memory.mega*0.8).intValue() - } + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.group}" """ - gatk --java-options "-Xmx${avail_mem}M" HaplotypeCaller $args -I $bam -O ${prefix}.haplotypes.vcf - bgzip -c ${prefix}.haplotypes.vcf > ${prefix}.haplotypes.vcf.gz - tabix ${prefix}.haplotypes.vcf.gz + bcftools view $args -o ${prefix}.ontarget.filtered.haplotypes.vcf -R $pgx_ontarget_padded_bed $vcf + bgzip -c ${prefix}.ontarget.filtered.haplotypes.vcf > ${prefix}.ontarget.filtered.haplotypes.vcf.gz + tabix -p vcf ${prefix}.ontarget.filtered.haplotypes.vcf.gz cat <<-END_VERSIONS > versions.yml "${task.process}": - gatk4: \$(echo \$(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*\$//') - bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') - tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + bcftools: \$(bcftools --version | grep 'bcftools' 2>&1 | sed 's/^.*bcftools //') END_VERSIONS """ stub: - def prefix = task.ext.prefix ?: "${meta.group}.GATK" + def prefix = task.ext.prefix ?: "${meta.group}" """ - touch ${prefix}.haplotypes.vcf.gz ${prefix}.haplotypes.vcf.gz.tbi + touch ${prefix}.ontarget.filtered.haplotypes.vcf.gz + touch ${prefix}.ontarget.filtered.haplotypes.vcf.gz.tbi cat <<-END_VERSIONS > versions.yml "${task.process}": - gatk4: \$(echo \$(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*\$//') - bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') - tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + bcftools: \$(bcftools --version | grep 'bcftools' 2>&1 | sed 's/^.*bcftools //') END_VERSIONS """ } -process SENTIEON_HAPLOTYPING { - label 'process_high' - label 'stage' - tag "$meta.group" - - input: - tuple val(group), val(meta), file(bam), file(bai) - - output: - tuple val(group), val(meta), file("*.haplotypes.vcf.gz"), file("*.haplotypes.vcf.gz.tbi"), emit: haplotypes - path "versions.yml", emit: versions - - when: - task.ext.when == null || task.ext.when - - script: - def args = task.ext.args ?: '' - def args2 = task.ext.args2 ?: '' - def prefix = task.ext.prefix ?: "${meta.group}.sentieon" - """ - sentieon driver -t ${task.cpus} $args -i $bam --algo Haplotyper $args2 ${prefix}.haplotypes.vcf - bgzip -c ${prefix}.haplotypes.vcf > ${prefix}.haplotypes.vcf.gz - tabix ${prefix}.haplotypes.vcf.gz - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - sentieon: \$(echo \$(sentieon driver --version 2>&1) | sed -e "s/sentieon-genomics-//g") - bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') - tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') - END_VERSIONS - """ - - stub: - def prefix = task.ext.prefix ?: "${meta.group}.sentieon" - """ - touch ${prefix}.haplotypes.vcf.gz ${prefix}.haplotypes.vcf.gz.tbi - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - sentieon: \$(echo \$(sentieon driver --version 2>&1) | sed -e "s/sentieon-genomics-//g") - bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g') - tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') - END_VERSIONS - """ -} diff --git 
a/modules/local/pharmcat/main.nf b/modules/local/pharmcat/main.nf index 5a3f2c7..f20ad9c 100644 --- a/modules/local/pharmcat/main.nf +++ b/modules/local/pharmcat/main.nf @@ -4,11 +4,13 @@ process PHARMCAT_PREPROCESSING { tag "$meta.group" input: - tuple val(group), val(meta), file(vcf) + tuple val(group), val(meta), file(vcf), file(tbi) output: - tuple val(group), val(meta), file("*.pharmcat.preprocessed.vcf"), emit: pharmcat_preprocessed - path "versions.yml", emit: versions + tuple val(group), val(meta), file("*.pharmcat.preprocessed.vcf"), emit: pharmcat_preprocessed_vcf + // tuple val(group), val(meta), file("*.pharmcat.pgx_regions.vcf.bgz"), file("*.pharmcat.pgx_regions.vcf.bgz.csi"), emit: pharmcat_pgx_regions + // tuple val(group), val(meta), file("*.pharmcat.normalized.vcf.bgz"), file("*.pharmcat.normalized.vcf.bgz.csi"), emit: pharmcat_normalized + path "versions.yml", emit: versions when: task.ext.when == null || task.ext.when @@ -17,12 +19,12 @@ process PHARMCAT_PREPROCESSING { def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.group}" """ - /pharmcat/pharmcat_vcf_preprocessor.py -vcf $vcf --base-filename ${prefix}.pharmcat $args + pharmcat_vcf_preprocessor.py -vcf $vcf --base-filename ${prefix}.pharmcat $args gunzip -c ${prefix}.pharmcat.preprocessed.vcf.bgz > ${prefix}.pharmcat.preprocessed.vcf cat <<-END_VERSIONS > versions.yml "${task.process}": - pharmcat_vcf_preprocessor: \$(/pharmcat/pharmcat_vcf_preprocessor.py -V 2>&1 | sed -e 's/PharmCAT VCF Preprocessor //g') + pharmcat_vcf_preprocessor: \$(pharmcat_vcf_preprocessor.py -V 2>&1 | sed -e 's/PharmCAT VCF Preprocessor //g') END_VERSIONS """ @@ -33,7 +35,7 @@ process PHARMCAT_PREPROCESSING { cat <<-END_VERSIONS > versions.yml "${task.process}": - pharmcat_vcf_preprocessor: \$(/pharmcat/pharmcat_vcf_preprocessor.py -V 2>&1 | sed -e 's/PharmCAT VCF Preprocessor //g') + pharmcat_vcf_preprocessor: \$(pharmcat_vcf_preprocessor.py -V 2>&1 | sed -e 's/PharmCAT VCF Preprocessor //g') END_VERSIONS """ @@ -50,8 +52,10 @@ process PHARMCAT { output: tuple val(group), val(meta), file("*.phenotype.json"), emit: pharmcat_pheno_json - tuple val(group), val(meta), file("*.match.json"), emit: pharmcat_macth_json + tuple val(group), val(meta), file("*.match.json"), emit: pharmcat_match_json + tuple val(group), val(meta), file("*.match.html"), emit: pharmcat_match_html tuple val(group), val(meta), file("*.report.html"), emit: pharmcat_report + tuple val(group), val(meta), file("*.report.json"), emit: pharmcat_report_json path "versions.yml", emit: versions when: @@ -61,7 +65,7 @@ process PHARMCAT { def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.group}" """ - /pharmcat/pharmcat -vcf $vcf --base-filename ${prefix}.pharmcat $args + pharmcat -vcf $vcf --base-filename ${prefix}.pharmcat $args cat <<-END_VERSIONS > versions.yml "${task.process}": @@ -74,6 +78,7 @@ process PHARMCAT { """ touch ${prefix}.pharmcat.phenotype.json touch ${prefix}.match.json + touch ${prefix}.match.html touch ${prefix}.report.html cat <<-END_VERSIONS > versions.yml diff --git a/resources/workflow_images/PGXModule_Workflow_v2.0.0.png b/resources/workflow_images/PGXModule_Workflow_v2.0.0.png new file mode 100755 index 0000000..27c2c24 Binary files /dev/null and b/resources/workflow_images/PGXModule_Workflow_v2.0.0.png differ diff --git a/resources/workflow_images/PGx.png b/resources/workflow_images/PGx.png deleted file mode 100755 index 86d764f..0000000 Binary files a/resources/workflow_images/PGx.png and /dev/null differ diff 
--git a/subworkflows/local/pharmacoGenomics.nf b/subworkflows/local/pharmacoGenomics.nf index 4b7acac..5be05bc 100755 --- a/subworkflows/local/pharmacoGenomics.nf +++ b/subworkflows/local/pharmacoGenomics.nf @@ -1,8 +1,9 @@ #!/usr/bin/env nextflow include { ONTARGET_BAM } from '../../modules/local/ontarget/main' -include { GATK_HAPLOTYPING } from '../../modules/local/ontarget/main' -include { SENTIEON_HAPLOTYPING } from '../../modules/local/ontarget/main' +include { ONTARGET_VCF } from '../../modules/local/ontarget/main' +include { GATK_HAPLOTYPING } from '../../modules/local/haplotyping/main' +include { SENTIEON_HAPLOTYPING } from '../../modules/local/haplotyping/main' include { BCFTOOLS_ANNOTATION } from '../../modules/local/annotation/main' include { VARIANT_FILTRATION } from '../../modules/local/filtration/main' include { PHARMCAT_PREPROCESSING } from '../../modules/local/pharmcat/main' @@ -35,46 +36,50 @@ workflow PHARMACO_GENOMICS { GET_PADDED_BAITS () ch_versions = ch_versions.mix(GET_PADDED_BAITS.out.versions) - // Get the on-target bam file - ONTARGET_BAM ( - bam_input, - PADDED_BED_INTERVALS.out.padded_bed_intervals - ) - ch_versions = ch_versions.mix(ONTARGET_BAM.out.versions) - // Runs only when the haplotype caller is GATK - GATK_HAPLOTYPING ( ONTARGET_BAM.out.bam_ontarget ) + GATK_HAPLOTYPING ( bam_input ) // Runs only when the haplotype caller is SENTIEON - SENTIEON_HAPLOTYPING ( ONTARGET_BAM.out.bam_ontarget ) + SENTIEON_HAPLOTYPING ( bam_input ) ch_haplotypes = Channel.empty().mix(GATK_HAPLOTYPING.out.haplotypes, SENTIEON_HAPLOTYPING.out.haplotypes) ch_versions = Channel.empty().mix(GATK_HAPLOTYPING.out.versions, SENTIEON_HAPLOTYPING.out.versions) - // Annotate the haplotypes - BCFTOOLS_ANNOTATION ( ch_haplotypes ) - ch_versions = ch_versions.mix(BCFTOOLS_ANNOTATION.out.versions) - // Filter the haplotypes - VARIANT_FILTRATION ( BCFTOOLS_ANNOTATION.out.annotations ) + VARIANT_FILTRATION ( ch_haplotypes ) ch_versions = ch_versions.mix(VARIANT_FILTRATION.out.versions) - + // Preprocess the pharmcat PHARMCAT_PREPROCESSING ( VARIANT_FILTRATION.out.haplotypes_filtered ) ch_versions = ch_versions.mix(PHARMCAT_PREPROCESSING.out.versions) // Run the pharmcat - PHARMCAT ( PHARMCAT_PREPROCESSING.out.pharmcat_preprocessed ) + PHARMCAT ( PHARMCAT_PREPROCESSING.out.pharmcat_preprocessed_vcf ) ch_versions = ch_versions.mix(PHARMCAT.out.versions) + // ONtarget Variants + ONTARGET_VCF ( VARIANT_FILTRATION.out.haplotypes_filtered, PADDED_BED_INTERVALS.out.padded_bed_intervals ) + ch_versions = ch_versions.mix(ONTARGET_VCF.out.versions) + + // Annotate the haplotypes + BCFTOOLS_ANNOTATION ( ONTARGET_VCF.out.vcf_ontarget ) + ch_versions = ch_versions.mix(BCFTOOLS_ANNOTATION.out.versions) + // Detect the variants from the target list - DETECTED_VARIANTS ( VARIANT_FILTRATION.out.haplotypes_filtered ) + DETECTED_VARIANTS ( BCFTOOLS_ANNOTATION.out.annotations ) ch_versions = ch_versions.mix(DETECTED_VARIANTS.out.versions) // Get the sample target list SAMPLE_TARGET_LIST ( DETECTED_VARIANTS.out.detected_tsv ) ch_versions = ch_versions.mix(SAMPLE_TARGET_LIST.out.versions) + // Get the on-target bam file + ONTARGET_BAM ( + bam_input, + PADDED_BED_INTERVALS.out.padded_bed_intervals + ) + ch_versions = ch_versions.mix(ONTARGET_BAM.out.versions) + // Get the depth of targets DEPTH_OF_TARGETS ( ONTARGET_BAM.out.bam_ontarget @@ -103,7 +108,7 @@ workflow PHARMACO_GENOMICS { // Get the PGx report GET_PGX_REPORT ( - VARIANT_FILTRATION.out.haplotypes_filtered + BCFTOOLS_ANNOTATION.out.annotations .join( 
DETECTED_VARIANTS.out.detected_tsv, by: [0,1]) .join( APPEND_ID_TO_GDF.out.depth_at_missing_annotate_gdf, by: [0,1] ) .join( GET_CLIINICAL_GUIDELINES.out.possible_diplotypes, by: [0,1] ) @@ -114,11 +119,14 @@ workflow PHARMACO_GENOMICS { emit: - pgx_report = GET_PGX_REPORT.out.pgx_html // channel: [ tuple val(group), val(meta) file("pgx.html") ] - pharmcat_preprocessed = PHARMCAT_PREPROCESSING.out.pharmcat_preprocessed // channel: [ tuple val(group), val(meta) file("pharmcat_preprocessed.vcf") ] - pharmcat_report = PHARMCAT.out.pharmcat_report // channel: [ tuple val(group), val(meta) file("pharmcat.html") ] - pharmcat_pheno_json = PHARMCAT.out.pharmcat_pheno_json // channel: [ tuple val(group), val(meta) file("pharmcat.phenotype.json") ] - pharmcat_macth_json = PHARMCAT.out.pharmcat_macth_json // channel: [ tuple val(group), val(meta) file("pharmcat.match.json") ] - targets_depth = GET_PGX_REPORT.out.targets_depth // channel: [ tuple val(group), val(meta) file(".targets.depth.tsv") ] - versions = ch_versions // channel: [ path(versions.yml) ] + pgx_report = GET_PGX_REPORT.out.pgx_html // channel: [ tuple val(group), val(meta) file("pgx.html") ] + // pharmcat_pgx_regions = PHARMCAT_PREPROCESSING.out.pharmcat_pgx_regions // channel: [ tuple val(group), val(meta) file("pharmcat_pgx_regions.vcf") ] + pharmcat_preprocessed = PHARMCAT_PREPROCESSING.out.pharmcat_preprocessed_vcf // channel: [ tuple val(group), val(meta) file("pharmcat_preprocessed.vcf") ] + pharmcat_report = PHARMCAT.out.pharmcat_report // channel: [ tuple val(group), val(meta) file("pharmcat.html") ] + pharmcat_pheno_json = PHARMCAT.out.pharmcat_pheno_json // channel: [ tuple val(group), val(meta) file("pharmcat.phenotype.json") ] + pharmcat_match_json = PHARMCAT.out.pharmcat_match_json // channel: [ tuple val(group), val(meta) file("pharmcat.match.json") ] + pharmcat_match_html = PHARMCAT.out.pharmcat_match_html // channel: [ tuple val(group), val(meta) file("pharmcat.match.html") ] + pharmcat_report_json = PHARMCAT.out.pharmcat_report_json // channel: [ tuple val(group), val(meta) file("pharmcat.report.json") ] + targets_depth = GET_PGX_REPORT.out.targets_depth // channel: [ tuple val(group), val(meta) file(".targets.depth.tsv") ] + versions = ch_versions // channel: [ path(versions.yml) ] } \ No newline at end of file diff --git a/workflows/pgx.nf b/workflows/pgx.nf index df0c600..8291e0e 100755 --- a/workflows/pgx.nf +++ b/workflows/pgx.nf @@ -62,10 +62,13 @@ workflow PGX { emit: report = PHARMACO_GENOMICS.out.pgx_report // channel: [ val(group), val(meta), file(pgx-report) ] + // pharmcat_pgx_regions = PHARMACO_GENOMICS.out.pharmcat_pgx_regions // channel: [ tuple val(group), val(meta) file("pharmcat_preprocessed.vcf") ] pharmcat_preprocessed = PHARMACO_GENOMICS.out.pharmcat_preprocessed // channel: [ tuple val(group), val(meta) file("pharmcat_preprocessed.vcf") ] pharmcat_report = PHARMACO_GENOMICS.out.pharmcat_report // channel: [ tuple val(group), val(meta) file("pharmcat.html") ] pharmcat_pheno_json = PHARMACO_GENOMICS.out.pharmcat_pheno_json // channel: [ tuple val(group), val(meta) file("pharmcat.phenotype.json") ] - pharmcat_macth_json = PHARMACO_GENOMICS.out.pharmcat_macth_json // channel: [ tuple val(group), val(meta) file("pharmcat.match.json") ] + pharmcat_match_json = PHARMACO_GENOMICS.out.pharmcat_match_json // channel: [ tuple val(group), val(meta) file("pharmcat.match.json") ] + pharmcat_match_html = PHARMACO_GENOMICS.out.pharmcat_match_html // channel: [ tuple val(group), val(meta) 
file("pharmcat.match.html") ] + pharmcat_report_json = PHARMACO_GENOMICS.out.pharmcat_report_json // channel: [ tuple val(group), val(meta) file("pharmcat.match.html") ] multiqc_report = multiqc_reports // channel: [ tuple val(group), val(meta) file("multiqc.html") ] multiqc_data = MULTIQC.out.data // channel: [ tuple val(group), val(meta) file("multiqc_data") ] multiqc_plots = MULTIQC.out.plots // channel: [ tuple val(group), val(meta) file("multiqc_plots") ]