update Simpleaf modules, subworkflow #424
base: dev
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,22 @@ | ||
process ALEVINQC { | ||
|
||
// | ||
// This module executes alevinfry QC reporting tool on alevin results | ||
// This module executes alevinfry QC reporting tool on alevin-fry results | ||
// | ||
|
||
tag "$meta.id" | ||
label 'process_low' | ||
|
||
//The alevinqc 1.14.0 container is broken, missing some libraries - thus reverting this to previous 1.12.1 version | ||
conda "bioconda::bioconductor-alevinqc=1.12.1" | ||
Comment: Ideally, this would also be a module on nf-core/modules. But if you don't have time right now, we can also address this at a later point.
Reply: This is almost "modules"-ready, but I don't have cycles to work on this in the following weeks. Let's do this at a later point.
||
conda "bioconda::bioconductor-alevinqc=1.18.0" | ||
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' : | ||
'biocontainers/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' }" | ||
'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.18.0--r43hf17093f_0' : | ||
'biocontainers/bioconductor-alevinqc:1.18.0--r43hf17093f_0' }" | ||
|
||
// all metas are the same | ||
input: | ||
tuple val(meta), path(alevin_results) | ||
tuple val(meta), path(quant_dir, stageAs: "quant_dir") | ||
tuple val(meta1), path(permit_dir, stageAs: "permit_dir") | ||
tuple val(meta2), path(map_dir) | ||
|
||
output: | ||
tuple val(meta), path("alevin_report_${meta.id}.html"), emit: report | ||
|
@@ -29,9 +31,9 @@ process ALEVINQC { | |
#!/usr/bin/env Rscript | ||
require(alevinQC) | ||
alevinFryQCReport( | ||
mapDir = "${alevin_results}/af_map", | ||
quantDir = "${alevin_results}/af_quant", | ||
permitDir= "${alevin_results}/af_quant", | ||
mapDir = "${map_dir}", | ||
permitDir= "${permit_dir}", | ||
quantDir = "${quant_dir}", | ||
sampleId = "${prefix}", | ||
outputFile = "alevin_report_${meta.id}.html", | ||
outputFormat = "html_document", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,8 @@ process MTX_TO_H5AD { | |
script: | ||
def aligner = (input_aligner in [ 'cellranger', 'cellrangerarc', 'cellrangermulti' ]) ? 'cellranger' : input_aligner | ||
|
||
aligner = input_aligner == "alevin" ? "simpleaf" : aligner | ||
Comment: Is this for backwards compatibility? I'm afraid one needs to fix it at multiple locations otherwise.
Reply: I renamed the file instead. Now the Python script file is called [...]. However, thanks for pointing this out. I will do some updates to ensure backward compatibility. One related thing: I am struggling with the documentation. There are a few points of confusion: [...]
So I am thinking of replacing "alevin" and "salmon" with "simpleaf" everywhere in the workflow, including file and folder names. This will avoid the confusion, but it will change the file structure and the default "aligner" option. What do you think?
||
|
||
template "mtx_to_h5ad_${aligner}.py" | ||
|
||
stub: | ||
|
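The aligner-to-template mapping discussed in the thread above can be sketched in plain Groovy. This is a minimal sketch, not the module's code: `resolveTemplateAligner` is a hypothetical helper name, and treating "alevin" as a legacy alias for "simpleaf" is the backward-compatibility assumption raised in the review.

```groovy
// Hypothetical helper; mirrors the two assignments in the MTX_TO_H5AD script block.
def resolveTemplateAligner(String inputAligner) {
    def cellrangerFamily = ['cellranger', 'cellrangerarc', 'cellrangermulti']
    // All Cell Ranger variants share one conversion template.
    if (inputAligner in cellrangerFamily) {
        return 'cellranger'
    }
    // Assumption: the old "alevin" name is still accepted and maps to "simpleaf".
    if (inputAligner == 'alevin') {
        return 'simpleaf'
    }
    // Any other aligner name is used unchanged.
    return inputAligner
}

// Template file name that would be selected for the legacy aligner name.
def templateName = "mtx_to_h5ad_${resolveTemplateAligner('alevin')}.py".toString()
println templateName
```

Keeping the alias in one helper would mean the legacy name only has to be handled in a single place, which is the concern the reviewer raised about fixing it at multiple locations.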
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,9 +24,9 @@ params { | |
gtf = null | ||
|
||
// alevin-fry parameters (simpleaf) | ||
simpleaf_rlen = 91 | ||
simpleaf_index = null | ||
barcode_whitelist = null | ||
salmon_index = null | ||
simpleaf_umi_resolution = "cr-like" | ||
Comment: What does this parameter do? Is this a reasonable default, or should it be set based on the protocol used? We have a protocols.json file somewhere that already sets other parameters based on the protocol.
Reply: simpleaf requires the UMI resolution argument to be set in order to resolve multi-mapped UMIs. It is independent of the protocol, but depends on how users want to treat multimapping. "cr-like", which is the current default in scrnaseq, discards all UMIs that can be assigned to multiple genes equally well. I suggest exposing this, but if you think setting a default is better, I can switch to that.
||
|
||
// kallisto bustools parameters | ||
kallisto_index = null | ||
|
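The "cr-like" rule described in the reply above (drop a UMI that fits several genes equally well) can be illustrated with a toy Groovy sketch. This is not alevin-fry or simpleaf code: the function name `resolveCrLike`, the input layout, and the use of per-gene read support as the measure of "how well" a UMI fits a gene are all assumptions made for illustration.

```groovy
// Toy illustration only; this is not alevin-fry or simpleaf code.
// Input (hypothetical): UMI -> (gene -> number of supporting reads).
def resolveCrLike(Map umiToGeneReads) {
    def geneCounts = [:]
    umiToGeneReads.each { umi, geneReads ->
        def best = geneReads.values().max()
        def topGenes = geneReads.findAll { gene, n -> n == best }.keySet().toList()
        // Count the UMI only if exactly one gene has the strongest support;
        // a tie means the UMI fits several genes equally well, so it is dropped.
        if (topGenes.size() == 1) {
            def gene = topGenes[0]
            geneCounts[gene] = (geneCounts[gene] ?: 0) + 1
        }
    }
    return geneCounts
}

def counts = resolveCrLike([
    'AAAC': [geneA: 3],            // unambiguous
    'AAAG': [geneA: 2, geneB: 2],  // tie: discarded
    'AAAT': [geneA: 1, geneB: 4],  // geneB has the strongest support
])
println counts
```

Because the rule only concerns how ambiguous UMIs are treated, it does not depend on the sequencing protocol, which matches the reply's argument for exposing it as a separate parameter.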
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
/* -- IMPORT LOCAL MODULES/SUBWORKFLOWS -- */ | ||
include { ALEVINQC } from '../../modules/local/alevinqc' | ||
include { SIMPLEAF_INDEX } from '../../../modules/modules/nf-core/simpleaf/index' | ||
include { SIMPLEAF_QUANT } from '../../../modules/modules/nf-core/simpleaf/quant' | ||
|
||
workflow SCRNASEQ_SIMPLEAF { | ||
|
||
take: | ||
ch_genome_fasta // channel | ||
ch_genome_gtf // channel | ||
transcript_fasta | ||
simpleaf_index | ||
txp2gene | ||
barcode_whitelist | ||
chemistry | ||
resolution | ||
ch_fastq // channel | ||
map_dir | ||
Comment: What's the map dir?
Reply: I am preparing to move this subworkflow to nf-core/modules. simpleaf quant can take a folder that contains mapping results, which lets it skip indexing and mapping and jump directly into quantification. This [...]
||
|
||
main: | ||
ch_versions = Channel.empty() | ||
|
||
/* | ||
* Build salmon index | ||
*/ | ||
if ( !simpleaf_index || !map_dir ) { | ||
// define input channels for index building | ||
// we can either use the genome fasta and gtf files or the transcript fasta file | ||
if ( transcript_fasta ) { | ||
ch_genome_fasta_gtf = [ [:],[],[] ] | ||
ch_transcript_fasta = Channel.of( [ [id: "${transcript_fasta.getBaseName()}"], transcript_fasta ] ) | ||
} else { | ||
ch_genome_fasta_gtf = ch_genome_fasta.combine( ch_genome_gtf ).map{ fasta, gtf -> [[id: "${fasta.getBaseName()}"], fasta, gtf] } | ||
ch_transcript_fasta = Channel.of( [ [:], [] ] ) | ||
} | ||
|
||
SIMPLEAF_INDEX( | ||
ch_genome_fasta_gtf, | ||
ch_transcript_fasta | ||
) | ||
// Channel of tuple(meta, index dir) | ||
simpleaf_index = SIMPLEAF_INDEX.out.index.collect() | ||
// Channel of t2g path or empty | ||
t2g = SIMPLEAF_INDEX.out.t2g.collect() | ||
ch_versions = ch_versions.mix(SIMPLEAF_INDEX.out.versions) | ||
|
||
// ensure txp2gene is a Channel | ||
if (!txp2gene) { | ||
txp2gene = t2g | ||
} else { | ||
txp2gene = Channel.of( txp2gene ) | ||
} | ||
} else { | ||
// ensure simpleaf index and txp2gene are Channels | ||
simpleaf_index = Channel.of( [ [:], simpleaf_index ] ) | ||
txp2gene = Channel.of( txp2gene ) | ||
} | ||
|
||
// define input channels for quantification | ||
// we can either use the mapping results or the reads and index files | ||
if ( map_dir ) { | ||
ch_chemistry_reads = Channel.of( [ [:],[],[] ] ) | ||
ch_index_t2g = Channel.of( [ [:],[],[] ] ) | ||
ch_map_dir = Channel.of( [ [id: map_dir.baseName], map_dir ] ) | ||
} else { | ||
ch_chemistry_reads = ch_fastq.map{ meta, files -> tuple(meta + ["chemistry": chemistry], chemistry, files) } | ||
ch_index_t2g = simpleaf_index.combine( txp2gene ) | ||
ch_map_dir = [ [:],[] ] | ||
} | ||
|
||
/* | ||
* Perform quantification with salmon alevin | ||
*/ | ||
SIMPLEAF_QUANT ( | ||
ch_chemistry_reads, | ||
ch_index_t2g, | ||
Comment: Could it be that a [...]
Reply: Could you point out which line you referred to? I was just making everything a channel for consistency. Keeping them as values sounds good.
||
[[:], "unfiltered-pl", [], barcode_whitelist ], | ||
resolution, | ||
ch_map_dir | ||
) | ||
ch_versions = ch_versions.mix(SIMPLEAF_QUANT.out.versions) | ||
|
||
ch_af_map = map_dir ? ch_map_dir : SIMPLEAF_QUANT.out.map | ||
/* | ||
* Run alevinQC | ||
*/ | ||
ALEVINQC( SIMPLEAF_QUANT.out.quant, SIMPLEAF_QUANT.out.quant, ch_af_map ) | ||
ch_versions = ch_versions.mix(ALEVINQC.out.versions) | ||
|
||
|
||
emit: | ||
ch_versions | ||
txp2gene | ||
index = simpleaf_index | ||
map = SIMPLEAF_QUANT.out.map | ||
quant = SIMPLEAF_QUANT.out.quant | ||
alevinqc = ALEVINQC.out.report | ||
} |
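The `map_dir` branch of the subworkflow above chooses between two sets of inputs for quantification. A minimal plain-Groovy sketch of that choice follows; `selectQuantInputs` is a hypothetical helper, and plain lists and strings stand in for the Nextflow channels and paths used in the real code.

```groovy
// Hypothetical helper mirroring the map_dir branch of the subworkflow.
// Values are plain lists and strings here rather than Nextflow channels.
def selectQuantInputs(reads, index, mapDir) {
    if (mapDir) {
        // Existing mapping results: reads and index are left empty.
        return [reads: [], index: [], mapDir: mapDir]
    }
    // No mapping results: quantify from the reads and the index.
    return [reads: reads, index: index, mapDir: []]
}

// An empty list is falsy in Groovy, so passing [] selects the reads/index branch.
def fromReads = selectQuantInputs(['r1.fastq.gz', 'r2.fastq.gz'], 'index_dir', [])
def fromMap = selectQuantInputs(['r1.fastq.gz', 'r2.fastq.gz'], 'index_dir', 'af_map')
println fromReads
println fromMap
```

This is why the main workflow can pass `[]` as the last argument when no existing mapping directory applies: the empty list evaluates to false and the reads-plus-index path is taken.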
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,7 @@ include { methodsDescriptionText } from '../subworkfl | |
include { getGenomeAttribute } from '../subworkflows/local/utils_nfcore_scrnaseq_pipeline' | ||
include { FASTQC_CHECK } from '../subworkflows/local/fastqc' | ||
include { KALLISTO_BUSTOOLS } from '../subworkflows/local/kallisto_bustools' | ||
include { SCRNASEQ_ALEVIN } from '../subworkflows/local/alevin' | ||
include { SCRNASEQ_SIMPLEAF } from '../subworkflows/local/simpleaf' | ||
include { STARSOLO } from '../subworkflows/local/starsolo' | ||
include { CELLRANGER_ALIGN } from "../subworkflows/local/align_cellranger" | ||
include { CELLRANGER_MULTI_ALIGN } from "../subworkflows/local/align_cellrangermulti" | ||
|
@@ -64,7 +64,7 @@ workflow SCRNASEQ { | |
kb_t2c = params.kb_t2c ? file(params.kb_t2c, checkIfExists: true) : [] | ||
|
||
//salmon params | ||
ch_salmon_index = params.salmon_index ? file(params.salmon_index, checkIfExists: true) : [] | ||
ch_simpleaf_index = params.simpleaf_index ? file(params.simpleaf_index, checkIfExists: true) : [] | ||
|
||
//star params | ||
star_index = params.star_index ? file(params.star_index, checkIfExists: true) : null | ||
|
@@ -135,19 +135,32 @@ workflow SCRNASEQ { | |
|
||
// Run salmon alevin pipeline | ||
if (params.aligner == "alevin") { | ||
SCRNASEQ_ALEVIN( | ||
|
||
SCRNASEQ_SIMPLEAF( | ||
ch_genome_fasta, | ||
ch_filter_gtf, | ||
ch_transcript_fasta, | ||
ch_salmon_index, | ||
ch_simpleaf_index, | ||
ch_txp2gene, | ||
ch_barcode_whitelist, | ||
protocol_config['protocol'], | ||
ch_fastq | ||
params.simpleaf_umi_resolution, | ||
ch_fastq, | ||
[] // for existing map dir; not applicable | ||
) | ||
ch_versions = ch_versions.mix(SCRNASEQ_ALEVIN.out.ch_versions) | ||
ch_multiqc_files = ch_multiqc_files.mix(SCRNASEQ_ALEVIN.out.alevin_results.map{ meta, it -> it }) | ||
ch_mtx_matrices = ch_mtx_matrices.mix( SCRNASEQ_ALEVIN.out.alevin_results ) | ||
ch_versions = ch_versions.mix(SCRNASEQ_SIMPLEAF.out.ch_versions) | ||
ch_multiqc_files = ch_multiqc_files.mix(SCRNASEQ_SIMPLEAF.out.quant.map{ meta, it -> it }) | ||
ch_mtx_matrices = ch_mtx_matrices.mix( | ||
SCRNASEQ_SIMPLEAF.out.quant.map{ | ||
meta, files -> [ | ||
meta + | ||
[input_type: meta["filtered"] ? "filtered" : "raw" ], | ||
files | ||
] | ||
} | ||
) | ||
|
||
ch_txp2gene = SCRNASEQ_SIMPLEAF.out.txp2gene | ||
} | ||
|
||
// Run STARSolo pipeline | ||
|
@@ -284,7 +297,9 @@ workflow SCRNASEQ { | |
if ( !params.skip_cellbender && !(params.aligner in ['cellrangerarc']) ) { | ||
// module should only run on the raw matrices thus, filter-out the filtered result of the aligners that can produce it | ||
H5AD_REMOVEBACKGROUND_BARCODES_CELLBENDER_ANNDATA ( | ||
ch_h5ads.filter { meta, mtx_files -> meta.input_type == 'raw' } | ||
ch_h5ads | ||
.filter { meta, mtx_files -> meta.input_type == 'raw' } | ||
.map { meta, mtx_files -> [ meta + [input_type: 'filtered'], mtx_files ]} // to avoid name collision | ||
Comment: Why do you filter to raw and then set the input_type to filtered?
Reply: This is the input to cellbender.
|
||
) | ||
ch_h5ads = ch_h5ads.mix( | ||
H5AD_REMOVEBACKGROUND_BARCODES_CELLBENDER_ANNDATA.out.h5ad | ||
|
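The filter-then-relabel step applied to `ch_h5ads` in the hunk above can be reproduced on an ordinary list to see what each operator does. This is a sketch under assumptions: the sample IDs and file names are made up, and Groovy's `findAll` and `collect` stand in for the channel operators `filter` and `map`.

```groovy
// Hypothetical stand-in for ch_h5ads: a list of [meta, file] pairs.
def h5ads = [
    [[id: 'sample1', input_type: 'raw'], 'sample1_raw.h5ad'],
    [[id: 'sample1', input_type: 'filtered'], 'sample1_filtered.h5ad'],
]

def cellbenderInput = h5ads
    // Keep only the raw matrices.
    .findAll { meta, f -> meta.input_type == 'raw' }
    // Relabel them; `meta + [...]` builds a new map and leaves the original untouched.
    .collect { meta, f -> [meta + [input_type: 'filtered'], f] }

println cellbenderInput
println h5ads[0][0]
```

The relabelled entries still point at the raw files; only the `input_type` key in the copied meta map changes, so the entries in the original list keep their `raw` label.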
Comment: this should probably not be there
Reply: removed!