Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update Simpleaf modules, subworkflow #424

Draft
wants to merge 17 commits into
base: dev
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ process {
}
withName: 'ANNDATA_BARCODES' {
ext.prefix = { "${meta.id}_${meta.input_type}_matrix" }
// ext.prefix = { "${meta.id}_filtered_matrix" }
publishDir = [
path: { "${params.outdir}/${params.aligner}/mtx_conversions/${meta.id}" },
mode: params.publish_dir_mode,
Expand Down Expand Up @@ -139,15 +140,16 @@ if (params.aligner == "alevin") {
mode: params.publish_dir_mode,
enabled: params.save_reference
]
ext.args = { "--rlen ${params.simpleaf_rlen}" }
// because piscem create a large number of intermediate files,
// we set scratch to true to avoid IO issues
scratch = true
}
withName: 'SIMPLEAF_QUANT' {
publishDir = [
path: { "${params.outdir}/${params.aligner}/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.args = "-r cr-like"
}
// Fix for issue 196
// Modified for issue 334
Expand Down
Empty file added core.1739377
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably not be there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed!

Empty file.
2 changes: 1 addition & 1 deletion docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,7 @@ This pipeline uses the simplified and flexible modules in [Simpleaf](https://sim

**Output directory: `results/reference_genome`**

- `salmon_index`
- `simpleaf_index`
- Contains the indexed reference transcriptome for the Salmon mapper
- `alevin/txp2gene.tsv`
- The transcriptome to gene mapping TSV file utilized by Alevin-fry
Expand Down
4 changes: 2 additions & 2 deletions main.nf
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,8 @@ include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_scrn
// Thus, manually provided files are not overwritten by the genome attributes
params.fasta = getGenomeAttribute('fasta')
params.gtf = getGenomeAttribute('gtf')
params.salmon_index = getGenomeAttribute('simpleaf')
params.txp2gene = getGenomeAttribute('simpleaf_tx2pgene')
params.simpleaf_index = getGenomeAttribute('simpleaf')
params.txp2gene = getGenomeAttribute('simpleaf_txp2gene')
params.cellranger_index = params.aligner == 'cellrangerarc' ?
getGenomeAttribute('cellrangerarc') :
getGenomeAttribute('cellranger')
Expand Down
20 changes: 11 additions & 9 deletions modules/local/alevinqc.nf
Original file line number Diff line number Diff line change
@@ -1,20 +1,22 @@
process ALEVINQC {

//
// This module executes alevinfry QC reporting tool on alevin results
// This module executes alevinfry QC reporting tool on alevin-fry results
//

tag "$meta.id"
label 'process_low'

//The alevinqc 1.14.0 container is broken, missing some libraries - thus reverting this to previous 1.12.1 version
conda "bioconda::bioconductor-alevinqc=1.12.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this would also be a module on nf-core/modules. But if you don't have time right now, we can also address this at a later point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost "modules"-ready, but I don't have cycles to work on this in the following weeks. Let's do this at a later point.

conda "bioconda::bioconductor-alevinqc=1.18.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' :
'biocontainers/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' }"
'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.18.0--r43hf17093f_0' :
'biocontainers/bioconductor-alevinqc:1.18.0--r43hf17093f_0' }"

// all metas are the same
input:
tuple val(meta), path(alevin_results)
tuple val(meta), path(quant_dir, stageAs: "quant_dir")
tuple val(meta1), path(permit_dir, stageAs: "permit_dir")
tuple val(meta2), path(map_dir)

output:
tuple val(meta), path("alevin_report_${meta.id}.html"), emit: report
Expand All @@ -29,9 +31,9 @@ process ALEVINQC {
#!/usr/bin/env Rscript
require(alevinQC)
alevinFryQCReport(
mapDir = "${alevin_results}/af_map",
quantDir = "${alevin_results}/af_quant",
permitDir= "${alevin_results}/af_quant",
mapDir = "${map_dir}",
permitDir= "${permit_dir}",
quantDir = "${quant_dir}",
sampleId = "${prefix}",
outputFile = "alevin_report_${meta.id}.html",
outputFormat = "html_document",
Expand Down
2 changes: 2 additions & 0 deletions modules/local/mtx_to_h5ad.nf
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@ process MTX_TO_H5AD {
script:
def aligner = (input_aligner in [ 'cellranger', 'cellrangerarc', 'cellrangermulti' ]) ? 'cellranger' : input_aligner

aligner = input_aligner == "alevin" ? "simpleaf" : aligner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for backwards compatibility?
Can't we fix this at an earlier point in the pipeline, i.e. at the very beginning set aligner to simpleaf if it's alevin and then use simpleaf throughout the pipeline?

I'm afraid one needs to fix it at multiple locations otherwise.

Copy link
Contributor

@an-altosian an-altosian Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the file instead. Now the python script file is called mtx_to_h5ad_simpleaf.py. This is not for backward compatibility, but for matching the script name.

However, thanks for pointing this out. I will do some updates to ensure backward compatibility.

One related thing is, I am struggling with the documentation. There are a few confusions:

  1. alevin is the single-cell quantification module in salmon , not an "aligner".
  2. Simpleaf is the wrapper program of piscem + salmon + alevin-fry, and is the actual tool that is called in this pipeline.
  3. In latest simpleaf, the default indexer and mapper(aligner) is piscem, rather than salmon. Salmon will be used only if the --no-piscem argument is specified.
  4. Simpleaf uses alevin-fry, the successor of alevin, a stand-alone program written in rust, instead of the alevin module from salmon.

So, I am thinking of replacing "alevin" and "salmon" with "simpleaf" everywhere in the workflow, including file&folder names. This will avoid all confusions, but will change the file structure and the default "aligner" option. What do you think?


template "mtx_to_h5ad_${aligner}.py"

stub:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@ def input_to_adata(

# input_type comes from NF module
input_to_adata(
input_data="${meta.id}_alevin_results/af_quant/alevin/",
input_data="${inputs}/alevin/",
output="${meta.id}_${meta.input_type}_matrix.h5ad",
sample="${meta.id}"
)
Expand Down
4 changes: 2 additions & 2 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,9 @@ params {
gtf = null

// alevin-fry parameters (simpleaf)
simpleaf_rlen = 91
simpleaf_index = null
barcode_whitelist = null
salmon_index = null
simpleaf_umi_resolution = "cr-like"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this parameter do? Is this a reasonable default, or should it be set based on the protocol used? We have a protocols.json file somewhere that already sets other parameters based on the protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpleaf requires UMI resolution argument to be set for resolving multi-mapped UMIs. It is independent with the protocol, but depends on how the users want to treat multimapping.

"cr-like", which is the current default in scrnaseq, says discarding all UMIs that can be assigned to multiple genes equally well. I suggest to expose this, but if you think setting a default is better, I can switch to that.


// kallisto bustools parameters
kallisto_index = null
Expand Down
13 changes: 7 additions & 6 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -163,9 +163,9 @@
"description": "",
"default": "",
"properties": {
"salmon_index": {
"simpleaf_index": {
"type": "string",
"description": "This can be used to specify a precomputed Salmon index in the pipeline, in order to skip the generation of required indices by Salmon itself.",
"description": "This can be used to specify a precomputed Simpleaf index, either generated by Salmon or Piscem, in the pipeline, in order to skip the generation of required indices by Simpleaf itself.",
"fa_icon": "fas fa-fish",
"format": "path",
"exists": true
Expand All @@ -177,10 +177,11 @@
"format": "file-path",
"exists": true
},
"simpleaf_rlen": {
"type": "integer",
"default": 91,
"description": "It is the target read length the index will be built for, using simpleaf.",
"simpleaf_umi_resolution": {
"type": "string",
"default": "cr-like",
"enum": ["cr-like", "cr-like-em", "parsimony", "parsimony-em", "parsimony-gene", "parsimony-gene-em"],
"description": "UMI resolution strategy to deduplicate UMIs.",
"fa_icon": "fas fa-map-marked-alt"
}
}
Expand Down
69 changes: 0 additions & 69 deletions subworkflows/local/alevin.nf

This file was deleted.

98 changes: 98 additions & 0 deletions subworkflows/local/simpleaf.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
/* -- IMPORT LOCAL MODULES/SUBWORKFLOWS -- */
include { ALEVINQC } from '../../modules/local/alevinqc'
include { SIMPLEAF_INDEX } from '../../../modules/modules/nf-core/simpleaf/index'
include { SIMPLEAF_QUANT } from '../../../modules/modules/nf-core/simpleaf/quant'

workflow SCRNASEQ_SIMPLEAF {

take:
ch_genome_fasta // channel
ch_genome_gtf // channel
transcript_fasta
simpleaf_index
txp2gene
barcode_whitelist
chemistry
resolution
ch_fastq // channel
map_dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the map dir?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am preparing moving this subworkflow to nf-core/modules. simpleaf quant can take a folder that contains mapping results to skip indexing and mapping and directly jump into quantification. this map_dir says a folder containing mapping results.


main:
ch_versions = Channel.empty()

/*
* Build salmon index
*/
if ( !simpleaf_index || !map_dir ) {
// define input channels for index building
// we can either use the genome fasta and gtf files or the transcript fasta file
if ( transcript_fasta ) {
ch_genome_fasta_gtf = [ [:],[],[] ]
ch_transcript_fasta = Channel.of( [ [id: "${transcript_fasta.getBaseName()}"], transcript_fasta ] )
} else {
ch_genome_fasta_gtf = ch_genome_fasta.combine( ch_genome_gtf ).map{ fasta, gtf -> [[id: "${fasta.getBaseName()}"], fasta, gtf] }
ch_transcript_fasta = Channel.of( [ [:], [] ] )
}

SIMPLEAF_INDEX(
ch_genome_fasta_gtf,
ch_transcript_fasta
)
// Channel of tuple(meta, index dir)
simpleaf_index = SIMPLEAF_INDEX.out.index.collect()
// Channel of t2g path or empty
t2g = SIMPLEAF_INDEX.out.t2g.collect()
ch_versions = ch_versions.mix(SIMPLEAF_INDEX.out.versions)

// ensure txp2gene is a Channel
if (!txp2gene) {
txp2gene = t2g
} else {
txp2gene = Channel.of( txp2gene )
}
} else {
// ensure simpleaf index and txp2gene are Channels
simpleaf_index = Channel.of( [ [:], simpleaf_index ] )
txp2gene = Channel.of( txp2gene )
}

// define input channels for quantification
// we can either use the mapping results or the reads and index files
if ( map_dir ) {
ch_chemistry_reads = Channel.of( [ [:],[],[] ] )
ch_index_t2g = Channel.of( [ [:],[],[] ] )
ch_map_dir = Channel.of( [ [id: map_dir.baseName], map_dir ] )
} else {
ch_chemistry_reads = ch_fastq.map{ meta, files -> tuple(meta + ["chemistry": chemistry], chemistry, files) }
ch_index_t2g = simpleaf_index.combine( txp2gene )
ch_map_dir = [ [:],[] ]
}

/*
* Perform quantification with salmon alevin
*/
SIMPLEAF_QUANT (
ch_chemistry_reads,
ch_index_t2g,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be that a .collect() is needed here? Alternativley, maybe it could be avoided to build FIFO channels above by not using Channel.of at all and just keep everything as values.

Copy link
Contributor

@an-altosian an-altosian Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be that a .collect() is needed here?

Could you point out which line you referred to?

Alternativley, maybe it could be avoided to build FIFO channels above by not using Channel.of at all and just keep everything as values.

I was just making everything as channel for consistency. Keeping them as values sound good.

[[:], "unfiltered-pl", [], barcode_whitelist ],
resolution,
ch_map_dir
)
ch_versions = ch_versions.mix(SIMPLEAF_QUANT.out.versions)

ch_af_map = map_dir ? ch_map_dir : SIMPLEAF_QUANT.out.map
/*
* Run alevinQC
*/
ALEVINQC( SIMPLEAF_QUANT.out.quant, SIMPLEAF_QUANT.out.quant, ch_af_map )
ch_versions = ch_versions.mix(ALEVINQC.out.versions)


emit:
ch_versions
txp2gene
index = simpleaf_index
map = SIMPLEAF_QUANT.out.map
quant = SIMPLEAF_QUANT.out.quant
alevinqc = ALEVINQC.out.report
}
33 changes: 24 additions & 9 deletions workflows/scrnaseq.nf
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ include { methodsDescriptionText } from '../subworkfl
include { getGenomeAttribute } from '../subworkflows/local/utils_nfcore_scrnaseq_pipeline'
include { FASTQC_CHECK } from '../subworkflows/local/fastqc'
include { KALLISTO_BUSTOOLS } from '../subworkflows/local/kallisto_bustools'
include { SCRNASEQ_ALEVIN } from '../subworkflows/local/alevin'
include { SCRNASEQ_SIMPLEAF } from '../subworkflows/local/simpleaf'
include { STARSOLO } from '../subworkflows/local/starsolo'
include { CELLRANGER_ALIGN } from "../subworkflows/local/align_cellranger"
include { CELLRANGER_MULTI_ALIGN } from "../subworkflows/local/align_cellrangermulti"
Expand Down Expand Up @@ -64,7 +64,7 @@ workflow SCRNASEQ {
kb_t2c = params.kb_t2c ? file(params.kb_t2c, checkIfExists: true) : []

//salmon params
ch_salmon_index = params.salmon_index ? file(params.salmon_index, checkIfExists: true) : []
ch_simpleaf_index = params.simpleaf_index ? file(params.simpleaf_index, checkIfExists: true) : []

//star params
star_index = params.star_index ? file(params.star_index, checkIfExists: true) : null
Expand Down Expand Up @@ -135,19 +135,32 @@ workflow SCRNASEQ {

// Run salmon alevin pipeline
if (params.aligner == "alevin") {
SCRNASEQ_ALEVIN(

SCRNASEQ_SIMPLEAF(
ch_genome_fasta,
ch_filter_gtf,
ch_transcript_fasta,
ch_salmon_index,
ch_simpleaf_index,
ch_txp2gene,
ch_barcode_whitelist,
protocol_config['protocol'],
ch_fastq
params.simpleaf_umi_resolution,
ch_fastq,
[] // for existing map dir; not applicable
)
ch_versions = ch_versions.mix(SCRNASEQ_ALEVIN.out.ch_versions)
ch_multiqc_files = ch_multiqc_files.mix(SCRNASEQ_ALEVIN.out.alevin_results.map{ meta, it -> it })
ch_mtx_matrices = ch_mtx_matrices.mix( SCRNASEQ_ALEVIN.out.alevin_results )
ch_versions = ch_versions.mix(SCRNASEQ_SIMPLEAF.out.ch_versions)
ch_multiqc_files = ch_multiqc_files.mix(SCRNASEQ_SIMPLEAF.out.quant.map{ meta, it -> it })
ch_mtx_matrices = ch_mtx_matrices.mix(
SCRNASEQ_SIMPLEAF.out.quant.map{
meta, files -> [
meta +
[input_type: meta["filtered"] ? "filtered" : "raw" ],
files
]
}
)

ch_txp2gene = SCRNASEQ_SIMPLEAF.out.txp2gene
}

// Run STARSolo pipeline
Expand Down Expand Up @@ -284,7 +297,9 @@ workflow SCRNASEQ {
if ( !params.skip_cellbender && !(params.aligner in ['cellrangerarc']) ) {
// module should only run on the raw matrices thus, filter-out the filtered result of the aligners that can produce it
H5AD_REMOVEBACKGROUND_BARCODES_CELLBENDER_ANNDATA (
ch_h5ads.filter { meta, mtx_files -> meta.input_type == 'raw' }
ch_h5ads
.filter { meta, mtx_files -> meta.input_type == 'raw' }
.map { meta, mtx_files -> [ meta + [input_type: 'filtered'], mtx_files ]} // to avoid name collision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you filter to raw and then set the input_type to filtered?

Copy link
Contributor

@an-altosian an-altosian Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the input to cellbender.

  1. cellbender filters cells and generates the "filtered" results.
  2. Both the input h5ad file and the output h5ad file from cellbender are named as "${meta.id}_${meta.input_type}.h5ad". For some reason, cellbender will not overwrite this existing file. Because I don't want to modify cellbender's module, I update input_type here to reflect the fact that the results are filtered

)
ch_h5ads = ch_h5ads.mix(
H5AD_REMOVEBACKGROUND_BARCODES_CELLBENDER_ANNDATA.out.h5ad
Expand Down
Loading