update Simpleaf modules, subworkflow #424

Draft: wants to merge 17 commits into base: dev

DongzeHE
Member

Reopens #361 after updating the central simpleaf modules. See this PR. I have tested it using a 10x 500 dataset. Once the modules' PR is merged, we can start merging this PR.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@DongzeHE DongzeHE requested a review from fmalmeida January 22, 2025 17:41
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.1.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@DongzeHE DongzeHE requested a review from grst January 22, 2025 17:43
Member

@grst grst left a comment

A few minor things.

core.1739377 Outdated
Member

this should probably not be there

Contributor

removed!

//

tag "$meta.id"
label 'process_low'

//The alevinqc 1.14.0 container is broken, missing some libraries - thus reverting this to previous 1.12.1 version
conda "bioconda::bioconductor-alevinqc=1.12.1"
Member

Ideally, this would also be a module on nf-core/modules. But if you don't have time right now, we can also address this at a later point.

Contributor

This is almost "modules"-ready, but I don't have the cycles to work on it in the coming weeks. Let's do this at a later point.

@@ -28,6 +28,8 @@ process MTX_TO_H5AD {
script:
def aligner = (input_aligner in [ 'cellranger', 'cellrangerarc', 'cellrangermulti' ]) ? 'cellranger' : input_aligner

aligner = input_aligner == "alevin" ? "simpleaf" : aligner
Member

Is this for backwards compatibility?
Can't we fix this at an earlier point in the pipeline, i.e. at the very beginning set aligner to simpleaf if it's alevin and then use simpleaf throughout the pipeline?

I'm afraid one needs to fix it at multiple locations otherwise.
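
A minimal sketch of the "normalize once, at the very beginning" idea, written in Python rather than Nextflow purely for illustration. The names `ALIGNER_ALIASES` and `normalize_aligner` are hypothetical and do not exist in the pipeline:

```python
# Hypothetical alias table: legacy aligner names mapped to the canonical name.
# Assumption for illustration only; the pipeline defines no such table.
ALIGNER_ALIASES = {"alevin": "simpleaf"}


def normalize_aligner(name: str) -> str:
    """Return the canonical aligner name for a user-supplied value."""
    return ALIGNER_ALIASES.get(name, name)


# Resolve the user's choice a single time, then pass `aligner` downstream
# so that no later step needs its own "alevin" special case.
aligner = normalize_aligner("alevin")
print(aligner)  # simpleaf
```

With a single normalization point, downstream steps such as MTX_TO_H5AD would only ever see the canonical name.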

Contributor

@an-altosian an-altosian Jan 31, 2025

I renamed the file instead. The Python script is now called mtx_to_h5ad_simpleaf.py. This is not for backward compatibility, but to match the script name.

However, thanks for pointing this out. I will make some updates to ensure backward compatibility.

One related thing: I am struggling with the documentation. There are a few points of confusion:

  1. alevin is the single-cell quantification module in salmon, not an "aligner".
  2. Simpleaf is the wrapper program around piscem + salmon + alevin-fry, and is the actual tool called in this pipeline.
  3. In the latest simpleaf, the default indexer and mapper (aligner) is piscem rather than salmon. Salmon is used only if the --no-piscem argument is specified.
  4. Simpleaf uses alevin-fry, the successor of alevin and a stand-alone program written in Rust, instead of the alevin module from salmon.

So I am thinking of replacing "alevin" and "salmon" with "simpleaf" everywhere in the workflow, including file and folder names. This would avoid the confusion, but would change the file structure and the default "aligner" option. What do you think?

barcode_whitelist = null
salmon_index = null
simpleaf_umi_resolution = "cr-like"
Member

What does this parameter do? Is this a reasonable default, or should it be set based on the protocol used? We have a protocols.json file somewhere that already sets other parameters based on the protocol.

Contributor

simpleaf requires the UMI resolution argument to be set in order to resolve multi-mapped UMIs. It is independent of the protocol and depends on how users want to treat multimapping.

"cr-like", which is the current default in scrnaseq, discards all UMIs that can be assigned to multiple genes equally well. I suggest exposing this, but if you think setting a default is better, I can switch to that.
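
To make the "discard on a tie" rule concrete, here is a toy Python illustration of the behavior as described in this comment. It is not alevin-fry's implementation; `resolve_umi` and the example UMIs and gene names are made up for the sketch:

```python
from collections import Counter


def resolve_umi(gene_hits):
    """Toy rule: return the single best-supported gene for one UMI, or None on a tie.

    gene_hits is a list of gene IDs, one entry per read carrying this UMI.
    """
    counts = Counter(gene_hits)
    if not counts:
        return None
    best = max(counts.values())
    winners = [gene for gene, n in counts.items() if n == best]
    # A unique winner is kept; equally good candidates mean the UMI is dropped.
    return winners[0] if len(winners) == 1 else None


# Hypothetical UMIs within one cell.
umis = {
    "AACG": ["geneA", "geneA", "geneB"],  # geneA has the most support: kept
    "TTGC": ["geneA", "geneB"],           # tie between two genes: discarded
}
resolved = {umi: resolve_umi(hits) for umi, hits in umis.items()}
print(resolved)  # {'AACG': 'geneA', 'TTGC': None}
```

Other resolution strategies treat the tied case differently, which is why the choice depends on the user's preference rather than on the protocol.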

chemistry
resolution
ch_fastq // channel
map_dir
Member

what's the map dir?

Contributor

I am preparing to move this subworkflow to nf-core/modules. simpleaf quant can take a folder containing mapping results, which lets it skip indexing and mapping and jump directly to quantification. map_dir is that folder of mapping results.

ch_h5ads.filter { meta, mtx_files -> meta.input_type == 'raw' }
ch_h5ads
.filter { meta, mtx_files -> meta.input_type == 'raw' }
.map { meta, mtx_files -> [ meta + [input_type: 'filtered'], mtx_files ]} // to avoid name collision
Member

Why do you filter to raw and then set the input_type to filtered?

Contributor

@an-altosian an-altosian Jan 31, 2025

This is the input to cellbender.

  1. cellbender filters cells and generates the "filtered" results.
  2. Both the input h5ad file and the output h5ad file from cellbender are named "${meta.id}_${meta.input_type}.h5ad". For some reason, cellbender will not overwrite this existing file. Because I don't want to modify cellbender's module, I update input_type here to reflect the fact that the results are filtered.
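
The collision can be sketched in a few lines of Python. The naming pattern is the one quoted above; `h5ad_name` and the sample ID are hypothetical and only serve the illustration:

```python
def h5ad_name(meta):
    """Apply the naming pattern quoted above: "${meta.id}_${meta.input_type}.h5ad"."""
    return f"{meta['id']}_{meta['input_type']}.h5ad"


meta_in = {"id": "sample1", "input_type": "raw"}  # hypothetical sample

# Without relabeling, the input and the output would share this one name.
print(h5ad_name(meta_in))  # sample1_raw.h5ad

# Python mirror of the Groovy `meta + [input_type: 'filtered']`:
# copy the map and override a single key, leaving the original untouched.
meta_out = {**meta_in, "input_type": "filtered"}
print(h5ad_name(meta_out))  # sample1_filtered.h5ad
```

Because the relabeled meta map is what the downstream module sees, its output name no longer matches the staged input file.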

*/
SIMPLEAF_QUANT (
ch_chemistry_reads,
ch_index_t2g,
Member

Could it be that a .collect() is needed here? Alternatively, maybe it could be avoided to build FIFO channels above by not using Channel.of at all and just keep everything as values.

Contributor

@an-altosian an-altosian Jan 31, 2025

> Could it be that a .collect() is needed here?

Could you point out which line you are referring to?

> Alternatively, maybe it could be avoided to build FIFO channels above by not using Channel.of at all and just keep everything as values.

I was just making everything a channel for consistency. Keeping them as values sounds good.

4 participants