update Simpleaf modules, subworkflow #424
base: dev
Warning: Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.1.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
A few minor things.
core.1739377
Outdated
this should probably not be there
removed!
//

tag "$meta.id"
label 'process_low'

//The alevinqc 1.14.0 container is broken, missing some libraries - thus reverting this to previous 1.12.1 version
conda "bioconda::bioconductor-alevinqc=1.12.1"
Ideally, this would also be a module on nf-core/modules. But if you don't have time right now, we can also address this at a later point.
This is almost "modules"-ready, but I don't have cycles to work on this in the following weeks. Let's do this at a later point.
modules/local/mtx_to_h5ad.nf
Outdated
@@ -28,6 +28,8 @@ process MTX_TO_H5AD {
script:
def aligner = (input_aligner in [ 'cellranger', 'cellrangerarc', 'cellrangermulti' ]) ? 'cellranger' : input_aligner

aligner = input_aligner == "alevin" ? "simpleaf" : aligner
Is this for backwards compatibility?
Can't we fix this at an earlier point in the pipeline, i.e. at the very beginning set aligner to simpleaf if it's alevin and then use simpleaf throughout the pipeline?
I'm afraid one needs to fix it at multiple locations otherwise.
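The idea of mapping the legacy name once at the entry point, rather than in each module, could look roughly like the sketch below. It is written in Python only for brevity; in the pipeline this logic would live in the Nextflow workflow code. The function name `normalize_aligner` and the alias table are hypothetical; only the alevin to simpleaf mapping comes from this thread.

```python
# Hypothetical alias table: only the alevin -> simpleaf mapping comes from this PR.
LEGACY_ALIGNER_ALIASES = {"alevin": "simpleaf"}


def normalize_aligner(aligner: str) -> str:
    """Return the canonical aligner name, translating legacy names once up front."""
    return LEGACY_ALIGNER_ALIASES.get(aligner, aligner)


if __name__ == "__main__":
    # Downstream code would only ever see the canonical name.
    for name in ["alevin", "simpleaf", "cellranger"]:
        print(name, "->", normalize_aligner(name))
```

With the name normalized once, no individual process needs its own `alevin` special case.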
I renamed the file instead. The Python script is now called mtx_to_h5ad_simpleaf.py. This is not for backward compatibility, but to match the script name.
However, thanks for pointing this out. I will make some updates to ensure backward compatibility.
A related issue: I am struggling with the documentation. There are a few points of confusion:
- alevin is the single-cell quantification module in salmon, not an "aligner".
- Simpleaf is the wrapper program around piscem + salmon + alevin-fry, and is the actual tool called in this pipeline.
- In the latest simpleaf, the default indexer and mapper (aligner) is piscem, rather than salmon. Salmon is used only if the --no-piscem argument is specified.
- Simpleaf uses alevin-fry, the successor of alevin and a stand-alone program written in Rust, instead of the alevin module from salmon.
So I am thinking of replacing "alevin" and "salmon" with "simpleaf" everywhere in the workflow, including file and folder names. This will avoid the confusion, but it will change the file structure and the default "aligner" option. What do you think?
barcode_whitelist = null
salmon_index = null
simpleaf_umi_resolution = "cr-like"
What does this parameter do? Is this a reasonable default, or should it be set based on the protocol used? We have a protocols.json file somewhere that already sets other parameters based on the protocol.
simpleaf requires the UMI resolution argument to be set in order to resolve multi-mapped UMIs. It is independent of the protocol and depends on how users want to treat multimapping.
"cr-like", which is the current default in scrnaseq, discards all UMIs that can be assigned to multiple genes equally well. I suggest exposing this parameter, but if you think setting a default is better, I can switch to that.
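To make the "cr-like" behaviour concrete, here is a toy Python sketch of the rule as described above. It is an illustration only, not alevin-fry's implementation: the function name, the input layout, and the assumption that a UMI goes to its best-supported gene when there is a single best gene are all simplifications introduced here.

```python
from collections import Counter


def resolve_umis_cr_like(umi_gene_reads):
    """Toy 'cr-like' resolution for a single cell.

    umi_gene_reads maps each UMI to {gene: supporting_read_count}
    (each inner dict is assumed to be non-empty).
    A UMI is counted once for its best-supported gene; if two or more
    genes tie for the best support, the UMI is discarded.
    """
    gene_counts = Counter()
    for umi, gene_reads in umi_gene_reads.items():
        best = max(gene_reads.values())
        top_genes = [gene for gene, n in gene_reads.items() if n == best]
        if len(top_genes) == 1:
            gene_counts[top_genes[0]] += 1
        # Otherwise the UMI is ambiguous and is dropped.
    return dict(gene_counts)


if __name__ == "__main__":
    cell = {
        "AAAC": {"GeneA": 3},               # unique gene: counted
        "AAAG": {"GeneA": 2, "GeneB": 1},   # GeneA has more support: counted
        "AAAT": {"GeneA": 1, "GeneB": 1},   # tie: discarded
    }
    print(resolve_umis_cr_like(cell))
```

Other resolution modes would treat the tied UMI differently, which is why the choice depends on how multimapping should be handled rather than on the protocol.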
chemistry
resolution
ch_fastq // channel
map_dir
what's the map dir?
I am preparing to move this subworkflow to nf-core/modules. simpleaf quant can take a folder that contains mapping results, which lets it skip indexing and mapping and jump directly to quantification. map_dir is that folder of mapping results.
ch_h5ads.filter { meta, mtx_files -> meta.input_type == 'raw' }
ch_h5ads
    .filter { meta, mtx_files -> meta.input_type == 'raw' }
    .map { meta, mtx_files -> [ meta + [input_type: 'filtered'], mtx_files ]} // to avoid name collision
Why do you filter to raw and then set the input_type to filtered?
This is the input to cellbender.
- cellbender filters cells and generates the "filtered" results.
- Both the input h5ad file and the output h5ad file from cellbender are named "${meta.id}_${meta.input_type}.h5ad". For some reason, cellbender will not overwrite this existing file. Because I don't want to modify cellbender's module, I update input_type here to reflect the fact that the results are filtered.
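The name collision can be illustrated with a short Python sketch of the naming pattern quoted above. The helper names and the sample id are hypothetical; only the `<id>_<input_type>.h5ad` pattern and the raw to filtered relabelling come from this thread.

```python
def h5ad_name(meta: dict) -> str:
    """Mirror the naming pattern from the thread: <id>_<input_type>.h5ad."""
    return f"{meta['id']}_{meta['input_type']}.h5ad"


def relabel_for_cellbender(meta: dict) -> dict:
    """Return a copy of meta with input_type set to 'filtered'; the original is left untouched."""
    return {**meta, "input_type": "filtered"}


if __name__ == "__main__":
    raw_meta = {"id": "sample1", "input_type": "raw"}  # hypothetical sample
    out_meta = relabel_for_cellbender(raw_meta)

    # Without relabelling, input and output would both be sample1_raw.h5ad.
    print("input :", h5ad_name(raw_meta))
    print("output:", h5ad_name(out_meta))
```

Because the output name is derived from the meta map, changing `input_type` before the call is enough to keep the two files distinct without touching the cellbender module.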
*/
SIMPLEAF_QUANT (
    ch_chemistry_reads,
    ch_index_t2g,
Could it be that a .collect() is needed here? Alternatively, maybe building FIFO channels above could be avoided by not using Channel.of at all and just keeping everything as values.
> Could it be that a .collect() is needed here?
Could you point out which line you are referring to?
> Alternatively, maybe it could be avoided to build FIFO channels above by not using Channel.of at all and just keep everything as values.
I was just making everything a channel for consistency. Keeping them as values sounds good.
Reopen #361 after updating the simpleaf central modules. See this PR. I have tested using a 10x 500 dataset. Once the modules' PR is merged, we can start merging this PR.
PR checklist
- nf-core pipelines lint
- nextflow run . -profile test,docker --outdir <OUTDIR>
- nextflow run . -profile debug,test,docker --outdir <OUTDIR>
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).