update Simpleaf modules, subworkflow #424

Open
wants to merge 43 commits into
base: dev
Changes from 31 commits
Commits
fec1962
update simpleaf subworkflow
an-altosian Jan 20, 2025
c6b87c4
adopt new simpleaf modules
an-altosian Jan 21, 2025
f3e2977
tested changes
an-altosian Jan 22, 2025
f5863df
Merge pull request #1 from an-altosian/dev
DongzeHE Jan 22, 2025
90cc34e
Merge branch 'dev' into dev
grst Jan 22, 2025
bec0bfa
adopt new t2g format in simpleaf index out
an-altosian Jan 23, 2025
bc5df5e
Merge branch 'DongzeHE:dev' into dev
an-altosian Jan 23, 2025
0e4c242
fix typos
an-altosian Jan 23, 2025
f82884b
Merge branch 'dev' of https://github.com/an-altosian/scrnaseq into dev
an-altosian Jan 23, 2025
2513ef0
fix typos
an-altosian Jan 23, 2025
6f3834b
update doc
an-altosian Jan 27, 2025
afcc3a9
avoid using channel.of
an-altosian Jan 31, 2025
afd4d5b
Merge pull request #2 from an-altosian/dev
DongzeHE Jan 31, 2025
2bd54ea
back compatibility
an-altosian Jan 31, 2025
7301865
Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev
an-altosian Jan 31, 2025
5ad1727
rewrite mtx_to_h5ad_simpleaf to be aware of USA mode
an-altosian Jan 31, 2025
1b6dc9b
rewrite mtx_to_h5ad_simpleaf to be aware of USA mode
an-altosian Jan 31, 2025
9bc8e03
update doc
an-altosian Feb 5, 2025
cc27d13
fix bug
an-altosian Feb 6, 2025
7ac2aa4
lint
an-altosian Feb 6, 2025
179adf8
Merge branch 'nf-core:dev' into dev
an-altosian Feb 6, 2025
025c771
prettier
an-altosian Feb 8, 2025
3db96cb
Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev
an-altosian Feb 8, 2025
8b9c5a6
switch module dir
an-altosian Feb 8, 2025
65ff578
manually update nf-core-scrnaseq_logo_light.png
an-altosian Feb 8, 2025
bc7ea31
Merge branch 'dev' into dev
grst Feb 10, 2025
ccb5c87
Update conf/modules.config
an-altosian Feb 10, 2025
960db59
use gene id as var_name in h5ad
an-altosian Feb 11, 2025
66949a6
Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev
an-altosian Feb 11, 2025
a8fb237
remove local simpleaf modules
an-altosian Feb 11, 2025
d212f43
Merge branch 'dev' into dev
an-altosian Feb 11, 2025
ef240e2
addess commenets
an-altosian Feb 11, 2025
bf8ee23
Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev
an-altosian Feb 11, 2025
9ddeb87
fix bug for existing index dir
an-altosian Feb 11, 2025
cbdb32b
minor typos
an-altosian Feb 11, 2025
0ba727e
make sure txp2gene is channel
an-altosian Feb 11, 2025
745e9f6
make sure txp2gene is channel
an-altosian Feb 12, 2025
525e866
comprehensive testign
an-altosian Feb 12, 2025
d0444fb
comprehensive testign
an-altosian Feb 12, 2025
d97edc1
comprehensive testign
an-altosian Feb 12, 2025
1a34680
test if single thread passes testing
an-altosian Feb 13, 2025
56cc677
test nextflow version 24.10.2
an-altosian Feb 14, 2025
b0739c3
roll back to 24.04.02 for ci test
an-altosian Feb 14, 2025
2 changes: 1 addition & 1 deletion .github/workflows/awsfulltest.yml
@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
aligner: ["alevin", "kallisto", "star", "cellranger"]
aligner: ["simpleaf", "kallisto", "star", "cellranger"]
steps:
- name: Get PR reviews
uses: octokit/[email protected]
2 changes: 1 addition & 1 deletion .github/workflows/awstest.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
aligner: ["alevin", "kallisto", "star", "cellranger"]
aligner: ["simpleaf", "kallisto", "star", "cellranger"]
steps:
# Launch workflow using Seqera Platform CLI tool action
- name: Launch workflow via Seqera Platform
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -38,7 +38,7 @@ jobs:
NXF_VER:
- "24.04.2"
- "latest-everything"
profile: ["alevin", "cellranger", "cellrangermulti", "kallisto", "star"]
profile: ["simpleaf", "cellranger", "cellrangermulti", "kallisto", "star"]

steps:
- name: Disk space cleanup
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Removed `--kb_filter` parameter. Kallisto filtering is triggered by default and can be turned off with `ext.args` ([#421](https://github.com/nf-core/scrnaseq/issues/421))
- Document better that `cellbender` is used for empty drops calling and not the `emptydrops` method (([#420](https://github.com/nf-core/scrnaseq/issues/420)))
- Add `--limitBAMsortRAM` to STARsolo alignment, to make sure BAM sorting memory scales with the task memory ([#430](https://github.com/nf-core/scrnaseq/pull/430))
- Replace local modules for simpleaf, SIMPLEAF_INDEX and SIMPLEAF_QUANT, with their central modules from nf-core/modules, and update simpleaf subworkflows accordingly.
- Update documents related to `simpleaf`, `alevin`, `salmon`, and `alevin-fry` for consistency.
- Rename the default aligner from `alevin` to `simpleaf` for consistency.
- Update the `mtx_to_h5ad` template for `simpleaf` to start from the h5ad file generated by simpleaf.

## v3.0.0 - 2024-12-09

4 changes: 2 additions & 2 deletions README.md
@@ -25,7 +25,7 @@

This is a community effort in building a pipeline capable to support:

- Alevin-Fry + AlevinQC
- SimpleAF(Alevin-Fry) + AlevinQC
- STARSolo
- Kallisto + BUStools
- Cellranger
@@ -65,7 +65,7 @@ nextflow run nf-core/scrnaseq \
--fasta GRCm38.p6.genome.chr19.fa \
--gtf gencode.vM19.annotation.chr19.gtf \
--protocol 10XV2 \
--aligner <alevin/kallisto/star/cellranger> \
--aligner <simpleaf/kallisto/star/cellranger> \
--outdir <OUTDIR>
```

2 changes: 1 addition & 1 deletion assets/protocols.json
@@ -1,5 +1,5 @@
{
"alevin": {
"simpleaf": {
"10XV1": {
"protocol": "10xv1",
"whitelist": "assets/whitelist/10x_V1_barcode_whitelist.txt.gz"
2 changes: 1 addition & 1 deletion bin/alevin_qc.r
@@ -15,6 +15,6 @@ sampleId <- args[2]
outDir <- args[3]

alevinQCReport(baseDir = baseDir, sampleId = sampleId,
outputFile = "alevinReport.html",
outputFile = "simpleafQCReport.html",
outputFormat = "html_document",
outputDir = outDir, forceOverwrite = TRUE)
16 changes: 9 additions & 7 deletions conf/modules.config
@@ -127,31 +127,33 @@ if(params.aligner == "cellrangerarc") {
}
}

if (params.aligner == "alevin") {
if (params.aligner == "simpleaf" || params.aligner == "alevin") {
process {
withName: GFFREAD_TXP2GENE {
ext.args = "--table transcript_id,gene_id"
ext.prefix = { "${gff.baseName}_gffread" }
}
withName: 'SIMPLEAF_INDEX' {
publishDir = [
path: { "${params.outdir}/${params.aligner}" },
mode: params.publish_dir_mode,
enabled: params.save_reference
]
ext.args = { "--rlen ${params.simpleaf_rlen}" }
ext.prefix = { "simpleaf_index" }

}
withName: 'SIMPLEAF_QUANT' {
publishDir = [
path: { "${params.outdir}/${params.aligner}/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.args = "-r cr-like"
ext.prefix = { "simpleaf_quant" }

}
// Fix for issue 196
// Modified for issue 334
withName: 'ALEVINQC' {
publishDir = [
path: { "${params.outdir}/${params.aligner}/${meta.id}" },
mode: params.publish_dir_mode,
]
time = { 120.h }
}
}
41 changes: 20 additions & 21 deletions docs/output.md
@@ -15,7 +15,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [FastQC](#fastqc)
- [Kallisto \& Bustools Results](#kallisto--bustools-results)
- [STARsolo](#starsolo)
- [Salmon \& Alevin-fry \& AlevinQC](#salmon--alevin-fry--alevinqc)
- [Simpleaf \& AlevinQC](#simpleaf--alevinqc)
- [Cellranger](#cellranger)
- [Cellranger ARC](#cellranger-arc)
- [Cellranger multi](#cellranger-multi)
@@ -80,23 +80,22 @@ For details on how to load these into R and perform further downstream analysis,
- `star_index`
- Contains the index of the supplied genome fasta file

## Salmon & Alevin-fry & AlevinQC
## Simpleaf & AlevinQC

This pipeline uses the simplified and flexible modules in [Simpleaf](https://simpleaf.readthedocs.io/en/latest/) for processing single-cell data with [Salmon](https://salmon.readthedocs.io/en/latest/) as the underlying mapper and [Alevin-fry](https://alevin-fry.readthedocs.io/en/latest/) as the quantification tool. For detailed examples of using the quantification results generated by Alevin-fry in downstream analyses, such as RNA-velocity, please refer to [Alevin-fry/simpleaf tutorials](https://combine-lab.github.io/alevin-fry-tutorials/#blog).
This pipeline uses the simplified and flexible modules in [Simpleaf](https://simpleaf.readthedocs.io/en/latest/) for processing single-cell data with [Salmon](https://salmon.readthedocs.io/en/latest/) or [Piscem](https://github.com/COMBINE-lab/piscem) as the underlying mapper and [Alevin-fry](https://alevin-fry.readthedocs.io/en/latest/) as the quantification tool. For detailed examples of using the quantification results generated by Alevin-fry in downstream analyses, such as RNA-velocity, please refer to [Alevin-fry/simpleaf tutorials](https://combine-lab.github.io/alevin-fry-tutorials/#blog).

**Output directory: `results/alevin`**
**Output directory: `results/simpleaf`**

- `alevin`
- Contains the count matrix created by Alevin-fry
- `alevinqc`
- Contains the QC report for the aforementioned Alevin-fry output data

**Output directory: `results/reference_genome`**

- `salmon_index`
- Contains the indexed reference transcriptome for the Salmon mapper
- `alevin/txp2gene.tsv`
- The transcriptome to gene mapping TSV file utilized by Alevin-fry
- `${meta.id}/simpleaf_quant/af_quant/alevin`
- Contains the quantification results -- the count matrix -- generated by simpleaf for each sample.
- `${meta.id}/simpleaf_quant/af_quant`
- Contains the logs and other intermediate results generated during the quantification stage.
- `${meta.id}/simpleaf_quant/af_map`
- Contains the logs and other intermediate results generated during the mapping stage.
- `${meta.id}/simpleaf_qc_report_${meta.id}.html`
- The QC report generated by AlevinQC for each sample.
- `simpleaf_index`
- Contains the indexed reference transcriptome generated by Simpleaf. This folder is created only if the pipeline built the index and the `save_reference` parameter is set to `true`. Because the simpleaf index built from a given reference is fixed, this folder can be passed to the pipeline via the `simpleaf_index` parameter, or saved in `igenome.config` under `genomes[ params.genome ][ "simpleaf" ]`, to avoid re-indexing the reference.
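
A minimal sketch of reusing a saved index, assuming a custom Nextflow config passed with `-c`. The file name, the paths, and the `GRCh38` key are hypothetical placeholders; `aligner`, `simpleaf_index`, and the `simpleaf` genome key are the names used in this PR.

```groovy
// custom.config (hypothetical file name), passed via `nextflow run ... -c custom.config`
params {
    aligner        = 'simpleaf'
    // Option 1: point directly at a previously saved index folder (placeholder path)
    simpleaf_index = '/path/to/results/simpleaf/simpleaf_index'

    // Option 2 (alternative): register the index under a genome key,
    // which main.nf reads through getGenomeAttribute('simpleaf')
    // genomes {
    //     'GRCh38' {
    //         simpleaf = '/path/to/simpleaf_index'
    //     }
    // }
}
```

Either route should let the pipeline skip the indexing step, provided the index was built from the same reference.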

## Cellranger

@@ -159,13 +158,13 @@ So, to try to avoid confusion by the user, we added "suffixes" to the generated

So, the conversion modules generate data with the following syntax: **`*_{raw,filtered,cellbender_filter}_matrix.{h5ad,rds}`**. With the following meanings:

| suffix | meaning |
| :---------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
| raw | Conversion of the raw/unprocessed matrix generated by the tool. It is also used for tools that generate only one matrix, such as alevin. |
| filtered | Conversion of the filtered/processed matrix generated by the tool |
| cellbender_filter | Conversion of the matrix that was generated by the cellbender remove background filtering module |
| suffix | meaning |
| :---------------- | :----------------------------------------------------------------------------------------------------------------------------------------- |
| raw | Conversion of the raw/unprocessed matrix generated by the tool. It is also used for tools that generate only one matrix, such as simpleaf. |
| filtered | Conversion of the filtered/processed matrix generated by the tool |
| cellbender_filter | Conversion of the matrix that was generated by the cellbender remove background filtering module |

> Some aligners, like `alevin` do not produce both raw&filtered matrices. When aligners give only one output, they are treated with the `raw` suffix.
> Some aligners, like `simpleaf` do not produce both raw&filtered matrices. When aligners give only one output, they are treated with the `raw` suffix.
> Some aligners may have an option to give both raw&filtered and only one, like `kallisto`. Be aware when using the tools.

## MultiQC
12 changes: 8 additions & 4 deletions docs/usage.md
@@ -47,7 +47,7 @@ Note that since cellranger v7, it is **not recommended** anymore to supply the `

## Aligning options

By default (i.e. `--aligner alevin`), the pipeline uses [Salmon](https://salmon.readthedocs.io/en/latest/) to perform pseudo-alignment of reads to the reference genome and [Alevin-fry](https://alevin-fry.readthedocs.io/en/latest/) to perform the downstream BAM-level quantification. Then QC reports are generated with AlevinQC.
By default (i.e. `--aligner simpleaf`), the pipeline uses [piscem](https://github.com/COMBINE-lab/piscem) to perform pseudo-alignment of reads to the reference genome and [Alevin-fry](https://alevin-fry.readthedocs.io/en/latest/) to perform the downstream BAM-level quantification. Then QC reports are generated with [AlevinQC](https://github.com/csoneson/alevinQC).

Other aligner options for running the pipeline are:

@@ -97,11 +97,11 @@ The command `kb --list` shows all supported, preconfigured protocols. Additional

For more details, please refer to the [Kallisto/bustools documentation](https://pachterlab.github.io/kallisto/manual#bus).

#### Alevin-fry
#### Simpleaf

Simpleaf has the ability to pass custom chemistries to Alevin-fry, in a slighly different format, e.g. `1{b[16]u[12]x:}2{r:}`.
Simpleaf has the ability to pass custom chemistries to Alevin-fry, in a slightly different format, e.g. `1{b[16]u[12]x:}2{r:}`.

For more details, see Simpleaf's paper, [He _et al._ 2023](https://doi.org/10.1093/bioinformatics/btad614).
For more details, see Simpleaf's paper, [He _et al._ 2023](https://doi.org/10.1093/bioinformatics/btad614) and the [detailed description](https://hackmd.io/@PI7Og0l1ReeBZu_pjQGUQQ/rJMgmvr13).

### If using cellranger-arc

@@ -194,6 +194,10 @@ genome: 'GRCh37'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

### If using Simpleaf

When building the reference index, the underlying mapper of Simpleaf operates on a large number of temporary files on disk simultaneously. Therefore, in environments where CPUs and disk I/O are limited, such as on AWS, specifying `scratch=true` for the `SIMPLEAF_INDEX` module in the `conf/modules.config` file is necessary to avoid slowdowns and potential failures.
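
A sketch of the setting described above, assuming it is added to `conf/modules.config` or to a custom config passed with `-c`. Only the `scratch` directive and the `SIMPLEAF_INDEX` process name come from the text; where the block is placed is an assumption.

```groovy
process {
    withName: 'SIMPLEAF_INDEX' {
        // Run the task in a node-local scratch directory rather than the shared work dir,
        // so the mapper's many temporary files stay on local disk
        scratch = true
    }
}
```

With `scratch = true`, Nextflow executes the task in a temporary directory on the compute node and copies only the declared outputs back to the work directory.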

### If using cellranger-multi

#### Automatic file name detection
4 changes: 2 additions & 2 deletions main.nf
@@ -30,8 +30,8 @@ include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_scrn
// Thus, manually provided files are not overwritten by the genome attributes
params.fasta = getGenomeAttribute('fasta')
params.gtf = getGenomeAttribute('gtf')
params.salmon_index = getGenomeAttribute('simpleaf')
params.txp2gene = getGenomeAttribute('simpleaf_tx2pgene')
params.simpleaf_index = getGenomeAttribute('simpleaf') ?: getGenomeAttribute('salmon')
params.txp2gene = getGenomeAttribute('simpleaf_txp2gene')
params.cellranger_index = params.aligner == 'cellrangerarc' ?
getGenomeAttribute('cellrangerarc') :
getGenomeAttribute('cellranger')
10 changes: 10 additions & 0 deletions modules.json
@@ -85,6 +85,16 @@
"git_sha": "f0719ae309075ae4a291533883847c3f7c441dad",
"installed_by": ["modules"]
},
"simpleaf/index": {
"branch": "master",
"git_sha": "e743c6fbe569f66d48b8e11a06e8b514a87d12c9",
"installed_by": ["modules"]
},
"simpleaf/quant": {
"branch": "master",
"git_sha": "e743c6fbe569f66d48b8e11a06e8b514a87d12c9",
"installed_by": ["modules"]
},
"star/genomegenerate": {
"branch": "master",
"git_sha": "46eca555142d6e597729fcb682adcc791796f514",
24 changes: 13 additions & 11 deletions modules/local/alevinqc.nf
@@ -1,23 +1,25 @@
process ALEVINQC {

//
// This module executes alevinfry QC reporting tool on alevin results
// This module executes alevinfry QC reporting tool on alevin-fry results
//

tag "$meta.id"
label 'process_low'

//The alevinqc 1.14.0 container is broken, missing some libraries - thus reverting this to previous 1.12.1 version
conda "bioconda::bioconductor-alevinqc=1.12.1"
conda "bioconda::bioconductor-alevinqc=1.18.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' :
'biocontainers/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' }"
'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.18.0--r43hf17093f_0' :
'biocontainers/bioconductor-alevinqc:1.18.0--r43hf17093f_0' }"

// all metas are the same
input:
tuple val(meta), path(alevin_results)
tuple val(meta), path(quant_dir, stageAs: "quant_dir")
tuple val(meta1), path(permit_dir, stageAs: "permit_dir")
tuple val(meta2), path(map_dir)

output:
tuple val(meta), path("alevin_report_${meta.id}.html"), emit: report
tuple val(meta), path("simpleaf_qc_report_${meta.id}.html"), emit: report
path "versions.yml", emit: versions

when:
@@ -29,11 +31,11 @@ process ALEVINQC {
#!/usr/bin/env Rscript
require(alevinQC)
alevinFryQCReport(
mapDir = "${alevin_results}/af_map",
quantDir = "${alevin_results}/af_quant",
permitDir= "${alevin_results}/af_quant",
mapDir = "${map_dir}",
permitDir= "${permit_dir}",
quantDir = "${quant_dir}",
sampleId = "${prefix}",
outputFile = "alevin_report_${meta.id}.html",
outputFile = "simpleaf_qc_report_${meta.id}.html",
outputFormat = "html_document",
outputDir = "./",
forceOverwrite = TRUE
55 changes: 0 additions & 55 deletions modules/local/simpleaf_index.nf

This file was deleted.
