Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.1.0 Updates #42

Open
wants to merge 70 commits into
base: master
Choose a base branch
from
Open

1.1.0 Updates #42

wants to merge 70 commits into from

Conversation

atrull314
Copy link
Collaborator

@atrull314 atrull314 commented Dec 24, 2024

PR Checklist

This PR contains the following updates for v1.1.0 of the pipeline:

  • Adding oarfish as an additional quantifier
  • Parallelization of isoquant by splitting on chromosome
  • Upgrading isoquant
  • Allowing read counts to utilize nanocount stats
  • First pass at "modularizing" the pipeline by moving code to subworkflows
  • Closes Minor fix needed in metro diagram #36

atrull314 and others added 30 commits October 7, 2024 10:40
Updating dev versions to 1.1.0dev
…run on just the genome, jsut the transcriptome, or both
Copy link

github-actions bot commented Dec 24, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 7feb7b8

+| ✅ 190 tests passed       |+
#| ❔   5 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-scnanoseq_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-scnanoseq_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • template_strings - template_strings

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2025-01-13 23:18:16

@lianov
Copy link
Member

lianov commented Jan 3, 2025

@atrull314 : one thing I did just now catch is that we should disable RSeQC for outputs of the transcriptome aligned files. The outputs are only meaningful with genome aligned files.

@atrull314 atrull314 changed the title 1.2.0 Updates 1.1.0 Updates Jan 7, 2025
Copy link

@itrujnara itrujnara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid release, but I found some minor technical issues. I've also added some ideas to consider, feel free to argue with these.

ext.args = {
[
"--threads 30",
params.barcode_format == "10X_3v3" ? "--kit-version 3v3" : params.barcode_format == "10X_5v2" ? "--kit-version 5v2" : ""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this information not required anymore?

Copy link
Collaborator Author

@atrull314 atrull314 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, we had modified the blaze module to base threads off task.cpus as is the case with other modules, which resulted in this piece of the documentation being invalidated. This specific change was to resolve an error in documentation that we had put in before the change to the blaze module.

task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align equal signs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with commit 57258d7

"${task.process}":
python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a stub, might be handy later

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with commit 57258d7

"${task.process}":
oarfish: \$(oarfish --version | sed 's#oarfish ##g')
END_VERSIONS
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments as in MTX_MERGE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with commit 57258d7

def args = task.ext.args ?: ''

"""
awk '/^>/{chrom=(split(substr(\$0,2), a, " ")); filename=( a[1] ".split.fa"); print > filename; next}{print >> filename}' $fasta

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be a new module? Looks like the same result could be achieved with the gawk nf-core module

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is the gawk nf-core module outputs as a single file, this awk statement is splitting a fasta into all of the chromosomes that compose it, so we need all the individual chromosome fastas as outputs. I'm not sure the module currently supports this behavior


ch_gene_qc_stats = Channel.empty()
ch_transcript_qc_stats = Channel.empty()
if (quantifier.equals("oarfish")) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "both" option handled correctly here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Its handled in a separate file to make sure they are able to run in parallel.

ch_transcript_qc_stats = QUANTIFY_SCRNA_ISOQUANT.out.transcript_qc_stats
}

emit:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harshil alignment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with commit 6c6e7ec

//
// MODULE: Samtools Merge
//
SAMTOOLS_MERGE (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versions from this one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with commit 67d0ab3

//
// MODULE: Samtools Index
//
SAMTOOLS_INDEX_MERGED( ch_dedup_single_bam )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with commit 67d0ab3

validate_params,
"nextflow_schema.json"
null

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this null correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may have been a merge issue, but should be resolved with commit 9951f69

@atrull314
Copy link
Collaborator Author

Solid release, but I found some minor technical issues. I've also added some ideas to consider, feel free to argue with these.

@itrujnara I think I have resolved all your points, please resolve them as you see fit or we can discuss alternative solutions. Thanks!

@lianov lianov self-requested a review January 13, 2025 22:31
Copy link

@itrujnara itrujnara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see all the comments have been addressed, so good to go from my side. Good job!

params.skip_bam_nanocomp,
params.skip_seurat,
false,
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling splitting the BAM in the transcriptomic data is currently creating a bottleneck at this step in regards to run time (BAM deduplication). This was originally solved in the genome mapping by splitting the BAM by chromosome, but due to the nature of the transcriptomic mappings, this cannot be enabled at this step as is.

@atrull314 and I are currently evaluating different approaches to implementing this optimization (or an alternative method), and once we have updated this portion of the workflow, the PR can move forward.

@atrull314
Copy link
Collaborator Author

I see all the comments have been addressed, so good to go from my side. Good job!

Hi @itrujnara,

As @lianov mentioned we actually ended up finding an issue while we were performing validation of results that resulted in increased runtimes for our deduplication steps. I've kept the PR with the new changes so its easier to review, its available here #43. If you wouldn't mind taking a look when you have a chance.

Thanks,
Austyn

@lianov
Copy link
Member

lianov commented Feb 21, 2025

@itrujnara : yes just adding a note here, that I the PR #43 is now approved but that we have not merged into dev yet as reviewing any other changes in that PR would likely be easier. It is all good from my end, and once it makes it to the dev branch and I can approve this one.

@atrull314 atrull314 requested a review from itrujnara February 28, 2025 16:19
@atrull314
Copy link
Collaborator Author

Hi @itrujnara,

Just wanted to reach out again and ask if you wouldn't mind taking a look at PR #43 for the review.

Thanks,
Austyn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Minor fix needed in metro diagram
4 participants