Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate YML contrasts #436

Open
wants to merge 54 commits into
base: dev
Choose a base branch
from

Conversation

atrigila
Copy link

@atrigila atrigila commented Feb 14, 2025

The description of this new feature is here: #371
I am only merging it to dev as stated here: #411 (comment).
I had to make a few adjustments of which parameters takes as input (--contrasts_yml instead of --contrasts), but other than those minor changes, this whole new feature was developed by @alanmmobbs93 here: #404.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

alanmmobbs93 and others added 30 commits December 13, 2024 19:45
@atrigila
Copy link
Author

Thank you for your feedaback, @suzannejin @pinin4fjords. I'll talk to Alan who developed this module for some of these more specific questions about his design. I will also see how I can adapt this to the template.

@atrigila
Copy link
Author

Hi @suzannejin and @pinin4fjords , I have migrated the module to the template structure and answered the questions. Please feel free to take a look again if you can :)

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few comments. General points:

  • We need to make sure output files and md5sums are not changed unless there's a good reason.
  • The R code is a bit scrappy, we should try and make it nicer and more idiomatic. I've attached an AI-assisted version here which I think is closer to the mark and cleaner, it might need some final debugging.

foo.r.zip

@@ -0,0 +1,5 @@
process {
withName: 'VALIDATE_YML_MODEL' {
ext.args = params.module_args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there even a module_args parameter? Probably wouldn't make sense if there was.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the test params so that they more accurately reflect the used params in the pipeline.

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comment, but in making it I realised I'd already thought about it when making the VALIDATEFOMCOMPONENTS script.

Sorry to be slow on thinking of this (other things going on), but is there a reason you didn't simply extend that other validation process? Seems confusing for the user/ maintainer to have two validation steps.

Work was already done to enable yaml contrasts there: pinin4fjords/shinyngs#68

This script uses validation functions built into the parsing of the various objects themselves, and that will probably be a neater way to go rather than duplicating validation logic here.

Sorry to be a pain, I can help with this when I get some time and we can bake the work you've done here there. We probably just need to extend the existing logic that validates contrasts here: https://github.com/pinin4fjords/shinyngs/blob/931d9c39c1f2200c66cc628e1ea8d68e970262ef/R/accessory.R#L908

@@ -193,6 +195,15 @@ workflow DIFFERENTIALABUNDANCE {
}
.flatten()
.unique() // Uniquify to keep each contrast variable only once (in case it exists in multiple lines for blocking etc.)

VALIDATE_YML_MODEL (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one last thought.

The issue with doing it this way is that ch_input etc will proceed through other processes even if there are problems that eventually get flagged by this process. What we really want is for this to stop things before they get that far.

You can use a dummy join to make that happen- I do that in this subworkflow for example. Or you can just have VALIDATE_YML_MODEL spit the matrix and contrasts back out again.

@nschcolnicov
Copy link

Another comment, but in making it I realised I'd already thought about it when making the VALIDATEFOMCOMPONENTS script.

Sorry to be slow on thinking of this (other things going on), but is there a reason you didn't simply extend that other validation process? Seems confusing for the user/ maintainer to have two validation steps.

Work was already done to enable yaml contrasts there: pinin4fjords/shinyngs#68

This script uses validation functions built into the parsing of the various objects themselves, and that will probably be a neater way to go rather than duplicating validation logic here.

Sorry to be a pain, I can help with this when I get some time and we can bake the work you've done here there. We probably just need to extend the existing logic that validates contrasts here: https://github.com/pinin4fjords/shinyngs/blob/931d9c39c1f2200c66cc628e1ea8d68e970262ef/R/accessory.R#L908

@pinin4fjords I can answer that. We decided to create a new script instead of updating the shinyngs validatefromcomponents module because maintaining a separate tool just to run an R script adds unnecessary complexity and significantly slows down development. Since this process can be handled effectively with a standalone R script, it makes more sense to simplify our workflow rather than having to update and maintain an entire tool every time we need to make changes. Using an R script and module streamlines development, and if we ever need more robustness, we have the option to integrate it into nf-core.

@pinin4fjords
Copy link
Member

Another comment, but in making it I realised I'd already thought about it when making the VALIDATEFOMCOMPONENTS script.
Sorry to be slow on thinking of this (other things going on), but is there a reason you didn't simply extend that other validation process? Seems confusing for the user/ maintainer to have two validation steps.
Work was already done to enable yaml contrasts there: pinin4fjords/shinyngs#68
This script uses validation functions built into the parsing of the various objects themselves, and that will probably be a neater way to go rather than duplicating validation logic here.
Sorry to be a pain, I can help with this when I get some time and we can bake the work you've done here there. We probably just need to extend the existing logic that validates contrasts here: https://github.com/pinin4fjords/shinyngs/blob/931d9c39c1f2200c66cc628e1ea8d68e970262ef/R/accessory.R#L908

@pinin4fjords I can answer that. We decided to create a new script instead of updating the shinyngs validatefromcomponents module because maintaining a separate tool just to run an R script adds unnecessary complexity and significantly slows down development. Since this process can be handled effectively with a standalone R script, it makes more sense to simplify our workflow rather than having to update and maintain an entire tool every time we need to make changes. Using an R script and module streamlines development, and if we ever need more robustness, we have the option to integrate it into nf-core.

OK, but I disagree with that call.

I'm not asking for the creation of a new script within shinyngs. The fact is, validation functionality is already present, the nf-core module already exists. All this is is some additional logic for that existing validation function. It doesn't make sense to create a new module partially replicating that logic and adding some new.

This code needs to go into the existing Shinyngs function, as stated above. That function already has access to the sample sheet and contrasts, so I don't see that being all that difficult. I will assist on PRs and releases to make that happen.

@grst
Copy link
Member

grst commented Feb 25, 2025

I'm wondering if it could be a provisional solution to have it as a separate script in the differentialabundance pipeline? We are planning several more interations on this in #429, #377 and #386 and going through shinyngs releases for each step is really really cumbersome. I'm all for putting things in their proper place in the end, but this process is really slowing down development for questionable benefit.

@pinin4fjords
Copy link
Member

I'm wondering if it could be a provisional solution to have it as a separate script in the differentialabundance pipeline? We are planning several more interations on this in #429, #377 and #386 and going through shinyngs releases for each step is really really cumbersome. I'm all for putting things in their proper place in the end, but this process is really slowing down development for questionable benefit.

The benefits aren't questionable for me, they're tangible and I did this for a reason. This was a design decision I made when finalising the first versions of this pipeline, and we transitioned from a development to a production mindset. It allows us to share e.g. parsing logic between processes, and keeps scientific logic out of the workflow, which is then just doing orchestration. We also try and avoid local components, as you will have noted, and this facilitates that.

I'm sorry, but I'm not prepared to reverse those benefits just to shorten the development loop, this sort of overhead is not atypical when doing further development on tools in a production state.

But as I say I will help with the shinyngs PRs and releases, as I've done previously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants