Validate YML contrasts #436
base: dev
POC contrasts csv -> yaml
Update tabulartogseachip
…obbs93/differentialabundance into newfeature_validate_model
Thank you for your feedback, @suzannejin @pinin4fjords. I'll talk to Alan, who developed this module, about some of these more specific questions on his design. I will also see how I can adapt this to the template.
Hi @suzannejin and @pinin4fjords, I have migrated the module to the template structure and answered the questions. Please feel free to take another look if you can :)
Made a few comments. General points:
- We need to make sure output files and md5sums are not changed unless there's a good reason.
- The R code is a bit scrappy, we should try and make it nicer and more idiomatic. I've attached an AI-assisted version here which I think is closer to the mark and cleaner, it might need some final debugging.
@@ -0,0 +1,5 @@
process {
    withName: 'VALIDATE_YML_MODEL' {
        ext.args = params.module_args
Is there even a `module_args` parameter? Probably wouldn't make sense if there was.
Changed the test params so that they more accurately reflect the used params in the pipeline.
Co-authored-by: Jonathan Manning <[email protected]>
Another comment, but in making it I realised I'd already thought about it when making the VALIDATEFOMCOMPONENTS script.
Sorry to be slow in thinking of this (other things going on), but is there a reason you didn't simply extend that other validation process? It seems confusing for the user/maintainer to have two validation steps.
Work was already done to enable yaml contrasts there: pinin4fjords/shinyngs#68
This script uses validation functions built into the parsing of the various objects themselves, and that will probably be a neater way to go rather than duplicating validation logic here.
Sorry to be a pain, I can help with this when I get some time and we can bake the work you've done here there. We probably just need to extend the existing logic that validates contrasts here: https://github.com/pinin4fjords/shinyngs/blob/931d9c39c1f2200c66cc628e1ea8d68e970262ef/R/accessory.R#L908
@@ -193,6 +195,15 @@ workflow DIFFERENTIALABUNDANCE {
        }
        .flatten()
        .unique() // Uniquify to keep each contrast variable only once (in case it exists in multiple lines for blocking etc.)

    VALIDATE_YML_MODEL (
Sorry, one last thought.
The issue with doing it this way is that `ch_input` etc. will proceed through other processes even if there are problems that eventually get flagged by this process. What we really want is for this to stop things before they get that far.
You can use a dummy join to make that happen; I do that in this subworkflow, for example. Or you can just have VALIDATE_YML_MODEL spit the matrix and contrasts back out again.
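A minimal sketch of the dummy-join idea, written as a workflow fragment rather than a complete script. It assumes that `ch_input` emits `[ meta, samplesheet ]` tuples and that VALIDATE_YML_MODEL emits one tuple keyed by the same `meta`. The channel `ch_contrasts_yml`, the emit name `validated`, and the process inputs shown are hypothetical and are not taken from this PR.

```nextflow
// Sketch only: the inputs to VALIDATE_YML_MODEL and the `validated` emit name
// are assumptions made for illustration.
VALIDATE_YML_MODEL ( ch_input, ch_contrasts_yml )

// Dummy join: an item is emitted only once the validator has produced output
// for the same meta, so nothing moves downstream before validation succeeds.
ch_input_validated = ch_input
    .join ( VALIDATE_YML_MODEL.out.validated )   // [ meta, samplesheet, validated ]
    .map { meta, samplesheet, validated -> [ meta, samplesheet ] }

// Downstream processes would then take ch_input_validated instead of ch_input.
```

The alternative mentioned above, having the validator re-emit the matrix and contrasts, gets the same gating without the join, at the cost of passing those files through the process.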
@pinin4fjords I can answer that. We decided to create a new script instead of updating the shinyngs validatefromcomponents module because maintaining a separate tool just to run an R script adds unnecessary complexity and significantly slows down development. Since this process can be handled effectively with a standalone R script, it makes more sense to simplify our workflow rather than having to update and maintain an entire tool every time we need to make changes. Using an R script and module streamlines development, and if we ever need more robustness, we have the option to integrate it into nf-core.
OK, but I disagree with that call. I'm not asking for the creation of a new script within shinyngs. The fact is, validation functionality is already present, and the nf-core module already exists. All this amounts to is some additional logic for that existing validation function. It doesn't make sense to create a new module that partially replicates that logic and adds some new. This code needs to go into the existing shinyngs function, as stated above. That function already has access to the sample sheet and contrasts, so I don't see that being all that difficult. I will assist on PRs and releases to make that happen.
I'm wondering if it could be a provisional solution to have it as a separate script in the differentialabundance pipeline? We are planning several more iterations on this in #429, #377 and #386, and going through shinyngs releases for each step is really, really cumbersome. I'm all for putting things in their proper place in the end, but this process is really slowing down development for questionable benefit.
The benefits aren't questionable for me; they're tangible, and I did this for a reason. This was a design decision I made when finalising the first versions of this pipeline, as we transitioned from a development to a production mindset. It allows us to share e.g. parsing logic between processes, and keeps scientific logic out of the workflow, which is then just doing orchestration. We also try to avoid local components, as you will have noted, and this facilitates that. I'm sorry, but I'm not prepared to reverse those benefits just to shorten the development loop. This sort of overhead is not atypical when doing further development on tools in a production state. But as I say, I will help with the shinyngs PRs and releases, as I've done previously.
The description of this new feature is here: #371
I am only merging it to dev as stated here: #411 (comment).
I had to make a few adjustments to which parameters it takes as input (`--contrasts_yml` instead of `--contrasts`), but other than those minor changes, this whole new feature was developed by @alanmmobbs93 here: #404.
PR checklist
- `nf-core lint`
- `nf-test test main.nf.test -profile test,docker`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).