-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create usable Nextflow regression test Action #16
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This reverts commit 03da4e6.
This is awesome work Nick! LGrTM! |
yashpatel6
approved these changes
Mar 5, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! If no other concerns, I'm ok with merging
This was referenced Mar 5, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This follows on from #12 to create a functional Nextflow regression test Action (technically it's a reusable workflow). All that must be done to use it is for pipelines to add this workflow...
... which will then automatically discover and run any test files. See uclahs-cds/pipeline-recalibrate-BAM#52 for a working example.
Interface
Test File Format
There are a few changes to the test format I laid out in #12.
First, test files are now required to have filenames starting with
configtest
. That makes it so that they are discoverable and do not need to be specifically registered.Second, I removed the
empty_files
andmapped_files
fields. In discussing this with various lab members I couldn't justify any reason for them to exist - configurations that depend upon the contents of untracked files are bad configurations. The loss ofempty_files
does mean that most tests will need to add the mockcheck_paths: ""
.Third, I added the field
nextflow_version
. The updated test files written out at runtime have that value updated based on the container, which should help protect against version mismatches. Currently it's still hardcoded in a few places to23.10.0
but I want to support multiple versions simultaneously.Fourth, I'm sanitizing the
expected_results
fields to avoid spurious diffs. All dates indated_fields
are replaced with19970704T165655Z
(Pathfinder's landing), and allObject.toString()
-like values (e.g.[Ljava.lang.String;@49c7b90e
) have the hash replaced withdec0ded
(although they really shouldn't be showing up anyway).Results Reporting
Each test gets a status check on the PR:
The specific changes are reported as annotations within the code view:
Each test, successful or not, attaches the output file to the check run. That makes it easy to download and overwrite test files when changes occur.
Internals
ActionReusable Workflow StructureThe workflow has two jobs:
discover-tests
... discovers all the tests.run-test
has a matrix strategy such that it runs each test independently and in parallel. For some unfathomable reason I could not get this to work with a standarduses: docker://
line, as GitHub insists on pulling the image at the start of the job before I can provide any credentials.Docker Image
Rather than a python script that calls Docker, the entire test process is now bundled up in a pre-compiled Docker image (https://github.com/uclahs-cds/tool-Nextflow-action/pkgs/container/nextflow-config-tests). The Dockerfile is tracked in this repository, but the build action is a little unusual - it only builds on commits to
main
that change files in therun-nextflow-tests/
folder or the build workflow itself. It also adds two custom tags for the Nextflow version and the Nextflow version and the current date (e.g.23.10.0
and23.10.0-2024-03-01
. It does not build on git tags.The image itself is mostly the same - the key changes are that I made the Nextflow hijacking process a little more robust and installed python.
I deleted all of the bundled test files, but once this gets merged I'm intended to open a variety of PRs to establish them in their corresponding pipelines.
Checklist
This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
Disclosing PHI is a major problem1 - Even a small leak can be costly2.
This PR does NOT contain germline genetic data3, RNA-Seq, DNA methylation, microbiome or other molecular data4.
.png
, .jpeg
),.pdf
,.RData
,.xlsx
,.doc
,.ppt
, or other output files.To automatically exclude such files using a .gitignore file, see here for example.
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have set up or verified the
main
branch protection rule following the github standards before opening this pull request.The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have added the major changes included in this pull request to the
CHANGELOG.md
under the next release version or unreleased, and updated the date.Footnotes
UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records ↩
The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records. ↩
Genetic information is considered PHI.
Forensic assays can identify patients with as few as 21 SNPs ↩
RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity. ↩