Create batch_queue_flux as experimental module #3978
Ok, this is a good template for further work. I think you know Flux enough by now to implement
}
}
// Flux does not support staging files out of the worker environment, so warn for each file
Batch queue Condor also doesn't stage out files. I wonder why Flux doesn't work with Makeflow then.
Yes, this requires further investigation. I think it has to do with the location of the working directory the worker node is placed into.
`batch_queue_condor` actually does stage out files. In HTCondor, all files created in the sandbox are sent back to the submitter, so it isn't necessary to name them individually.
Per our previous discussion, it is ok to assume that the flux executor makes use of a shared filesystem, and it is not necessary to transfer files back and forth. (That is also the general assumption of the "cluster" module.)
I talked with Jim Garlick who's in the Flux dev team, and he told me that they don't have a use case for stage-out yet.
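For reference, the "warn for each file" behavior discussed in this thread could look roughly like the sketch below. This is only an illustration under stated assumptions: `warn_unstaged_outputs` and its parameters are hypothetical names, not functions from cctools or Flux, and the actual module may structure this differently.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of cctools or Flux): emit one warning per
 * declared output file, since the backend is assumed to rely on a shared
 * filesystem instead of staging files back to the submitter.
 * Returns the number of warnings written.
 */
static int warn_unstaged_outputs(FILE *log, const char *const *outputs, size_t count)
{
	int warned = 0;
	for (size_t i = 0; i < count; i++) {
		/* Skip empty slots so a sparse list does not crash the loop. */
		if (!outputs[i])
			continue;
		fprintf(log,
			"warning: output file %s will not be staged out; a shared filesystem is assumed\n",
			outputs[i]);
		warned++;
	}
	return warned;
}
```

Calling it with the task's output list and `stderr` at submit time would make the shared-filesystem assumption visible to the user without failing the submission.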
This is looking pretty good overall, can you please add a suitable paragraph to the makeflow manual describing the module?
@tphung3 any other concerns?
No, LGTM.
Proposed Changes
This PR creates a batch_queue module for the Flux Framework. `batch_queue_flux` is currently marked as experimental. The submit, wait, and remove operations are implemented, but there is no support for staging output files back to the submitter, because Flux appears to assume a shared filesystem for this purpose. As a result, `vine_factory` works using this backend, but more complicated workflows in Makeflow will not work, as they assume output files will be staged back.
I also fixed some out-of-date documentation in `batch_queue.h`.
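To make the three operations concrete, here is a minimal sketch of how a backend exposing submit, wait, and remove could be wired up as a table of function pointers. Everything in it is an assumption for illustration: `struct queue_module`, `toy_state`, and the `toy_*` functions are hypothetical and are not the actual `batch_queue` interface or the Flux API. The toy backend only tracks job slots in memory and runs nothing.

```c
#include <stddef.h>

#define TOY_MAX_JOBS 16

typedef long job_id_t;

/* Hypothetical backend interface; names are illustrative, not the cctools API. */
struct queue_module {
	const char *name;
	job_id_t (*submit)(void *state, const char *command);
	job_id_t (*wait)(void *state);
	int (*remove)(void *state, job_id_t id);
};

/* Toy in-memory state standing in for a real scheduler connection. */
struct toy_state {
	int active[TOY_MAX_JOBS];
	int next;
};

/* Record a new job and return its id, or -1 if the command is missing or the table is full. */
static job_id_t toy_submit(void *state, const char *command)
{
	struct toy_state *s = state;
	if (!command || s->next >= TOY_MAX_JOBS)
		return -1;
	s->active[s->next] = 1;
	return s->next++;
}

/* "Complete" the lowest-numbered active job and return its id, or -1 if none remain. */
static job_id_t toy_wait(void *state)
{
	struct toy_state *s = state;
	for (int i = 0; i < s->next; i++) {
		if (s->active[i]) {
			s->active[i] = 0;
			return i;
		}
	}
	return -1;
}

/* Cancel an active job; returns 1 if it was removed, 0 if it was unknown or already done. */
static int toy_remove(void *state, job_id_t id)
{
	struct toy_state *s = state;
	if (id < 0 || id >= s->next || !s->active[id])
		return 0;
	s->active[id] = 0;
	return 1;
}

static const struct queue_module toy_module = { "toy", toy_submit, toy_wait, toy_remove };
```

Note that nothing in this table deals with output files, which mirrors the limitation described above: a backend can support submit, wait, and remove while leaving file movement entirely to a shared filesystem.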
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
- `make test` Run local tests prior to pushing.
- `make format` Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
- `make lint` Run lint on source code prior to pushing.