WIP: Attempts to adjust for Singularity #195

Open
jakirkham wants to merge 10 commits into master from adj_for_singularity

jakirkham
Member

Attempts to adjust the workflow for use with Singularity on the cluster. However, it needs a fair bit of cleanup before it can be merged. Leaving it here for now so it is stored and not forgotten.

jakirkham force-pushed the adj_for_singularity branch 4 times, most recently from c66e9f4 to 01510bc (February 17, 2018 00:04)
jakirkham force-pushed the adj_for_singularity branch 5 times, most recently from 8d0c6cf to c47df22 (April 5, 2018 19:29)

A rough, working attempt at getting the workflow to start with
Singularity. Needs a bit of cleanup to streamline the process. Leaving
it here for now so it is stored and not forgotten.

Customize Dask Distributed workers to write to node-local scratch space.
This avoids having to write through the slow NFS mount. Since workers
don't need to read each other's scratch space (they will just request
the data over the wire) and the data is unneeded after the Dask session
ends, this is a nice option to speed things up and avoid data loss due
to NFS's slow performance.

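As a rough illustration of the scratch-space idea, the sketch below picks a writable node-local directory and builds a `dask-worker` command line that points the worker's files at it through the `--local-directory` option. The candidate paths (`/scratch`, `/tmp`), the `dask-worker-space` subdirectory name, the scheduler address, and both helper functions are assumptions made for illustration; the PR's actual mechanism is not shown on this page and may differ.

```python
import os
import tempfile


def pick_scratch_dir(candidates=("/scratch", "/tmp")):
    """Return the first existing, writable candidate directory.

    The candidate paths are hypothetical; substitute whatever node-local
    storage the cluster provides. Falls back to the system temp directory.
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return tempfile.gettempdir()


def worker_command(scheduler_address, scratch_root):
    """Build a dask-worker command that keeps worker files off NFS."""
    local_dir = os.path.join(scratch_root, "dask-worker-space")
    return ["dask-worker", scheduler_address, "--local-directory", local_dir]


if __name__ == "__main__":
    # The scheduler address is a placeholder, not taken from the PR.
    cmd = worker_command("tcp://scheduler-host:8786", pick_scratch_dir())
    print(" ".join(cmd))
```

Because each worker only ever reads its own spill files, nothing here needs to be visible from other nodes, which is what makes node-local storage a safe choice.
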
Based on some experiments on the cluster, it appears that it takes
anywhere from 9s to 15s to start up a single worker. So set the startup
cost to the average value of 12s. Given it takes a while to start up
workers, bump the interval for rechecking whether to adjust workers to
6s (leaving the number of checks the same as the default of 3). Thus a
job should start up and begin processing data at or shortly after the
2nd check, which avoids rapid adjustment of workers while still checking
frequently enough to notice if things need a slight tweak. This should
slow down rapid downscaling and upscaling a bit, hopefully avoiding
fluctuations.

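The timing argument above can be checked with a little arithmetic. The sketch below uses only the numbers from the commit message (12s startup cost, 6s interval, 3 checks) and does not call Dask. It assumes that checks fire at multiples of the interval, and that the "number of checks" is how many consecutive checks must agree before a worker is removed; both are stated here as assumptions, and the helper names are hypothetical.

```python
import math

STARTUP_COST_S = 12.0  # average of the observed 9s-15s worker startup time
INTERVAL_S = 6.0       # seconds between adaptive checks
WAIT_COUNT = 3         # consecutive checks required before scaling down


def first_check_at_or_after(elapsed_s, interval_s):
    """1-based index of the first periodic check at or after elapsed_s."""
    return max(1, math.ceil(elapsed_s / interval_s))


def scale_down_delay(interval_s, wait_count):
    """Minimum time a worker must look idle before it is removed."""
    return interval_s * wait_count


if __name__ == "__main__":
    for startup in (9.0, STARTUP_COST_S, 15.0):
        check = first_check_at_or_after(startup, INTERVAL_S)
        print(f"worker ready after {startup}s -> seen at check {check}")
    print(f"scale-down delay: {scale_down_delay(INTERVAL_S, WAIT_COUNT)}s")
```

Under these assumptions, a worker that takes the average 12s is ready exactly at the 2nd check, a slow 15s worker is picked up at the 3rd, and a worker has to look idle for 18s before being removed, which is longer than any observed startup time.
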
1 participant