WIP: Attempts to adjust for Singularity #195

jakirkham · 2018-02-12T19:23:25Z

Tries to adjust the workflow for use with Singularity on the cluster. However, it needs a fair bit of cleanup before it can be merged into the workflow. Leaving it here for now so it is stored and not forgotten.

A rough, working attempt at getting the workflow to start with Singularity. Needs a bit of cleanup to streamline the process. Leaving it here for now so it is stored and not forgotten.

Customize Dask Distributed workers to write to node-local scratch space. This avoids writing through the slow NFS filesystem. Since workers don't need to read each other's scratch space (they just request the data over the wire) and the data is not needed after the Dask session ends, this is a nice option to speed things up and avoid data loss due to NFS's slow performance.
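As a rough illustration of the scratch-space change (not the code in this PR), a worker launch command could be assembled as below. The `/scratch` mount point, the directory layout, and both helper names are assumptions made for the sketch. `--local-directory` is the `dask-worker` option that controls where a worker places its files.

```python
import os


def worker_scratch_dir(base="/scratch", user=None):
    """Return a per-user directory path under node-local scratch.

    `base` is a hypothetical mount point; the real location depends
    on how the cluster nodes are set up.
    """
    user = user or os.environ.get("USER", "dask")
    return os.path.join(base, user, "dask-worker-space")


def worker_command(scheduler_address, scratch_dir):
    """Build a `dask-worker` command that keeps worker files in `scratch_dir`."""
    return ["dask-worker", scheduler_address, "--local-directory", scratch_dir]


if __name__ == "__main__":
    cmd = worker_command("tcp://10.0.0.1:8786", worker_scratch_dir())
    print(" ".join(cmd))
```

Because each worker only writes to its own node, nothing here needs the directory to be visible from other nodes.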

Based on some experiments on the cluster, it takes anywhere from 9s to 15s to start up a single worker, so set the startup cost to the average value of 12s. Given that workers take a while to start, bump the interval for rechecking whether to adjust workers to 6s (leaving the number of checks at the default of 3). A job should therefore have started and begun processing data by the 2nd check or shortly after it, which avoids rapid adjustment of workers while still checking often enough to notice when things need a slight tweak. This should slow down rapid downscaling and upscaling a bit, hopefully avoiding fluctuations.
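The timing reasoning can be checked with a little arithmetic. This is only a sketch of the numbers quoted above. The variable names are made up for illustration, and the reading of the "number of checks" as consecutive checks required before an adjustment is an assumption.

```python
import math

# Measured worker startup range on the cluster, in seconds (from the text).
startup_min_s, startup_max_s = 9, 15

# Startup cost taken as the midpoint of the measured range.
startup_cost_s = (startup_min_s + startup_max_s) / 2

# Recheck interval in seconds and number of checks (default of 3).
interval_s = 6
wait_count = 3

# Which check is the first to see a started worker?
typical_check = math.ceil(startup_cost_s / interval_s)
slowest_check = math.ceil(startup_max_s / interval_s)

# Assumed meaning: time spanned by `wait_count` consecutive checks.
decision_window_s = interval_s * wait_count

print(startup_cost_s, typical_check, slowest_check, decision_window_s)
```

With these values a typical worker is up by the 2nd check and the slowest observed one by the 3rd, while the full set of checks spans 18s.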

Was missing the Singularity bits. Also need the number of cores specified differently for dask-drmaa. Add in a few other things like scratch space, a specified queue, and a request for multiple cores.
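A minimal sketch of what those pieces might look like when put together, under stated assumptions: the scheduler flags are LSF-style (`-n` for cores, `-q` for queue), which the PR does not confirm, and the image file name and helper names are hypothetical. Only `singularity exec <image> <command>` is taken as given.

```python
def native_spec(cores, queue):
    """Build a resource request string.

    Assumes LSF-style flags; other schedulers spell these differently.
    """
    return "-n {} -q {}".format(cores, queue)


def singularity_wrap(image, command):
    """Prefix a command so that it runs inside a Singularity image."""
    return ["singularity", "exec", image] + list(command)


if __name__ == "__main__":
    # 32 cores on the "normal" queue, as in the commit titles.
    print(native_spec(32, "normal"))
    # "workflow.simg" is a placeholder image name.
    print(" ".join(singularity_wrap("workflow.simg", ["dask-worker", "--help"])))
```

A string like the one from `native_spec` is the kind of value that would be handed to the job scheduler through dask-drmaa's job template, though the exact wiring depends on the site.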

jakirkham force-pushed the adj_for_singularity branch 4 times, most recently from c66e9f4 to 01510bc, February 17, 2018 00:04

jakirkham force-pushed the adj_for_singularity branch 5 times, most recently from 8d0c6cf to c47df22, April 5, 2018 19:29

jakirkham added 4 commits May 9, 2018 15:05

WIP: Attempts to adjust for Singularity

acf0ee4

WIP: Rename image to match singularity pull

b22954a

jakirkham force-pushed the adj_for_singularity branch from 2242252 to b22954a, May 9, 2018 19:06

jakirkham added 6 commits March 6, 2019 00:13

Merge 'nanshe-org/master' into 'jakirkham/adj_for_singularity'

069a444

Update configuration for cluster workers

2c8957c

Specify normal queue for dictionary learning

4598365

Use 32 cores for dictionary learning

62c3ee6

Update dictionary learning arguments

9c01cfb

Merge 'nanshe-org/master' into 'jakirkham/adj_for_singularity'

b702b54
