
How to "provision" temporary/scratch work directories? #498

Open
yarikoptic opened this issue Jan 8, 2020 · 0 comments

This is "inspired" by the problem originally reported in #438 (comment) with a proposed fix (closed without merge) in #451 to just let tar ignore disappearing files.

Although we still do not know the underlying trigger (a lingering cleanup process or similar), this specific behavior was a reminder that in many cases we would like to provide a path to some location (on the remote resource) which pipelines could use as scratch space.
In https://github.com/ReproNim/reproman/pull/438/files#diff-5b4aa18b79cf44a38ba925fff658fd8cR129 I just added that work/ directory to .gitignore. That should theoretically be sufficient (I will try next) if I use the datalad-pair orchestrator, which does datalad save remotely and uses datalad update to fetch the results.
In the case of datalad-pair-run, the content is first tar'ed on the remote side (hence that original "inspirational" issue of files disappearing in the work/ directory), including the not-really-needed work dir, which might be huge, so we should allow that to be avoided.
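
A minimal sketch of what "avoiding" the work dir could look like on the remote side, assuming the tarball is produced with plain tar; the directory name `work` and the archive name are just examples from this issue, not the actual orchestrator code:

```python
import subprocess

# Hypothetical sketch: build the results tarball on the remote side while
# skipping the scratch directory, so it is never read or transferred back.
# "work" and "results.tar.gz" are illustrative names, not ReproMan's own.
scratch_subdir = "work"
subprocess.run(
    ["tar", "-czf", "results.tar.gz", f"--exclude=./{scratch_subdir}", "."],
    check=True,
)
```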

The easiest way is to specify a work directory outside of the dataset that gets datalad saved/transferred. But ideally:

  • it cannot be a fixed name
  • it should be allowed to be not job specific (e.g. if I am to rerun some failed computation, the scratch could be reused across jobs),
  • it should be allowed to be job specific (to avoid any side effects).

so I guess we should

  • allow defining variables per resource (e.g. I could assign scratchdir = /mnt/btrfs/scrap/tmp to smaug)
  • expose those, plus jobid and datalad_dataset_id (if datalad-pair*), as variables that can be used to format the command to be executed. So I would specify `-w {scratchdir}/{jobid}` for the case avoiding side effects, and something like `-w {scratchdir}/myanal` if I want it to be shared (see the sketch below).
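
A minimal sketch of the proposed substitution, assuming the per-resource variables and per-job variables are simply merged into one mapping used to format the command template; all names and values here are illustrative, not an existing ReproMan API:

```python
# Hypothetical values only: "scratchdir" would come from the resource
# configuration (e.g. smaug), while "jobid" and "datalad_dataset_id"
# would be filled in by the orchestrator for each submitted job.
resource_vars = {"scratchdir": "/mnt/btrfs/scrap/tmp"}
job_vars = {
    "jobid": "20200108T123456-abcdef",
    "datalad_dataset_id": "<dataset-uuid>",  # only for datalad-pair*
}

# Job-specific scratch (no side effects between jobs):
cmd = "mypipeline -w {scratchdir}/{jobid} inputs/ outputs/".format(
    **resource_vars, **job_vars
)
# -> mypipeline -w /mnt/btrfs/scrap/tmp/20200108T123456-abcdef inputs/ outputs/

# Shared scratch, reused across jobs (e.g. for reruns):
cmd_shared = "mypipeline -w {scratchdir}/myanal inputs/ outputs/".format(
    **resource_vars, **job_vars
)
# -> mypipeline -w /mnt/btrfs/scrap/tmp/myanal inputs/ outputs/
```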

This also relates to #467 ("cleanup") regarding what to do with such directories upon success/failure.
