Test trial resumability with PBT & Hyperband #20
Comments
Hey, thanks for the great package! I was wondering if you had any update on this issue. Is it currently possible to resume trials, and the feature just has not been properly tested yet?
Hi! PBT is a bit tricky to use with Hydra because it relies on checkpoints being copied from one trial to another, while Hydra creates a new working dir for each trial and makes it the current working directory for the duration of that trial's execution. It should be possible to use PBT (and Hyperband with checkpointing) if you set your working dir explicitly in the hydra config. @Delaunay Is this something you have tested yet?
As Bouthilx pointed out, you need to control the directory so the checkpoint can be found between reruns. Maybe something like this would work:
That way, all the HPO runs will end up in the same folder.
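To illustrate why a stable directory matters for Hyperband/ASHA-style reruns, here is a minimal, stdlib-only sketch of a trial that resumes from a checkpoint when it is rerun in the same folder. The function `run_trial`, the file name `checkpoint.json`, and the `trial_0` folder are hypothetical names chosen for this example; nothing here is Orion's or Hydra's API, and a real script would save model and optimizer state instead of a bare epoch counter.

```python
import json
import tempfile
from pathlib import Path


def run_trial(workdir, total_epochs):
    """Run (or resume) a toy trial whose state lives in `workdir`.

    Hypothetical helper for illustration only; not part of Orion or Hydra.
    Returns (epoch resumed from, epoch reached).
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    ckpt = workdir / "checkpoint.json"

    # Resume from the last saved epoch if a checkpoint is present.
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {"epoch": 0}
    start_epoch = state["epoch"]

    for epoch in range(start_epoch, total_epochs):
        # ... one epoch of training would go here ...
        state["epoch"] = epoch + 1
        ckpt.write_text(json.dumps(state))

    return start_epoch, state["epoch"]


# The same trial is rerun with a larger budget, as Hyperband would do
# when promoting it to the next rung.
with tempfile.TemporaryDirectory() as root:
    trial_dir = Path(root) / "trial_0"
    first = run_trial(trial_dir, total_epochs=2)
    second = run_trial(trial_dir, total_epochs=5)
```

The second call only continues from the saved epoch because `trial_dir` is the same in both calls. If the launcher hands each rerun a fresh working directory, the checkpoint is never found and the trial silently restarts from scratch.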
Actually, this would work for ASHA/Hyperband but not for PBT. When using PBT, the trial working directory which corresponds to |
In the case of hydra, shouldn't |
No, it's determined based on the experiment's working dir: https://github.com/Epistimio/orion/blob/develop/src/orion/core/worker/trial.py#L353
It can easily be added (#35).
@bouthilx
Hey, sorry for the late reply. I tried making it work with this simple example:
However, every trial's
This is expected. What should be happening is that Oríon copies over the dir from the parent trial to the child one, so that if you have a checkpoint there it is available in the child trial directory (happening here https://github.com/Epistimio/orion/blob/develop/src/orion/client/runner.py#L191). Do you see an empty directory instead? @Delaunay Is the hydra plugin using orion's |
Yeah empty with |
No, it does not call the runner, since Hydra has its own launcher that launches the workers.
Alrighty, do y'all think it can be worked around?
We would need to implement the copy for the algo here, right before the experiment is launched.
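As a rough idea of what that copy step could look like, here is a stdlib-only sketch that copies a parent trial's working directory into the child's before the child starts. `fork_trial_dir` and the directory layout are hypothetical; how the plugin would obtain the parent and child directories from Orion is not shown and would depend on its actual API. `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
import tempfile
from pathlib import Path


def fork_trial_dir(parent_dir, child_dir):
    """Copy the parent's working directory into the child's.

    Hypothetical helper for illustration only. Returns True if a copy
    happened, False if the parent directory does not exist.
    """
    parent_dir, child_dir = Path(parent_dir), Path(child_dir)
    if not parent_dir.is_dir():
        # No parent to inherit from: just make sure the child dir exists.
        child_dir.mkdir(parents=True, exist_ok=True)
        return False
    # dirs_exist_ok lets the copy succeed even if the launcher has
    # already created the child directory.
    shutil.copytree(parent_dir, child_dir, dirs_exist_ok=True)
    return True


with tempfile.TemporaryDirectory() as root:
    parent = Path(root) / "parent"
    child = Path(root) / "child"
    parent.mkdir()
    (parent / "checkpoint.json").write_text('{"epoch": 3}')

    copied = fork_trial_dir(parent, child)
    child_ckpt = (child / "checkpoint.json").read_text()
```

With something along these lines running before each forked trial is launched, a PBT child would find its parent's checkpoint in its own directory instead of an empty one.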
Got it. Based on your responses, I'm guessing that's not on the timeline. I'll make a PR to add a warning to the README that PBT-like algorithms aren't functional for now.
No description provided.