Cannot run example with bohb #37

Open
FrancoisPgm opened this issue Feb 6, 2024 · 4 comments

FrancoisPgm commented Feb 6, 2024

Hi,
I'm trying to run the example with the algorithm type replaced by bohb, and I get a KeyError:

Traceback (most recent call last):
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 354, in clientctx
    yield client
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 511, in sweep
    raise e
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 508, in sweep
    self.optimize(self.client)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 526, in optimize
    trials = self.sample_trials()
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 556, in sample_trials
    trials = self.suggest_trials(self.n_workers())
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 435, in suggest_trials
    trial = self.client.suggest(pool_size=count)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/client/experiment.py", line 581, in suggest
    trial = reserve_trial(self._experiment, self._producer, pool_size)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/client/experiment.py", line 67, in reserve_trial
    produced = producer.produce(pool_size)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/core/worker/producer.py", line 76, in produce
    new_trials = algorithm.suggest(pool_size)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/core/worker/algo_wrappers/insist_suggest.py", line 42, in suggest
    trials = super().suggest(num)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/core/worker/algo_wrappers/algo_wrapper.py", line 29, in suggest
    trials = self.algorithm.suggest(num)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/core/worker/algo_wrappers/transform_wrapper.py", line 126, in suggest
    self.algorithm.observe([transformed_trial])
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/orion/algo/bohb/bohb.py", line 324, in observe
    self.iterations[job.id[0]].register_result(job)
  File "/Users/fpaugam/scratch/test_orion/env/lib/python3.10/site-packages/hpbandster/core/base_iteration.py", line 133, in register_result
    if (not job.result is None) and np.isfinite(result['loss']):
KeyError: 'loss'

I tried returning a dict in my_app: return dict(name="loss", type="objective", value=out), but it doesn't do the trick. How should the loss be returned?

Thanks
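For readers following the traceback: its last frame evaluates result['loss'] on a result that is not None, so the KeyError means that result has no 'loss' key. A minimal sketch of that failure mode follows. The function register_result here is a simplified stand-in written for illustration, not the hpbandster code, and the key name "objective" is a made-up example of a result that lacks 'loss'.

```python
import math


def register_result(result):
    """Simplified stand-in for the check on the failing traceback line.

    Only the dictionary lookup is mirrored; this is not hpbandster's code.
    """
    if result is not None and math.isfinite(result["loss"]):
        return "registered"
    return "ignored"


# A result that carries a finite "loss" passes the check.
ok = register_result({"loss": 0.25})

# A non-None result without a "loss" key reproduces the KeyError.
try:
    register_result({"objective": 0.25})  # hypothetical key name
    raised = False
    missing_key = None
except KeyError as err:
    raised = True
    missing_key = err.args[0]

print(ok, raised, missing_key)
```

Since the lookup happens inside the optimizer, it may be the result assembled between the sweeper and hpbandster that lacks the key, rather than the value returned by my_app itself. That reading is an assumption and is not confirmed in this thread.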

@FrancoisPgm
Author

After some poking around with the config file, it seems that the problem is setting n_workers to a value other than 1. With n_workers set to 1, returning just a float works. Does BOHB only work with one worker?
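As a point of reference, here is a minimal sketch of an objective that returns a bare float, the shape reported to work above with n_workers set to 1. It is deliberately free of Hydra so it runs on its own: in the real app the function would be decorated with @hydra.main and receive a DictConfig. The parameter name lr and the loss formula are placeholders, not taken from the example being discussed.

```python
from types import SimpleNamespace


def my_app(cfg) -> float:
    # Stand-in for the Hydra task function. A SimpleNamespace replaces the
    # DictConfig that Hydra would pass in.
    # Placeholder loss: made up for illustration only.
    loss = (cfg.lr - 0.1) ** 2 + 1.0 / cfg.epoch
    # Return a plain Python float rather than a dict or an array.
    return float(loss)


cfg = SimpleNamespace(lr=0.1, epoch=10)  # "lr" is a hypothetical parameter
value = my_app(cfg)
print(value)
```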

FrancoisPgm commented Feb 12, 2024

Actually, I can make it work with n_workers: 10, but only if I widen the fidelity range by lowering its lower bound.
Specifically:

  • with n_workers: 10 and epoch: "fidelity(10, 100)" I get the KeyError
  • with n_workers: 1 and epoch: "fidelity(10, 100)" it runs fine
  • with n_workers: 10 and epoch: "fidelity(1, 100)" it runs fine too.

I don't really get what's going on, but I suspect the sampling on the fidelity param has some issues.
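One way to compare the fidelity ranges mentioned in this thread is to list the budgets a generic successive-halving schedule would visit for each of them. The sketch below is only that: a geometric ladder that divides the maximum budget by a reduction factor eta until it would fall below the minimum. Using eta = 2 mirrors the base=2 that appears later in the thread. Whether Orion's BOHB wrapper or hpbandster computes exactly these values is an assumption, and the helper name budget_ladder is made up for this example.

```python
def budget_ladder(low: float, high: float, eta: float = 2.0) -> list[float]:
    """Geometric budgets: start at `high`, divide by `eta` while still >= `low`.

    Generic successive-halving ladder, not taken from Orion or hpbandster.
    """
    budgets = [float(high)]
    while budgets[-1] / eta >= low:
        budgets.append(budgets[-1] / eta)
    # Return budgets in increasing order, smallest rung first.
    return budgets[::-1]


# The three fidelity ranges reported in this thread.
for low, high in [(10, 100), (1, 100), (2, 128)]:
    ladder = budget_ladder(low, high)
    print(f"fidelity({low}, {high}): {len(ladder)} rungs, {ladder}")
```

Under these assumptions, fidelity(10, 100) yields 4 rungs with non-integer budgets starting at 12.5, while fidelity(1, 100) and fidelity(2, 128) each yield 7. This does not establish the cause of the KeyError. It only shows how differently the ranges are laid out, which may help when comparing them against n_workers.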

@Delaunay
Collaborator

I am not sure. @bouthilx, do you know? This seems like a BOHB-specific thing.

htwangtw commented Apr 5, 2024

When I set n_workers: -1 (so that it matches the CPU count, 4) and epoch: fidelity(2, 128, base=2), I run into the same error.
