Assessing uncertainty quantification quality metrics using the design-bench benchmarks
#7
Comments
Hi sgbaird,

Great question! For most (see the note below) of the benchmarking tasks in design-bench, we use a procedure that collects a dataset of design values along with their objective scores. When we apply offline model-based optimization algorithms to these tasks, which are released in our https://github.com/brandontrabucco/design-baselines repository, we typically subsample the original task dataset in order to hide some fraction of the high-performing designs from the optimizer. This serves to ensure that there exists headroom in the task objective function for the optimizer to improve upon.

If you would like to obtain the highest-performing designs and their scores, you can control the visible percentile range when creating a task:

```python
import design_bench

max_percentile = 100
min_percentile = 40
task = design_bench.make("Superconductor-RandomForest-v0",
                         dataset_kwargs=dict(max_percentile=max_percentile,
                                             min_percentile=min_percentile))
```

Let me know if you have any other questions!

-Brandon

NOTE: Our HopperController suite of tasks does not use subsampling; if the optimal performance is desired for this task, one can look at the performance of standard RL baselines on the Hopper-v2 MuJoCo task as a reference.
Fantastic! Thank you for the thorough reply. This is great.
I'm considering using this for some simple tests of a few different uncertainty quantification quality metrics, to see which ones are better predictors of how successful an adaptive design scheme will be.

From a very black-box standpoint, what this requires is `y_true`, `y_pred`, and `sigma` (true values, predicted values, and predicted uncertainties, respectively), plus a "notion of best" for the adaptive design task. Does that seem like something feasible/easy to implement with this repository? Or do you think it would be better to look elsewhere or start from scratch?