normalize and denormalize issue #8

Nathan-zh · 2022-03-31T17:21:17Z

Hi Brandon,

I recently installed your package and started using its datasets. Thanks for your work building this benchmark.

I have a problem with the function task.normalize_x.

import numpy as np
import design_bench

# load task
task = design_bench.make('AntMorphology-Exact-v0', relabel=False)
# get the top 128 features
aa = task.x[np.argsort(np.squeeze(task.y))[-128:]]
# predict the labels
r1 = np.squeeze(task.predict(aa))

# normalize and denormalize features
bb = task.normalize_x(aa)
cc = task.denormalize_x(bb)
# predict the labels again
r2 = np.squeeze(task.predict(cc))

print(np.max(r1), np.max(r2)) #--> 198.7532   406.76566
print(np.where((aa-cc)>0.001))  #--> (array([], dtype=int64), array([], dtype=int64))

The output shows that normalizing and then denormalizing doesn't change the features, yet the predictions are quite different. Is there anything wrong with my code? It feels like a trivial issue, but I cannot figure out where the problem is.
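One thing worth noting about the check above: `(aa - cc) > 0.001` is one-sided, so it only reports entries where `aa` exceeds `cc` and stays silent where `cc` exceeds `aa`. The sketch below uses made-up arrays (not the AntMorphology data) to show a two-sided comparison; whether this matters for the mismatch reported here is not established by the thread.

```python
import numpy as np

# Hypothetical arrays for illustration only: cc is larger than aa in the last entry.
aa = np.array([[0.0, 1.0, 250.0]], dtype=np.float32)
cc = aa + np.array([[0.0005, -0.0005, 5.0]], dtype=np.float32)

# One-sided check, as in the snippet above: misses entries where cc > aa.
one_sided = np.where((aa - cc) > 0.001)

# Two-sided check: flags any entry that moved by more than the tolerance.
two_sided = np.where(np.abs(aa - cc) > 0.001)

print("one-sided:", one_sided)
print("two-sided:", two_sided)
print("allclose:", np.allclose(aa, cc, atol=1e-3))
```

Using `np.abs` (or `np.allclose` with an explicit `atol`) makes the round-trip test symmetric in `aa` and `cc`.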

Nathan

brandontrabucco · 2022-04-01T13:59:24Z

Hi Nathan-zh,

Thanks for bringing this issue with the AntMorphology task to my attention! From the snippet you provided, the issue may be caused by floating point errors that occur when computing the mean and standard deviation used to normalize the designs. The relevant code is at this location:

design-bench/design_bench/datasets/dataset_builder.py, line 872 in 27ed0c3:

def update_x_statistics(self):

In essence, the mean and standard deviation are calculated in a way that does not require the entire dataset to be in memory at once (a design choice I made to accommodate larger MBO tasks in the future that load the dataset directly from disk, one (x, y) pair at a time). However, this could be exacerbating floating point errors in the normalization statistics, which is a possible explanation for the difference you are seeing above.
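To illustrate how accumulating statistics one sample at a time can lose precision, here is a minimal sketch. It assumes a simple running-sum scheme (sum and sum of squares), which is not necessarily what `update_x_statistics` does; the function name `streaming_mean_std` and the synthetic data are hypothetical and chosen to make the effect easy to see.

```python
import numpy as np

def streaming_mean_std(samples, dtype):
    """Accumulate running sums one sample at a time in the given dtype.

    Assumed scheme for illustration only, not the library's implementation.
    """
    total = np.zeros(samples.shape[1], dtype=dtype)
    total_sq = np.zeros(samples.shape[1], dtype=dtype)
    for row in samples:
        row = row.astype(dtype)
        total += row
        total_sq += row * row
    n = dtype(len(samples))
    mean = total / n
    # Var[x] = E[x^2] - E[x]^2, clamped at zero to guard against negative round-off.
    var = np.maximum(total_sq / n - mean * mean, dtype(0))
    return mean, np.sqrt(var)

rng = np.random.default_rng(0)
# Synthetic data with a large mean and a small spread, a hard case for running sums.
samples = rng.normal(loc=100.0, scale=0.01, size=(20000, 4)).astype(np.float32)

# Two-pass float64 statistics serve as the reference.
ref_std = samples.astype(np.float64).std(axis=0)

_, std32 = streaming_mean_std(samples, np.float32)
_, std64 = streaming_mean_std(samples, np.float64)
err32 = float(np.max(np.abs(std32 - ref_std)))
err64 = float(np.max(np.abs(std64 - ref_std)))
print(f"float32 std error: {err32:.3e}")
print(f"float64 std error: {err64:.3e}")
```

In this sketch the float32 accumulators cannot resolve the small spread once the running sums grow large, while the float64 accumulators stay close to the reference.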

Switching to float64 for the AntMorphology task might be necessary if the problem is due to floating point error.

Nathan-zh · 2022-04-01T15:56:55Z

It does help if I switch to float64. I think this numerical problem is caused by the distribution of features, i.e. most values are around 0 but a few values could be as large as 200-300.
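A rough way to see the effect of this kind of distribution is to measure the normalize/denormalize round-trip error in both precisions. The sketch below is built on assumptions: the feature matrix is synthetic (mostly near 0, with about 1% of entries between 200 and 300), and `round_trip_error` is a hypothetical helper that uses plain NumPy statistics rather than `task.normalize_x` and `task.denormalize_x`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features: mostly near zero, with roughly 1% of entries in [200, 300].
x = rng.normal(0.0, 0.05, size=(5000, 8))
outliers = rng.random(x.shape) < 0.01
x[outliers] = rng.uniform(200.0, 300.0, size=int(outliers.sum()))

def round_trip_error(features, dtype):
    """Normalize then denormalize in the given dtype; return the max absolute error."""
    features = features.astype(dtype)
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    restored = (features - mean) / std * std + mean
    return float(np.max(np.abs(restored - features)))

err32 = round_trip_error(x, np.float32)
err64 = round_trip_error(x, np.float64)
print(f"float32 round-trip error: {err32:.3e}")
print(f"float64 round-trip error: {err64:.3e}")
```

On the real task, the analogous experiment would cast `task.x` to float64 before normalizing; how that is wired into the benchmark is not shown in this thread.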

Nathan-zh · 2022-04-03T01:53:35Z

Hi Brandon,

Here is another issue, this time with the HopperController task. I use the exact oracle model, so I expected the predictions to be identical to the labels, or perhaps slightly different since the oracle is a simulator.

task = design_bench.make('HopperController-Exact-v0', relabel=False)
pred = task.predict(task.x[:10])
print(np.squeeze(pred))
--> [59.14225  72.68841  57.52715  59.30107  68.945305 95.25469  54.364407
 58.248234 57.225212 55.378292]

print(np.squeeze(task.y[:10]))
--> [108.34371  128.48705  103.78237   92.259224 147.93976  124.293274
 117.06348  148.98955  133.39757  101.68808 ]

But the output of this snippet is not what I expected. Would you please test this code? Thanks!

Nathan

brandontrabucco · 2022-04-03T15:46:35Z

Thanks for pointing this out. I currently think the issue is that the original dataset was collected with a stochastic policy, whereas I implemented the oracle for this task as deterministic in the benchmark. I did that to speed up evaluation, so that we don't need to average the performance over more than one rollout.
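As a toy illustration of why a deterministic evaluation need not reproduce labels collected with a stochastic policy, consider the sketch below. Everything in it is hypothetical: the 1-D dynamics, the reward, and `rollout_return` are made up for illustration and have nothing to do with the actual Hopper simulator or the benchmark's oracle.

```python
import numpy as np

def rollout_return(weight, noise_std, rng, horizon=50):
    """Toy 1-D rollout with a linear policy and optional Gaussian action noise."""
    state = 1.0
    total = 0.0
    for _ in range(horizon):
        action = weight * state
        if noise_std > 0.0:
            action += rng.normal(0.0, noise_std)
        state = 0.9 * state + 0.1 * action
        total += 1.0 - state ** 2  # reward is nonlinear in the state
    return total

rng = np.random.default_rng(0)
weight = 0.5

# Deterministic evaluation: a single rollout is enough, and it is repeatable.
det_a = rollout_return(weight, 0.0, rng)
det_b = rollout_return(weight, 0.0, rng)

# Stochastic evaluation: individual returns vary, so many rollouts are averaged.
stochastic = [rollout_return(weight, 1.0, rng) for _ in range(1000)]
stochastic_mean = float(np.mean(stochastic))

print(f"deterministic return: {det_a:.3f}")
print(f"mean stochastic return: {stochastic_mean:.3f}")
```

Because the reward is nonlinear in the state, the average stochastic return differs from the deterministic one even though both use the same policy weights. The direction and size of the gap in this toy say nothing about the gap observed for HopperController.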

There is a pull request about this that I have yet to merge. I'll let you know once I do:
#3

Nathan-zh closed this as completed Apr 1, 2022

Nathan-zh reopened this Apr 3, 2022
