Question about D4RL-gym dataset version #4

FineArtz · 2022-01-12T12:50:26Z

Hi, I recently read your paper and it inspired me a lot; it is no doubt a good paper. However, I am confused about which version of the D4RL datasets was used for the baselines you compare against. I notice that in "Appendix C Baseline performance sources", the results of BC, MOPO (by the way, I couldn't find MOPO in your experiments section) and MBOP are taken from their original papers, all of which use the D4RL-gym-v0 datasets.
Because I find that the performance of CQL on D4RL-gym-v0 [1] differs greatly from that on D4RL-gym-v2 [2] on several datasets, I wonder whether the scores of the above baselines would also change greatly on D4RL-gym-v2, or whether you have evidence that they would not, since you compare these scores directly.

jannerm · 2022-02-01T00:44:44Z

Nice catch!

BC on v2 scores 4.1 points higher than on v0, with an average score of 51.8 versus 47.7 [1]. I'll update this in the next arXiv version.

I have reached out to the authors of MBOP to see if they can share code for reevaluation on the v2 datasets.
