Users are confused about goal conditioning #393
Comments
Given this confusion between Multi-task and Meta-task RL environments where metaworld treats them as identical and requires the user to differentiate (if I understand @krzentner right).
The individual environment classes are fine. Having a flag which controls goal visibility really is simpler than adding another wrapper. The Having said that, the
Example of a paper that assumes that MT10 isn't goal conditioned: https://arxiv.org/abs/2003.13661
Meta-World was designed to be both a Meta-RL and a Multi-Task RL benchmark.
One of the awkward consequences of that is that the way goal conditioning is handled is very complicated in Meta-World.
Specifically, all environments in Meta-World are goal conditioned, in every benchmark.
However, goals are hidden in Meta-RL, and visible in Multi-Task RL.
This is intended to make "goal inference" part of the Meta-RL objective.
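As a rough illustration of the hidden-versus-visible distinction, the sketch below builds an observation from a state and a goal, and blanks the goal slot when the goal is hidden. The function name `make_observation` and the zero-filling convention are assumptions made for this example, not a description of Meta-World's actual implementation.

```python
from typing import List, Sequence


def make_observation(state: Sequence[float], goal: Sequence[float],
                     goal_visible: bool) -> List[float]:
    """Concatenate state and goal, zeroing the goal slot when it is hidden.

    Hypothetical helper, for illustration only.
    """
    goal_part = list(goal) if goal_visible else [0.0] * len(goal)
    return list(state) + goal_part


state = [0.1, 0.2, 0.3]
goal = [0.5, 0.6, 0.7]

# Multi-Task RL view: the goal is part of the observation.
multi_task_obs = make_observation(state, goal, goal_visible=True)
# Meta-RL view: the goal slot is blanked, so the agent has to infer it.
meta_rl_obs = make_observation(state, goal, goal_visible=False)
```

Both observations have the same length in this sketch, so the same policy input shape works in either setting; only the information content differs.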
Hiding the goal allows ML1 to be used in a very similar way to older Meta-RL benchmark tasks (like `HalfCheetahVelEnv` or Ant Direction).
However, Meta-RL requires that each task be a fully-observable MDP. This requires each "goal" to be considered a different task, and the API reflects this (an `ML1` benchmark object contains 50 train task objects, `ML10` contains 500 train task objects).
However, Meta-World uses the same API for both Meta-RL and Multi-Task RL. Consequently, using the `Benchmark` API, the goal is changed by passing one of the `task` objects to the `set_task` function.
In practice, many users don't use the `Benchmark` API, and don't set the `seeded_rand_vec` flag either (which randomizes the goals on `reset` using the seed passed to the environment on init).
This leads users to believe the environments are not goal conditioned, even though they definitely are supposed to be (50 goals per task, set by the seed).
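The relationship between task objects, goals, and seeds can be sketched with a toy model. Everything here (`ToyTask`, `ToyEnv`, `build_train_tasks`, the 3-dimensional goal) is hypothetical and only mimics the shape of the API described above; it does not import or reproduce Meta-World code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical stand-in for a task object: one environment plus one goal."""
    env_name: str
    goal: Tuple[float, ...]


class ToyEnv:
    """Hypothetical environment whose goal changes only through set_task."""

    def __init__(self, env_name: str) -> None:
        self.env_name = env_name
        self.goal: Optional[Tuple[float, ...]] = None

    def set_task(self, task: ToyTask) -> None:
        if task.env_name != self.env_name:
            raise ValueError("task belongs to a different environment")
        self.goal = task.goal


def build_train_tasks(env_names: List[str], goals_per_env: int = 50,
                      seed: int = 0) -> List[ToyTask]:
    """Sample goals_per_env seeded goals for each environment."""
    rng = random.Random(seed)
    return [
        ToyTask(name, tuple(rng.uniform(-1.0, 1.0) for _ in range(3)))
        for name in env_names
        for _ in range(goals_per_env)
    ]


# One environment with 50 goals gives 50 tasks (the ML1 case).
ml1_like = build_train_tasks(["env-0"])
# Ten environments with 50 goals each give 500 tasks (the ML10 case).
ml10_like = build_train_tasks([f"env-{i}" for i in range(10)])

env = ToyEnv("env-0")
env.set_task(ml1_like[7])  # the goal changes only when a task is set
```

Counting the distinct `env_name` values in `ml10_like` gives 10, which is the Multi-Task view of the same 500 objects: 10 environments with 50 goals each.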
I don't know how many inconsistent results have been published because of this confusion, but at least a few.
TL;DR: Meta-RL requires ML10 to have 500 tasks, Multi-Task RL wants MT10 to have 10 tasks with 50 goals. This confuses users.
We should make the documentation and API more clear and harder to mis-use.
A good first step would be renaming the `seeded_rand_vec` flag, and setting it to True by default in all of the environment constructors when not using the `Benchmark` API. Unfortunately, this is a breaking change, and we haven't published any versioned package, so we should definitely make sure we have published at least one version of the package before we do this.