Users are confused about goal conditioning #393

krzentner · 2023-03-22T01:30:34Z

Meta-World was designed to be both a Meta-RL and a Multi-Task RL benchmark.
One awkward consequence is that goal conditioning is handled in a fairly complicated way.
Specifically, all environments in Meta-World are goal conditioned, in every benchmark.
However, goals are hidden in Meta-RL, and visible in Multi-Task RL.
This is intended to make "goal inference" part of the Meta-RL objective.
This allows ML1 to be used in a very similar way to older Meta-RL benchmark tasks (like HalfCheetahVelEnv or Ant Direction).
At the same time, Meta-RL requires that each task be a fully-observable MDP. This requires each "goal" to be considered a different task, and the API reflects this (an ML1 benchmark object contains 50 train task objects, and an ML10 benchmark object contains 500).
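
To make the hidden/visible distinction concrete, here is a toy sketch. It is not Meta-World code: `ToyGoalEnv` and its `goal_visible` parameter are made up for illustration, and it assumes the hidden-goal case is implemented by zeroing the goal entries of the observation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyGoalEnv:
    """Hypothetical stand-in for a goal-conditioned environment."""

    goal: List[float]
    goal_visible: bool  # True ~ Multi-Task RL, False ~ Meta-RL

    def observe(self, state: List[float]) -> List[float]:
        # The goal exists either way; only its visibility in the
        # observation changes (assumed here: hidden goals are zeroed).
        goal_part = self.goal if self.goal_visible else [0.0] * len(self.goal)
        return list(state) + list(goal_part)


state = [0.1, 0.2, 0.3]
goal = [0.5, 0.6, 0.7]

mt_obs = ToyGoalEnv(goal, goal_visible=True).observe(state)
ml_obs = ToyGoalEnv(goal, goal_visible=False).observe(state)
```

Both observations have the same shape, so in this sketch the same policy architecture works in either setting. In the hidden case the agent has to infer the goal from experience.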

However, Meta-World uses the same API for both Meta-RL and Multi-Task RL. Consequently, when using the Benchmark API, the goal is changed by passing one of the task objects to the set_task function.
The problem is that many users don't use the Benchmark API, and don't set the seeded_rand_vec flag either (which randomizes the goals on reset using the seed passed to the environment on init).
This leads users to believe the environments are not goal conditioned, even though they definitely are supposed to be (50 goals per task, set by the seed).
I don't know how many inconsistent results have been published because of this confusion, but at least a few.
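
The failure mode can be reproduced with a toy environment. This is a sketch only: `ToySeededEnv` and `resample_goal_on_reset` are hypothetical names standing in for a Meta-World environment and the seeded_rand_vec flag, and the one-dimensional goal is a simplification.

```python
import random


class ToySeededEnv:
    """Hypothetical env whose goal is drawn from a seeded RNG."""

    def __init__(self, seed: int, resample_goal_on_reset: bool = False):
        self._rng = random.Random(seed)
        self.resample_goal_on_reset = resample_goal_on_reset
        self.goal = self._rng.uniform(-1.0, 1.0)

    def reset(self) -> float:
        # Without the flag, the goal chosen at construction never changes.
        if self.resample_goal_on_reset:
            self.goal = self._rng.uniform(-1.0, 1.0)
        return self.goal


# Flag left at its default: every episode sees the same goal.
fixed = ToySeededEnv(seed=0)
fixed_goals = [fixed.reset() for _ in range(50)]

# Flag set: goals vary across resets, reproducibly for a given seed.
varied = ToySeededEnv(seed=0, resample_goal_on_reset=True)
varied_goals = [varied.reset() for _ in range(50)]

replay = ToySeededEnv(seed=0, resample_goal_on_reset=True)
replay_goals = [replay.reset() for _ in range(50)]
```

A user who only ever runs the first configuration has no way to tell from the rollouts that the environment was meant to be goal conditioned.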

TL;DR: Meta-RL requires ML10 to have 500 tasks, Multi-Task RL wants MT10 to have 10 tasks with 50 goals. This confuses users.
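
The two ways of counting can be written out as follows. This is only an illustration of the arithmetic: the environment names are placeholders, and goals are represented by their index rather than by real goal vectors.

```python
from itertools import product

GOALS_PER_ENV = 50  # number of goals per environment, as stated above

# Placeholder names; the real benchmarks use actual environment names.
env_names = [f"env-{i}" for i in range(10)]

# Meta-RL view (ML10): every (environment, goal) pair is its own task.
meta_rl_tasks = list(product(env_names, range(GOALS_PER_ENV)))

# Multi-Task view (MT10): one task per environment, each with many goals.
multi_task_goals = {name: list(range(GOALS_PER_ENV)) for name in env_names}

print(len(meta_rl_tasks))     # number of Meta-RL tasks
print(len(multi_task_goals))  # number of Multi-Task RL tasks
```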

We should make the documentation and API clearer and harder to misuse.
A good first step would be renaming the seeded_rand_vec flag and setting it to True by default in all of the environment constructors when not using the Benchmark API. Unfortunately, this is a breaking change, and we haven't published any versioned package yet, so we should publish at least one version of the package before we do this.


pseudo-rnd-thoughts · 2023-03-22T09:16:45Z

If I understand @krzentner right, the confusion comes from Meta-World treating Multi-Task and Meta-RL environments as identical and requiring the user to differentiate between them.
Could we not separate these out into two env classes? I understand that users might wish to explore using multi-task environments for Meta-RL, but we could create converter classes to support this.
Thoughts? @reginald-mclean @krzentner

krzentner · 2023-03-23T02:34:13Z

The individual environment classes are fine. Having a flag which controls goal visibility really is simpler than adding another wrapper. The Benchmark API, however, is a poor fit for Multi-Task RL. We should probably just change the documentation to recommend using the metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE API for Multi-Task RL.

Having said that, the Benchmark API is basically necessary for Meta-RL to work at all, so we need to keep that too.

krzentner · 2023-03-23T23:27:54Z

Example of a paper that assumes that MT10 isn't goal conditioned: https://arxiv.org/abs/2003.13661

krzentner mentioned this issue on Mar 22, 2023, in Roadmap #388 (closed, 5 tasks).
