Commit 099943b: merging

reginald-mclean committed Nov 4, 2024
2 parents 366bc34 + b883558
Showing 15 changed files with 255 additions and 56 deletions.
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,5 +1,5 @@
# Metaworld documentation
# Meta-World documentation

This directory contains the documentation for Metaworld.
This directory contains the documentation for Meta-World.

For more information about how to contribute to the documentation go to our [CONTRIBUTING.md](https://github.com/Farama-Foundation/Celshast/blob/main/CONTRIBUTING.md)
Binary file removed docs/_static/ml1-1.gif
Binary file removed docs/_static/ml10-1.gif
Binary file removed docs/_static/ml45-1.gif
Binary file removed docs/_static/mt1-1.gif
Binary file removed docs/_static/mt10-1.gif
13 changes: 9 additions & 4 deletions docs/benchmark/action_space.md
Original file line number Diff line number Diff line change
@@ -6,12 +6,17 @@ firstpage:

# Action Space

In the Meta-World benchmark, the agent must simultaneously solve multiple tasks, each of which could be defined by its own Markov decision process.
Since current approaches solve this with a single policy/model, the action space must have the same size for all tasks, i.e., share a common structure.

The action space of the Sawyer robot is a `Box(-1.0, 1.0, (4,), float32)`.
An action represents the Cartesian displacement dx, dy, and dz of the end effector, and an additional action for gripper control.
An action represents the Cartesian displacement `dx`, `dy`, and `dz` of the end-effector, and an additional action for gripper control.

For tasks that do not require the gripper, the gripper action can be masked out or set to a constant value that keeps the fingers permanently closed.

| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
|-----|--------|-------------|-------------|---------------------|-------|------|
| 0 | Displacement of the end effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
| 1 | Displacement of the end effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
| 2 | Displacement of the end effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
| 0 | Displacement of the end-effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
| 1 | Displacement of the end-effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
| 2 | Displacement of the end-effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) |
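
As a minimal usage sketch, assuming the classic benchmark API (the task name and the exact `reset`/`step` signatures are assumptions that vary across Meta-World versions):

```python
import numpy as np
import metaworld

mt1 = metaworld.MT1('reach-v2')  # task name is illustrative
env = mt1.train_classes['reach-v2']()
env.set_task(mt1.train_tasks[0])

obs, info = env.reset()  # older Gym-style versions return `obs` alone
action = np.array([0.1, 0.0, -0.05, 1.0], dtype=np.float32)  # dx, dy, dz, close gripper
obs, reward, terminated, truncated, info = env.step(action)  # older versions: (obs, reward, done, info)
```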
64 changes: 47 additions & 17 deletions docs/benchmark/benchmark_descriptions.md
@@ -8,16 +8,29 @@ firstpage:

The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
Unlike usual RL benchmarks, the training of the agent is strictly split into a training and testing phase.
Unlike usual RL benchmarks, the training of the agent is strictly split into training and testing phases.

## Task Configuration

Meta-World distinguishes between parametric and non-parametric variations.
Parametric variations concern the configuration of the goal or object position, such as changing the location of the puck in the `push` task.

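A minimal sketch of sampling parametric variations through the benchmark API (the task name is illustrative, and exact signatures vary across Meta-World versions):

```python
import random
import metaworld

# MT1 exposes a single task family; each entry in `train_tasks` fixes a
# different parametric configuration (object/goal positions) of that family.
mt1 = metaworld.MT1('push-v2')  # task name is illustrative
env = mt1.train_classes['push-v2']()

task = random.choice(mt1.train_tasks)  # e.g., a new puck/goal position
env.set_task(task)
obs, info = env.reset()  # older Gym-style versions return `obs` alone
```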

Non-parametric variations arise in the settings that contain multiple tasks, where the agent faces challenges such as `push` and `open window` that require different sets of skills.


## Multi-Task Problems

The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
Below, different levels of difficulty are described.


### Multi-Task (MT1)

In the easiest setting, **MT1**, a single task needs to be learned where the agent must *reach*, *push*, or *pick and place* a goal object.
In the easiest setting, **MT1**, a single task needs to be learned, where the agent must, for example, *reach*, *push*, or *pick and place* a goal object.
There is no testing of generalization involved in this setting.

```{figure} ../_static/mt1.gif
@@ -27,9 +27,9 @@ There is no testing of generalization involved in this setting.

### Multi-Task (MT10)

The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
There is no testing of generalization involved in this setting.

The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*, *open door*, *open drawer*, *close drawer*, *press button top-down*, *insert peg side*, *open window*, and *open box*.
The policy should be provided with a one-hot vector indicating the current task.
Object and goal positions are fixed in all tasks to focus the evaluation solely on skill acquisition. <!-- TODO: check this -->
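
A sketch of how such a one-hot task ID could be appended to observations; the wrapper logic below is illustrative and not part of the library:

```python
import random
import numpy as np
import metaworld

mt10 = metaworld.MT10()

# One environment per task family; the index in `names` defines the one-hot ID.
names = sorted(mt10.train_classes)
envs = {name: mt10.train_classes[name]() for name in names}

name = random.choice(names)
env = envs[name]
env.set_task(random.choice([t for t in mt10.train_tasks if t.env_name == name]))

obs, info = env.reset()  # older Gym-style versions return `obs` alone
one_hot = np.zeros(len(names), dtype=np.float32)
one_hot[names.index(name)] = 1.0
obs = np.concatenate([obs, one_hot])  # policy input: state + task ID
```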


```{figure} ../_static/mt10.gif
@@ -39,42 +52,59 @@ There is no testing of generalization involved in this setting.

### Multi-Task (MT50)

In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in metaworld.
The **MT50** evaluation uses all 50 Meta-World tasks.
This is the most challenging multi-task setting and involves no evaluation on test tasks.
As with **MT10**, the policy is provided with a one-hot vector indicating the current task, and object and goal positions are fixed.

See [Task Descriptions](task_descriptions) for more details.

## Meta-Learning Problems

Meta-RL attempts to evaluate the [transfer learning](https://en.
wikipedia.org/wiki/Transfer_learning) capabilities of agents learning skills based on a predefined set of training tasks, by evaluating generalization using a hold-out set of test tasks.
In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.
Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)
capabilities of agents that learn skills from a predefined set of training
tasks, measuring generalization on a hold-out set of test tasks.
In other words, this setting benchmarks an algorithm's
ability to adapt to or learn new tasks.

### Meta-RL (ML1)

The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
For the test evaluation, unseen goal locations are used to measure generalization capabilities.


The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
variation within one task. ML1 uses a single Meta-World task, with the
meta-training "tasks" corresponding to 50 random initial object and goal
positions, and meta-testing on 10 held-out positions. We evaluate algorithms
on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
place*, where the variation is over the reach position or the goal object position.
The goal positions are not provided in the observation, forcing meta-RL
algorithms to adapt to the goal through trial and error.
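
A minimal sketch of the ML1 train/test split, assuming the classic benchmark API (the task name is illustrative):

```python
import random
import metaworld

ml1 = metaworld.ML1('pick-place-v2')  # one task family; name is illustrative
env = ml1.train_classes['pick-place-v2']()

env.set_task(random.choice(ml1.train_tasks))  # one of 50 training positions
obs, info = env.reset()

env.set_task(random.choice(ml1.test_tasks))   # one of 10 held-out positions
obs, info = env.reset()
```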

```{figure} ../_static/ml1.gif
:alt: Meta-RL 1
:width: 500
```


### Meta-RL (ML10)

The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.
The **ML10** evaluation involves few-shot adaptation to new test tasks with 10
meta-training tasks. We hold out 5 tasks and meta-train policies on 10 tasks.
We randomize object and goal positions and intentionally select training tasks
with structural similarity to the test tasks. Task IDs are not provided as
input, requiring a meta-RL algorithm to identify the tasks from experience.
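
A sketch of the resulting split, under the same API assumptions as above:

```python
import random
import metaworld

ml10 = metaworld.ML10()

# Meta-training: sample a task from one of the 10 training families.
name, cls = random.choice(list(ml10.train_classes.items()))
env = cls()
env.set_task(random.choice([t for t in ml10.train_tasks if t.env_name == name]))

# Meta-testing: adapt to tasks from the 5 held-out families.
name, cls = random.choice(list(ml10.test_classes.items()))
env = cls()
env.set_task(random.choice([t for t in ml10.test_tasks if t.env_name == name]))
```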

```{figure} ../_static/ml10.gif
:alt: Meta-RL 10
:alt: Meta-RL 10
:width: 500
```

### Meta-RL (ML45)

The most difficult environment setting of metaworld, **ML45**, challenges the agent to be trained on 45 distinct manipulation tasks and evaluated on 5 test tasks.
The most difficult environment setting of Meta-World, **ML45**, challenges the
agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
tasks. Object and goal positions are randomized, and training tasks are
selected for structural similarity to test tasks. As with ML10, task IDs are
not provided, requiring the meta-RL algorithm to identify tasks from experience.


```{figure} ../_static/ml45.gif
:alt: Meta-RL 45
Empty file.
27 changes: 27 additions & 0 deletions docs/benchmark/reward_functions.md
@@ -0,0 +1,27 @@
---
layout: "contents"
title: Reward Functions
firstpage:
---

# Reward Functions

As with the [action space](action_space) and [state space](state_space), a common structure is shared across tasks.
Meta-World provides well-shaped reward functions for the individual tasks, making them solvable by current single-task reinforcement learning approaches.
To ensure comparable learning across the settings with multiple tasks, all task rewards have the same magnitude.

## Options

Meta-World currently implements two types of reward functions that can be selected
by passing the `reward_func_version` keyword argument to `gym.make(...)`.
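
For example (the environment ID below is an assumption for illustration; check the installed version for the registered names):

```python
import gymnasium as gym
import metaworld  # registers the Meta-World environments (recent versions)

# `reward_func_version` selects between the reward functions described below.
env = gym.make('Meta-World/pick-place-v2', reward_func_version='v1')
```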

### Version 1

Passing `reward_func_version='v1'` configures the benchmark with Meta-World's
primary reward function, which is a version of the `pick-place-wall` task
reward modified to also work for the other tasks.


### Version 2

TBA
36 changes: 8 additions & 28 deletions docs/benchmark/state_space.md
@@ -6,32 +6,12 @@ firstpage:

# State Space

The observation array consists of the gripper's (end effector's) position and state, alongside the object of interest's position and orientation. This table will detail each component usually present in such environments:

| Num | Observation Description | Min | Max | Site Name (XML) | Joint Name (XML) | Joint Type | Unit |
|-----|-----------------------------------------------|---------|---------|------------------------|-------------------|------------|-------------|
| 0 | End effector x position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 1 | End effector y position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 2 | End effector z position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 3 | Gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless|
| 4 | Object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 5 | Object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 6 | Object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 7 | Object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 8 | Object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 9 | Object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 10 | Object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 11 | Previous end effector x position | -Inf | Inf | hand | - | - | position (m)|
| 12 | Previous end effector y position | -Inf | Inf | hand | - | - | position (m)|
| 13 | Previous end effector z position | -Inf | Inf | hand | - | - | position (m)|
| 14 | Previous gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless|
| 15 | Previous object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 16 | Previous object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 17 | Previous object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 18 | Previous object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 19 | Previous object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 20 | Previous object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 21 | Previous object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 22 | Goal x position | -Inf | Inf | goal (derived) | - | - | position (m)|
| 23 | Goal y position | -Inf | Inf | goal (derived) | - | - | position (m)|
| 24 | Goal z position | -Inf | Inf | goal (derived) | - | - | position (m)|
As with the [action space](action_space), the state space must maintain the same structure across tasks, which allows current approaches to employ a single policy/model.
Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal position (e.g., `reach`, `push`, `pick place`) or of two objects with a fixed goal position (e.g., `hammer`, `soccer`, `shelf place`).
To account for this variability, large parts of the observation space serve as placeholders, e.g., for the second object when only one object is present.

The observation array consists of the end-effector's 3D Cartesian position, followed either by a single object's position combined with its goal coordinates, or by the positions of the first and second objects.
This always results in a 9D state vector.

TODO: Provide table
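
Until the table is available, a hypothetical slicing of the 9D vector, assuming the layout described above:

```python
import numpy as np

obs = np.zeros(9, dtype=np.float32)  # stand-in for an environment observation

hand_xyz = obs[0:3]       # end-effector Cartesian position
obj1_xyz = obs[3:6]       # first object position
goal_or_obj2 = obs[6:9]   # goal coordinates, or second object position
```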
159 changes: 159 additions & 0 deletions docs/benchmark/task_descriptions.md
@@ -0,0 +1,159 @@
---
layout: "contents"
title: Task Descriptions
firstpage:
---

# Task Descriptions

In the following, we list all 50 tasks contained in Meta-World, each with a short description.

## Turn on faucet
Rotate the faucet counter-clockwise. Randomize faucet positions

## Sweep
Sweep a puck off the table. Randomize puck positions

## Assemble nut
Pick up a nut and place it onto a peg. Randomize nut and peg positions

## Turn off faucet
Rotate the faucet clockwise. Randomize faucet positions

## Push
Push the puck to a goal. Randomize puck and goal positions

## Pull lever
Pull a lever down 90 degrees. Randomize lever positions

## Turn dial
Rotate a dial 180 degrees. Randomize dial positions

## Push with stick
Grasp a stick and push a box using the stick. Randomize stick positions.

## Get coffee
Push a button on the coffee machine. Randomize the position of the coffee machine

## Pull handle side
Pull a handle up sideways. Randomize the handle positions

## Basketball
Dunk the basketball into the basket. Randomize basketball and basket positions

## Pull with stick
Grasp a stick and pull a box with the stick. Randomize stick positions

## Sweep into hole
Sweep a puck into a hole. Randomize puck positions

## Disassemble nut
Pick a nut out of a peg. Randomize the nut positions

## Place onto shelf
Pick and place a puck onto a shelf. Randomize puck and shelf positions

## Push mug
Push a mug under a coffee machine. Randomize the mug and the machine positions

## Press handle side
Press a handle down sideways. Randomize the handle positions

## Hammer
Hammer a screw on the wall. Randomize the hammer and the screw positions

## Slide plate
Slide a plate into a cabinet. Randomize the plate and cabinet positions

## Slide plate side
Slide a plate into a cabinet sideways. Randomize the plate and cabinet positions

## Press button wall
Bypass a wall and press a button. Randomize the button positions

## Press handle
Press a handle down. Randomize the handle positions

## Pull handle
Pull a handle up. Randomize the handle positions

## Soccer
Kick a soccer ball into the goal. Randomize the ball and goal positions

## Retrieve plate side
Get a plate from the cabinet sideways. Randomize plate and cabinet positions

## Retrieve plate
Get a plate from the cabinet. Randomize plate and cabinet positions

## Close drawer
Push and close a drawer. Randomize the drawer positions

## Press button top
Press a button from the top. Randomize button positions

## Reach
Reach a goal position. Randomize the goal positions

## Press button top wall
Bypass a wall and press a button from the top. Randomize button positions

## Reach with wall
Bypass a wall and reach a goal. Randomize goal positions

## Insert peg side
Insert a peg sideways. Randomize peg and goal positions

## Pull
Pull a puck to a goal. Randomize puck and goal positions

## Push with wall
Bypass a wall and push a puck to a goal. Randomize puck and goal positions

## Pick out of hole
Pick up a puck from a hole. Randomize puck and goal positions

## Pick&place w/ wall
Pick a puck, bypass a wall and place the puck. Randomize puck and goal positions

## Press button
Press a button. Randomize button positions

## Pick&place
Pick and place a puck to a goal. Randomize puck and goal positions

## Pull mug
Pull a mug from a coffee machine. Randomize the mug and the machine positions

## Unplug peg
Unplug a peg sideways. Randomize peg positions

## Close window
Push and close a window. Randomize window positions

## Open window
Push and open a window. Randomize window positions

## Open door
Open a door with a revolving joint. Randomize door positions

## Close door
Close a door with a revolving joint. Randomize door positions

## Open drawer
Open a drawer. Randomize drawer positions

## Insert hand
Insert the gripper into a hole.

## Close box
Grasp the cover and close the box with it. Randomize the cover and box positions

## Lock door
Lock the door by rotating the lock clockwise. Randomize door positions

## Unlock door
Unlock the door by rotating the lock counter-clockwise. Randomize door positions

## Pick bin
Grasp the puck from one bin and place it into another bin. Randomize puck positions
2 changes: 1 addition & 1 deletion docs/citation.md
@@ -1,6 +1,6 @@
# Citation

You can cite Metaworld as follows:
You can cite Meta-World as follows:

```bibtex
@inproceedings{yu2019meta,