Commit 099943b: merging

reginald-mclean committed Nov 4, 2024
2 parents 366bc34 + b883558
Showing 15 changed files with 255 additions and 56 deletions.
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,5 +1,5 @@
# Metaworld documentation
# Meta-World documentation

This directory contains the documentation for Metaworld.
This directory contains the documentation for Meta-World.

For more information about how to contribute to the documentation go to our [CONTRIBUTING.md](https://github.com/Farama-Foundation/Celshast/blob/main/CONTRIBUTING.md)
Binary file removed docs/_static/ml1-1.gif
Binary file removed docs/_static/ml10-1.gif
Binary file removed docs/_static/ml45-1.gif
Binary file removed docs/_static/mt1-1.gif
Binary file removed docs/_static/mt10-1.gif
13 changes: 9 additions & 4 deletions docs/benchmark/action_space.md
Original file line number Diff line number Diff line change
@@ -6,12 +6,17 @@ firstpage:

# Action Space

In the Meta-World benchmark, the agent must simultaneously solve multiple tasks, each of which could be defined by its own Markov decision process.
Since current approaches solve this with a single policy/model, the action space must have the same size for all tasks, i.e., share a common structure.

The action space of the Sawyer robot is a `Box(-1.0, 1.0, (4,), float32)`.
An action represents the Cartesian displacement dx, dy, and dz of the end effector, and an additional action for gripper control.
An action represents the Cartesian displacement `dx`, `dy`, and `dz` of the end-effector, and an additional action for gripper control.

For tasks that do not require the gripper, the gripper action can be masked out or set to a constant value that keeps the fingers permanently closed.

| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
|-----|--------|-------------|-------------|---------------------|-------|------|
| 0 | Displacement of the end effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
| 1 | Displacement of the end effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
| 2 | Displacement of the end effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
| 0 | Displacement of the end-effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
| 1 | Displacement of the end-effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
| 2 | Displacement of the end-effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) |
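
As a minimal usage sketch, assuming the classic benchmark API (the task name and the exact `reset`/`step` signatures are assumptions that vary across Meta-World versions):

```python
import numpy as np
import metaworld

mt1 = metaworld.MT1('reach-v2')  # task name is illustrative
env = mt1.train_classes['reach-v2']()
env.set_task(mt1.train_tasks[0])

obs, info = env.reset()  # older Gym-style versions return `obs` alone
action = np.array([0.1, 0.0, -0.05, 1.0], dtype=np.float32)  # dx, dy, dz, close gripper
obs, reward, terminated, truncated, info = env.step(action)  # older versions: (obs, reward, done, info)
```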
64 changes: 47 additions & 17 deletions docs/benchmark/benchmark_descriptions.md
@@ -8,16 +8,29 @@ firstpage:

The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
Unlike usual RL benchmarks, the training of the agent is strictly split into a training and testing phase.
Unlike usual RL benchmarks, the training of the agent is strictly split into training and testing phases.

## Task Configuration

Meta-World distinguishes between parametric and non-parametric variations.
Parametric variations concern the configuration of the goal or object position, such as changing the location of the puck in the `push` task.

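A minimal sketch of sampling parametric variations through the benchmark API (the task name is illustrative, and exact signatures vary across Meta-World versions):

```python
import random
import metaworld

# MT1 exposes a single task family; each entry in `train_tasks` fixes a
# different parametric configuration (object/goal positions) of that family.
mt1 = metaworld.MT1('push-v2')  # task name is illustrative
env = mt1.train_classes['push-v2']()

task = random.choice(mt1.train_tasks)  # e.g., a new puck/goal position
env.set_task(task)
obs, info = env.reset()  # older Gym-style versions return `obs` alone
```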

Non-parametric variations arise in the settings that contain multiple tasks, where the agent faces challenges such as `push` and `open window` that require different sets of skills.


## Multi-Task Problems

The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
Below, different levels of difficulty are described.


### Multi-Task (MT1)

In the easiest setting, **MT1**, a single task needs to be learned where the agent must *reach*, *push*, or *pick and place* a goal object.
In the easiest setting, **MT1**, a single task needs to be learned, where the agent must, for example, *reach*, *push*, or *pick and place* a goal object.
There is no testing of generalization involved in this setting.

```{figure} ../_static/mt1.gif
@@ -27,9 +27,9 @@ There is no testing of generalization involved in this setting.

### Multi-Task (MT10)

The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
There is no testing of generalization involved in this setting.

The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*, *open door*, *open drawer*, *close drawer*, *press button top-down*, *insert peg side*, *open window*, and *open box*.
The policy should be provided with a one-hot vector indicating the current task.
Object and goal positions are fixed in all tasks to focus the evaluation solely on skill acquisition. <!-- TODO: check this -->
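
A sketch of how such a one-hot task ID could be appended to observations; the wrapper logic below is illustrative and not part of the library:

```python
import random
import numpy as np
import metaworld

mt10 = metaworld.MT10()

# One environment per task family; the index in `names` defines the one-hot ID.
names = sorted(mt10.train_classes)
envs = {name: mt10.train_classes[name]() for name in names}

name = random.choice(names)
env = envs[name]
env.set_task(random.choice([t for t in mt10.train_tasks if t.env_name == name]))

obs, info = env.reset()  # older Gym-style versions return `obs` alone
one_hot = np.zeros(len(names), dtype=np.float32)
one_hot[names.index(name)] = 1.0
obs = np.concatenate([obs, one_hot])  # policy input: state + task ID
```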


```{figure} ../_static/mt10.gif
@@ -39,42 +52,59 @@ There is no testing of generalization involved in this setting.

### Multi-Task (MT50)

In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in metaworld.
The **MT50** evaluation uses all 50 Meta-World tasks.
This is the most challenging multi-task setting and involves no evaluation on test tasks.
As with **MT10**, the policy is provided with a one-hot vector indicating the current task, and object and goal positions are fixed.

See [Task Descriptions](task_descriptions) for more details.

## Meta-Learning Problems

Meta-RL attempts to evaluate the [transfer learning](https://en.
wikipedia.org/wiki/Transfer_learning) capabilities of agents learning skills based on a predefined set of training tasks, by evaluating generalization using a hold-out set of test tasks.
In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.
Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)
capabilities of agents that learn skills from a predefined set of training
tasks, measuring generalization on a hold-out set of test tasks.
In other words, this setting benchmarks an algorithm's
ability to adapt to or learn new tasks.

### Meta-RL (ML1)

The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
For the test evaluation, unseen goal locations are used to measure generalization capabilities.


The simplest meta-RL setting, **ML1**, involves few-shot adaptation to goal
variation within one task. ML1 uses a single Meta-World task, with the
meta-training "tasks" corresponding to 50 random initial object and goal
positions, and meta-testing on 10 held-out positions. We evaluate algorithms
on three individual tasks from Meta-World: *reaching*, *pushing*, and *pick and
place*, where the variation is over the reach position or the goal object position.
The goal positions are not provided in the observation, forcing meta-RL
algorithms to adapt to the goal through trial and error.
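
A minimal sketch of the ML1 train/test split, assuming the classic benchmark API (the task name is illustrative):

```python
import random
import metaworld

ml1 = metaworld.ML1('pick-place-v2')  # one task family; name is illustrative
env = ml1.train_classes['pick-place-v2']()

env.set_task(random.choice(ml1.train_tasks))  # one of 50 training positions
obs, info = env.reset()

env.set_task(random.choice(ml1.test_tasks))   # one of 10 held-out positions
obs, info = env.reset()
```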

```{figure} ../_static/ml1.gif
:alt: Meta-RL 1
:width: 500
```


### Meta-RL (ML10)

The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.
The **ML10** evaluation involves few-shot adaptation to new test tasks with 10
meta-training tasks. We hold out 5 tasks and meta-train policies on 10 tasks.
We randomize object and goal positions and intentionally select training tasks
with structural similarity to the test tasks. Task IDs are not provided as
input, requiring a meta-RL algorithm to identify the tasks from experience.
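
A sketch of the resulting split, under the same API assumptions as above:

```python
import random
import metaworld

ml10 = metaworld.ML10()

# Meta-training: sample a task from one of the 10 training families.
name, cls = random.choice(list(ml10.train_classes.items()))
env = cls()
env.set_task(random.choice([t for t in ml10.train_tasks if t.env_name == name]))

# Meta-testing: adapt to tasks from the 5 held-out families.
name, cls = random.choice(list(ml10.test_classes.items()))
env = cls()
env.set_task(random.choice([t for t in ml10.test_tasks if t.env_name == name]))
```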

```{figure} ../_static/ml10.gif
:alt: Meta-RL 10
:alt: Meta-RL 10
:width: 500
```

### Meta-RL (ML45)

The most difficult environment setting of metaworld, **ML45**, challenges the agent to be trained on 45 distinct manipulation tasks and evaluated on 5 test tasks.
The most difficult environment setting of Meta-World, **ML45**, challenges the
agent with few-shot adaptation to new test tasks using 45 meta-training tasks.
Similar to ML10, we hold out 5 tasks for testing and meta-train policies on 45
tasks. Object and goal positions are randomized, and training tasks are
selected for structural similarity to test tasks. As with ML10, task IDs are
not provided, requiring the meta-RL algorithm to identify tasks from experience.


```{figure} ../_static/ml45.gif
:alt: Meta-RL 45
Empty file.
27 changes: 27 additions & 0 deletions docs/benchmark/reward_functions.md
@@ -0,0 +1,27 @@
---
layout: "contents"
title: Reward Functions
firstpage:
---

# Reward Functions

As with the [action space](action_space) and [state space](state_space), a common structure is shared across tasks.
Meta-World provides well-shaped reward functions for the individual tasks, making them solvable by current single-task reinforcement learning approaches.
To ensure comparable learning across the settings with multiple tasks, all task rewards have the same magnitude.

## Options

Meta-World currently implements two types of reward functions that can be selected
by passing the `reward_func_version` keyword argument to `gym.make(...)`.
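
For example (the environment ID below is an assumption for illustration; check the installed version for the registered names):

```python
import gymnasium as gym
import metaworld  # registers the Meta-World environments (recent versions)

# `reward_func_version` selects between the reward functions described below.
env = gym.make('Meta-World/pick-place-v2', reward_func_version='v1')
```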

### Version 1

Passing `reward_func_version='v1'` configures the benchmark with Meta-World's
primary reward function, which is a version of the `pick-place-wall` task
reward modified to also work for the other tasks.


### Version 2

TBA
36 changes: 8 additions & 28 deletions docs/benchmark/state_space.md
@@ -6,32 +6,12 @@ firstpage:

# State Space

The observation array consists of the gripper's (end effector's) position and state, alongside the object of interest's position and orientation. This table will detail each component usually present in such environments:

| Num | Observation Description | Min | Max | Site Name (XML) | Joint Name (XML) | Joint Type | Unit |
|-----|-----------------------------------------------|---------|---------|------------------------|-------------------|------------|-------------|
| 0 | End effector x position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 1 | End effector y position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 2 | End effector z position in global coordinates | -Inf | Inf | hand | - | - | position (m)|
| 3 | Gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless|
| 4 | Object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 5 | Object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 6 | Object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 7 | Object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 8 | Object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 9 | Object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 10 | Object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 11 | Previous end effector x position | -Inf | Inf | hand | - | - | position (m)|
| 12 | Previous end effector y position | -Inf | Inf | hand | - | - | position (m)|
| 13 | Previous end effector z position | -Inf | Inf | hand | - | - | position (m)|
| 14 | Previous gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless|
| 15 | Previous object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 16 | Previous object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 17 | Previous object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)|
| 18 | Previous object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 19 | Previous object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 20 | Previous object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 21 | Previous object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
| 22 | Goal x position | -Inf | Inf | goal (derived) | - | - | position (m)|
| 23 | Goal y position | -Inf | Inf | goal (derived) | - | - | position (m)|
| 24 | Goal z position | -Inf | Inf | goal (derived) | - | - | position (m)|
As with the [action space](action_space), the state space must maintain the same structure across tasks, which allows current approaches to employ a single policy/model.
Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal position (e.g., `reach`, `push`, `pick place`) or of two objects with a fixed goal position (e.g., `hammer`, `soccer`, `shelf place`).
To account for this variability, large parts of the observation space serve as placeholders, e.g., for the second object when only one object is present.

The observation array consists of the end-effector's 3D Cartesian position, followed either by a single object's position combined with its goal coordinates, or by the positions of the first and second objects.
This always results in a 9D state vector.

TODO: Provide table
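
Until the table is available, a hypothetical slicing of the 9D vector, assuming the layout described above:

```python
import numpy as np

obs = np.zeros(9, dtype=np.float32)  # stand-in for an environment observation

hand_xyz = obs[0:3]       # end-effector Cartesian position
obj1_xyz = obs[3:6]       # first object position
goal_or_obj2 = obs[6:9]   # goal coordinates, or second object position
```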
159 changes: 159 additions & 0 deletions docs/benchmark/task_descriptions.md
@@ -0,0 +1,159 @@
---
layout: "contents"
title: Task Descriptions
firstpage:
---

# Task Descriptions

In the following, we list all 50 tasks contained in Meta-World, each with a short description.

## Turn on faucet
Rotate the faucet counter-clockwise. Randomize faucet positions

## Sweep
Sweep a puck off the table. Randomize puck positions

## Assemble nut
Pick up a nut and place it onto a peg. Randomize nut and peg positions

## Turn off faucet
Rotate the faucet clockwise. Randomize faucet positions

## Push
Push the puck to a goal. Randomize puck and goal positions

## Pull lever
Pull a lever down 90 degrees. Randomize lever positions

## Turn dial
Rotate a dial 180 degrees. Randomize dial positions

## Push with stick
Grasp a stick and push a box using the stick. Randomize stick positions.

## Get coffee
Push a button on the coffee machine. Randomize the position of the coffee machine

## Pull handle side
Pull a handle up sideways. Randomize the handle positions

## Basketball
Dunk the basketball into the basket. Randomize basketball and basket positions

## Pull with stick
Grasp a stick and pull a box with the stick. Randomize stick positions

## Sweep into hole
Sweep a puck into a hole. Randomize puck positions

## Disassemble nut
Pick a nut out of a peg. Randomize the nut positions

## Place onto shelf
Pick and place a puck onto a shelf. Randomize puck and shelf positions

## Push mug
Push a mug under a coffee machine. Randomize the mug and the machine positions

## Press handle side
Press a handle down sideways. Randomize the handle positions

## Hammer
Hammer a screw on the wall. Randomize the hammer and the screw positions

## Slide plate
Slide a plate into a cabinet. Randomize the plate and cabinet positions

## Slide plate side
Slide a plate into a cabinet sideways. Randomize the plate and cabinet positions

## Press button wall
Bypass a wall and press a button. Randomize the button positions

## Press handle
Press a handle down. Randomize the handle positions

## Pull handle
Pull a handle up. Randomize the handle positions

## Soccer
Kick a soccer ball into the goal. Randomize the ball and goal positions

## Retrieve plate side
Get a plate from the cabinet sideways. Randomize plate and cabinet positions

## Retrieve plate
Get a plate from the cabinet. Randomize plate and cabinet positions

## Close drawer
Push and close a drawer. Randomize the drawer positions

## Press button top
Press a button from the top. Randomize button positions

## Reach
Reach a goal position. Randomize the goal positions

## Press button top wall
Bypass a wall and press a button from the top. Randomize button positions

## Reach with wall
Bypass a wall and reach a goal. Randomize goal positions

## Insert peg side
Insert a peg sideways. Randomize peg and goal positions

## Pull
Pull a puck to a goal. Randomize puck and goal positions

## Push with wall
Bypass a wall and push a puck to a goal. Randomize puck and goal positions

## Pick out of hole
Pick up a puck from a hole. Randomize puck and goal positions

## Pick&place w/ wall
Pick a puck, bypass a wall and place the puck. Randomize puck and goal positions

## Press button
Press a button. Randomize button positions

## Pick&place
Pick and place a puck to a goal. Randomize puck and goal positions

## Pull mug
Pull a mug from a coffee machine. Randomize the mug and the machine positions

## Unplug peg
Unplug a peg sideways. Randomize peg positions

## Close window
Push and close a window. Randomize window positions

## Open window
Push and open a window. Randomize window positions

## Open door
Open a door with a revolving joint. Randomize door positions

## Close door
Close a door with a revolving joint. Randomize door positions

## Open drawer
Open a drawer. Randomize drawer positions

## Insert hand
Insert the gripper into a hole.

## Close box
Grasp the cover and close the box with it. Randomize the cover and box positions

## Lock door
Lock the door by rotating the lock clockwise. Randomize door positions

## Unlock door
Unlock the door by rotating the lock counter-clockwise. Randomize door positions

## Pick bin
Grasp the puck from one bin and place it into another bin. Randomize puck positions
2 changes: 1 addition & 1 deletion docs/citation.md
@@ -1,6 +1,6 @@
# Citation

You can cite Metaworld as follows:
You can cite Meta-World as follows:

```bibtex
@inproceedings{yu2019meta,