doc: change order of state and action space
frankroeder committed Sep 10, 2024
1 parent 0ad355f commit b883558
Showing 8 changed files with 25 additions and 26 deletions.
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,5 +1,5 @@
-# Metaworld documentation
+# Meta-World documentation

-This directory contains the documentation for Metaworld.
+This directory contains the documentation for Meta-World.

For more information about how to contribute to the documentation go to our [CONTRIBUTING.md](https://github.com/Farama-Foundation/Celshast/blob/main/CONTRIBUTING.md)
7 changes: 4 additions & 3 deletions docs/benchmark/action_space.md
@@ -6,12 +6,13 @@ firstpage:

# Action Space

-In Meta-World benchmark the agent needs to solve multiple tasks simultaneously that could individually defined by their own Markov decision process.
-As this is solved by current approaches using a single policy/model, it requires the action space for for all tasks to have a constant size, hence sharing a common structure.
+In the Meta-World benchmark, the agent must simultaneously solve multiple tasks that could be individually defined by their own Markov decision processes.
+As this is solved by current approaches using a single policy/model, it requires the action space for all tasks to have a constant size, hence sharing a common structure.

The action space of the Sawyer robot is a ```Box(-1.0, 1.0, (4,), float32)```.
An action represents the Cartesian displacement `dx`, `dy`, and `dz` of the end-effector, and an additional action for gripper control.
-For tasks that do not require the gripper, actions along those dimensions could be masked or ignored and set to a constant value that permanently closes the fingers.

+For tasks that do not require the gripper, actions along those dimensions can be masked or ignored and set to a constant value that permanently closes the fingers.

| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
|-----|--------|-------------|-------------|---------------------|-------|------|
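
As an illustration of the gripper remark above, a minimal NumPy sketch of clipping an action into the `Box(-1.0, 1.0, (4,), float32)` bounds and masking the gripper dimension (the `+1.0 = closed` convention is an assumption, not taken from Meta-World):

```python
import numpy as np

def mask_gripper(action, close_value=1.0):
    """Overwrite the gripper entry of a 4D Sawyer action [dx, dy, dz, gripper]
    with a constant that keeps the fingers closed (sign convention assumed)."""
    masked = np.asarray(action, dtype=np.float32).copy()
    masked[3] = close_value
    return masked

raw_action = np.array([0.2, -0.5, 1.3, -0.7], dtype=np.float32)
action = np.clip(raw_action, -1.0, 1.0)   # respect the Box(-1.0, 1.0, (4,), float32) bounds
print(mask_gripper(action))               # [ 0.2 -0.5  1.   1. ]
```
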
24 changes: 11 additions & 13 deletions docs/benchmark/benchmark_descriptions.md
@@ -8,28 +8,29 @@ firstpage:

The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
-Unlike usual RL benchmarks, the training of the agent is strictly split into a training and testing phase.
+Unlike usual RL benchmarks, the training of the agent is strictly split into training and testing phases.

## Task Configuration

Meta-World distinguishes between parametric and non-parametric variations.
-Parametric variations concern the configuration of the goal or object position, hence changing the location of the puck in the `push` task.
+Parametric variations concern the configuration of the goal or object position, such as changing the location of the puck in the `push` task.

```
TODO: Add code snippets
```
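
A possible sketch for the snippet above, assuming the `metaworld.MT1` benchmark constructor and the `push-v2` task id used in the project README (names may differ across versions):

```python
import random
import metaworld

# Parametric variations: the same skill (push) with different puck/goal positions.
mt1 = metaworld.MT1('push-v2', seed=42)
env = mt1.train_classes['push-v2']()

for task in random.sample(mt1.train_tasks, k=3):
    env.set_task(task)   # each task encodes a different object/goal configuration
    obs = env.reset()    # gymnasium-based versions return an (obs, info) tuple instead
```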

-Non-parametric are implemented by the settings containing multiple task, hence where the agent is faced with challenges like `push` and `open window` that necessitate a different set of skills.
+Non-parametric variations are implemented by the settings containing multiple tasks, where the agent is faced with challenges like `push` and `open window` that necessitate a different set of skills.


## Multi-Task Problems

The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
Below, different levels of difficulty are described.


### Multi-Task (MT1)

-In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g, *reach*, *push*, or *pick place* a goal object.
+In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g., *reach*, *push*, or *pick place* a goal object.
There is no testing of generalization involved in this setting.

```{figure} ../_static/mt1.gif
@@ -39,11 +40,9 @@ There is no testing of generalization involved in this setting.

### Multi-Task (MT10)

-The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*,
-*open door*, *open drawer*, *close drawer*, *press button top-down*,
-*insert peg side*, *open window*, and *open box*. The policy is provided with a
-one-hot vector indicating the current task. The positions of objects and goal
-positions are fixed in all tasks to focus solely on the skill acquisition.
+The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*, *open door*, *open drawer*, *close drawer*, *press button top-down*, *insert peg side*, *open window*, and *open box*.
+The policy should be provided with a one-hot vector indicating the current task.
+The positions of objects and goal positions are fixed in all tasks to focus solely on skill acquisition. <!-- TODO: check this -->

```{figure} ../_static/mt10.gif
:alt: Multi-Task 10
@@ -52,10 +51,9 @@

### Multi-Task (MT50)

-The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
-challenging multi-task setting and involves no evaluation on test tasks.
-As with **MT10**, the policy is provided with a one-hot vector indicating
-the current task, and object and goal positions are fixed.
+The **MT50** evaluation uses all 50 Meta-World tasks.
+This is the most challenging multi-task setting and involves no evaluation on test tasks.
+As with **MT10**, the policy is provided with a one-hot vector indicating the current task, and object and goal positions are fixed.

See [Task Descriptions](task_descriptions) for more details.

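Both **MT10** and **MT50** condition the policy on a one-hot task indicator; a minimal sketch of appending such an indicator to an observation (the helper and the observation size below are illustrative, not Meta-World API):

```python
import numpy as np

def augment_with_task_id(obs, task_idx, num_tasks):
    """Append a one-hot task indicator to a flat observation vector."""
    one_hot = np.zeros(num_tasks, dtype=np.float32)
    one_hot[task_idx] = 1.0
    return np.concatenate([np.asarray(obs, dtype=np.float32), one_hot])

obs = np.zeros(9, dtype=np.float32)   # observation size chosen only for illustration
print(augment_with_task_id(obs, task_idx=3, num_tasks=10).shape)   # (19,)
```
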
2 changes: 1 addition & 1 deletion docs/benchmark/reward_functions.md
@@ -18,7 +18,7 @@ by passing the `reward_func_version` keyword argument to `gym.make(...)`.
### Version 1

Passing `reward_func_version=v1` configures the benchmark with the primary
-reward function of Metaworld, which is actually a version of the
+reward function of Meta-World, which is actually a version of the
`pick-place-wall` task that is modified to also work for the other tasks.


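For illustration, the keyword mentioned above would be passed through `gym.make`; the environment id below is a placeholder, not a confirmed registration name:

```python
import gymnasium as gym

# Placeholder id; substitute the id registered by your Meta-World installation.
env = gym.make("Meta-World/pick-place-wall", reward_func_version="v1")
```
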
8 changes: 4 additions & 4 deletions docs/benchmark/state_space.md
@@ -7,11 +7,11 @@ firstpage:
# State Space


-Likewise the [action space](action_space), the state space among the task requires to maintain the same structure that allows current approaches to employ a single policy/model.
-Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal postition (e.g., `reach`, `push`,`pick place`) or two objects with a fixed goal postition (e.g., `hammer`, `soccer`, `shelf place`).
-To account for such a variability, large parts of the observation space are kept as placeholders, e.g., for the second object, if only one object is avaiable.
+As with the [action space](action_space), the state space must maintain the same structure across tasks so that current approaches can employ a single policy/model.
+Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal position (e.g., `reach`, `push`, `pick place`) or two objects with a fixed goal position (e.g., `hammer`, `soccer`, `shelf place`).
+To account for such variability, large parts of the observation space are kept as placeholders, e.g., for the second object, if only one object is available.

-The observation array consists of the end-effector's 3D Cartesian position and the compisition of a single object with its goal coordinates or the positons of the first and second object.
+The observation array consists of the end-effector's 3D Cartesian position and the composition of a single object with its goal coordinates or the positions of the first and second object.
This always results in a 9D state vector.

TODO: Provide table
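
A sketch of the placeholder idea described above, assuming a layout of end-effector position, object position, and goal/second-object position with unused slots zero-filled (the exact layout is an assumption pending the table above):

```python
import numpy as np

def build_observation(ee_pos, obj1_pos, goal_or_obj2_pos=None):
    """Concatenate end-effector, object, and goal/second-object positions into a
    fixed 9D vector; missing entries stay as zero placeholders."""
    obs = np.zeros(9, dtype=np.float32)
    obs[0:3] = ee_pos
    obs[3:6] = obj1_pos
    if goal_or_obj2_pos is not None:
        obs[6:9] = goal_or_obj2_pos
    return obs

print(build_observation(ee_pos=[0.0, 0.6, 0.2], obj1_pos=[0.1, 0.7, 0.02]))
```
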
2 changes: 1 addition & 1 deletion docs/benchmark/task_descriptions.md
@@ -6,7 +6,7 @@

# Task Descriptions

-In the following, we list all of the 50 tasks contained in Metaworld with a small verbal description.
+In the following, we list all 50 tasks contained in Meta-World with a short description of each.

## Turn on faucet
Rotate the faucet counter-clockwise. Randomize faucet positions
2 changes: 1 addition & 1 deletion docs/citation.md
@@ -1,6 +1,6 @@
# Citation

-You can cite Metaworld as follows:
+You can cite Meta-World as follows:

```bibtex
@inproceedings{yu2019meta,
2 changes: 1 addition & 1 deletion docs/index.md
@@ -50,8 +50,8 @@ usage/basic_usage
```{toctree}
:hidden:
:caption: Benchmark Information
-benchmark/state_space
benchmark/action_space
+benchmark/state_space
benchmark/benchmark_descriptions
benchmark/task_descriptions.md
benchmark/reward_functions
