From b88355805e694a38065db507aae3a494e3bdfb7e Mon Sep 17 00:00:00 2001
From: Frank Roeder
Date: Tue, 10 Sep 2024 14:08:52 +0200
Subject: [PATCH] doc: change order of state and action space

---
 docs/README.md                           |  4 ++--
 docs/benchmark/action_space.md           |  7 ++++---
 docs/benchmark/benchmark_descriptions.md | 24 +++++++++++-------------
 docs/benchmark/reward_functions.md       |  2 +-
 docs/benchmark/state_space.md            |  8 ++++----
 docs/benchmark/task_descriptions.md      |  2 +-
 docs/citation.md                         |  2 +-
 docs/index.md                            |  2 +-
 8 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index b8f9b2aad..a2cbfc6e9 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,5 +1,5 @@
-# Metaworld documentation
+# Meta-World documentation
 
-This directory contains the documentation for Metaworld.
+This directory contains the documentation for Meta-World.
 For more information about how to contribute to the documentation go to our [CONTRIBUTING.md](https://github.com/Farama-Foundation/Celshast/blob/main/CONTRIBUTING.md)

diff --git a/docs/benchmark/action_space.md b/docs/benchmark/action_space.md
index 1618a5fea..d2c923096 100644
--- a/docs/benchmark/action_space.md
+++ b/docs/benchmark/action_space.md
@@ -6,12 +6,13 @@ firstpage:
 
 # Action Space
 
-In Meta-World benchmark the agent needs to solve multiple tasks simultaneously that could individually defined by their own Markov decision process.
-As this is solved by current approaches using a single policy/model, it requires the action space for for all tasks to have a constant size, hence sharing a common structure.
+In the Meta-World benchmark, the agent must simultaneously solve multiple tasks that could be individually defined by their own Markov decision processes.
+As this is solved by current approaches using a single policy/model, it requires the action space for all tasks to have a constant size, hence sharing a common structure.
 
 The action space of the Sawyer robot is a ```Box(-1.0, 1.0, (4,), float32)```.
 An action represents the Cartesian displacement `dx`, `dy`, and `dz` of the end-effector, and an additional action for gripper control.
-For tasks that do not require the gripper, actions along those dimensions could be masked or ignored and set to a constant value that permanently closes the fingers.
+
+For tasks that do not require the gripper, actions along those dimensions can be masked or ignored and set to a constant value that permanently closes the fingers.
 
 | Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
 |-----|--------|-------------|-------------|---------------------|-------|------|
diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
index 0a4a21340..39ca0be83 100644
--- a/docs/benchmark/benchmark_descriptions.md
+++ b/docs/benchmark/benchmark_descriptions.md
@@ -8,18 +8,18 @@ firstpage:
 
 The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
 Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
-Unlike usual RL benchmarks, the training of the agent is strictly split into a training and testing phase.
+Unlike usual RL benchmarks, the training of the agent is strictly split into training and testing phases.
 
 ## Task Configuration
 
 Meta-World distinguishes between parametric and non-parametric variations.
-Parametric variations concern the configuration of the goal or object position, hence changing the location of the puck in the `push` task.
+Parametric variations concern the configuration of the goal or object position, such as changing the location of the puck in the `push` task.
 
 ```
 TODO: Add code snippets
 ```
 
-Non-parametric are implemented by the settings containing multiple task, hence where the agent is faced with challenges like `push` and `open window` that necessitate a different set of skills.
+Non-parametric variations are realized by the settings that contain multiple tasks, where the agent faces challenges such as `push` and `open window` that require different sets of skills.
 
 ## Multi-Task Problems
 
@@ -27,9 +27,10 @@
 The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
 Below, different levels of difficulty are described.
+
 
 ### Multi-Task (MT1)
 
-In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g, *reach*, *push*, or *pick place* a goal object.
+In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g., *reach*, *push*, or *pick place* a goal object.
 There is no testing of generalization involved in this setting.
 
 ```{figure} ../_static/mt1.gif
 :alt: Multi-Task 1
 :width: 500
 ```
 
 ### Multi-Task (MT10)
 
-The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*,
-*open door*, *open drawer*, *close drawer*, *press button top-down*,
-*insert peg side*, *open window*, and *open box*. The policy is provided with a
-one-hot vector indicating the current task. The positions of objects and goal
-positions are fixed in all tasks to focus solely on the skill acquisition.
+The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*, *open door*, *open drawer*, *close drawer*, *press button top-down*, *insert peg side*, *open window*, and *open box*.
+The policy should be provided with a one-hot vector indicating the current task.
+The positions of objects and goal positions are fixed in all tasks to focus solely on skill acquisition.
 
 ```{figure} ../_static/mt10.gif
 :alt: Multi-Task 10
 :width: 500
 ```
 
 ### Multi-Task (MT50)
 
-The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
-challenging multi-task setting and involves no evaluation on test tasks.
-As with **MT10**, the policy is provided with a one-hot vector indicating
-the current task, and object and goal positions are fixed.
+The **MT50** evaluation uses all 50 Meta-World tasks.
+This is the most challenging multi-task setting and involves no evaluation on test tasks.
+As with **MT10**, the policy is provided with a one-hot vector indicating the current task, and object and goal positions are fixed.
 
 See [Task Descriptions](task_descriptions) for more details.
 
diff --git a/docs/benchmark/reward_functions.md b/docs/benchmark/reward_functions.md
index 5e17477f4..3161a1bed 100644
--- a/docs/benchmark/reward_functions.md
+++ b/docs/benchmark/reward_functions.md
@@ -18,7 +18,7 @@
 by passing the `reward_func_version` keyword argument to `gym.make(...)`.
 
 ### Version 1
 
 Passing `reward_func_version=v1` configures the benchmark with the primary
-reward function of Metaworld, which is actually a version of the
+reward function of Meta-World, which is actually a version of the
 `pick-place-wall` task that is modified to also work for the other tasks.
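The `TODO: Add code snippets` placeholder above and the `reward_func_version` keyword described in `reward_functions.md` could be illustrated with a short example. The following is only a sketch: `import gymnasium as gym`, the environment id `"Meta-World/MT1"`, the `env_name` keyword, the task name `"pick-place-v3"`, the string form `"v1"`, and the seed are assumptions for illustration and are not taken from this patch; only the `reward_func_version` argument to `gym.make(...)` comes from the documentation text.

```python
# Illustrative sketch only: the environment id, env_name, task name, string
# form "v1", and seed are assumptions; only the reward_func_version keyword
# itself is described in the documentation above.
import gymnasium as gym

# Select the primary ("Version 1") reward function described above.
env = gym.make(
    "Meta-World/MT1",
    env_name="pick-place-v3",
    reward_func_version="v1",
)

obs, info = env.reset(seed=42)

# The Sawyer action space is Box(-1.0, 1.0, (4,), float32):
# Cartesian displacements dx, dy, dz plus one gripper command.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```

Because the 4-dimensional action layout is shared by every task, the same sampling and stepping code works unchanged across MT1, MT10, and MT50.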
diff --git a/docs/benchmark/state_space.md b/docs/benchmark/state_space.md
index 7d75992be..c1c4de985 100644
--- a/docs/benchmark/state_space.md
+++ b/docs/benchmark/state_space.md
@@ -7,11 +7,11 @@ firstpage:
 
 # State Space
 
-Likewise the [action space](action_space), the state space among the task requires to maintain the same structure that allows current approaches to employ a single policy/model.
-Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal postition (e.g., `reach`, `push`,`pick place`) or two objects with a fixed goal postition (e.g., `hammer`, `soccer`, `shelf place`).
-To account for such a variability, large parts of the observation space are kept as placeholders, e.g., for the second object, if only one object is avaiable.
+Like the [action space](action_space), the state space must keep the same structure across all tasks so that current approaches can employ a single policy/model.
+Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal position (e.g., `reach`, `push`, `pick place`) or two objects with a fixed goal position (e.g., `hammer`, `soccer`, `shelf place`).
+To account for such variability, large parts of the observation space are kept as placeholders, e.g., for the second object if only one object is available.
 
-The observation array consists of the end-effector's 3D Cartesian position and the compisition of a single object with its goal coordinates or the positons of the first and second object.
+The observation array consists of the end-effector's 3D Cartesian position and either a single object's position combined with its goal coordinates or the positions of the first and second object.
 This always results in a 9D state vector.
 
 TODO: Provide table
diff --git a/docs/benchmark/task_descriptions.md b/docs/benchmark/task_descriptions.md
index 109dd2ac9..41eeb019e 100644
--- a/docs/benchmark/task_descriptions.md
+++ b/docs/benchmark/task_descriptions.md
@@ -6,7 +6,7 @@ firstpage:
 
 # Task Descriptions
 
-In the following, we list all of the 50 tasks contained in Metaworld with a small verbal description.
+In the following, we list all of the 50 tasks contained in Meta-World with a short description.
 
 ## Turn on faucet
 Rotate the faucet counter-clockwise. Randomize faucet positions
diff --git a/docs/citation.md b/docs/citation.md
index c276c9c84..f021e4559 100644
--- a/docs/citation.md
+++ b/docs/citation.md
@@ -1,6 +1,6 @@
 # Citation
 
-You can cite Metaworld as follows:
+You can cite Meta-World as follows:
 
 ```bibtex
 @inproceedings{yu2019meta,
diff --git a/docs/index.md b/docs/index.md
index 9bc3694c5..95de096f8 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -50,8 +50,8 @@ usage/basic_usage
 ```{toctree}
 :hidden:
 :caption: Benchmark Information
-benchmark/state_space
 benchmark/action_space
+benchmark/state_space
 benchmark/benchmark_descriptions
 benchmark/task_descriptions.md
 benchmark/reward_functions
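As a companion to the `TODO: Provide table` placeholder in `state_space.md` above, a small helper can make the described 9D layout concrete. This is a sketch under stated assumptions: the 9D length and the named components come from the text above, while the exact index order (end-effector, first object, second object or goal) and the function name are illustrative and not taken from this patch.

```python
# Sketch of the 9D observation layout described in state_space.md; the exact
# index order is an assumption for illustration, not taken from the patch.
import numpy as np


def split_observation(obs: np.ndarray) -> dict:
    """Split a 9D observation into the components named in the docs."""
    assert obs.shape == (9,), "state_space.md describes a 9D state vector"
    return {
        "hand_pos": obs[0:3],      # end-effector xyz
        "obj_pos": obs[3:6],       # first (or only) object xyz
        "obj2_or_goal": obs[6:9],  # second object xyz, or the goal placeholder
    }


# Example with a dummy observation vector:
parts = split_observation(np.zeros(9, dtype=np.float32))
print({name: part.shape for name, part in parts.items()})
```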