From b88355805e694a38065db507aae3a494e3bdfb7e Mon Sep 17 00:00:00 2001
From: Frank Roeder
Date: Tue, 10 Sep 2024 14:08:52 +0200
Subject: [PATCH] doc: change order of state and action space

---
 docs/README.md                           |  4 ++--
 docs/benchmark/action_space.md           |  7 ++++---
 docs/benchmark/benchmark_descriptions.md | 24 +++++++++++-------------
 docs/benchmark/reward_functions.md       |  2 +-
 docs/benchmark/state_space.md            |  8 ++++----
 docs/benchmark/task_descriptions.md      |  2 +-
 docs/citation.md                         |  2 +-
 docs/index.md                            |  2 +-
 8 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index b8f9b2aad..a2cbfc6e9 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,5 +1,5 @@
-# Metaworld documentation
+# Meta-World documentation
 
-This directory contains the documentation for Metaworld.
+This directory contains the documentation for Meta-World.
 For more information about how to contribute to the documentation go to our [CONTRIBUTING.md](https://github.com/Farama-Foundation/Celshast/blob/main/CONTRIBUTING.md)

diff --git a/docs/benchmark/action_space.md b/docs/benchmark/action_space.md
index 1618a5fea..d2c923096 100644
--- a/docs/benchmark/action_space.md
+++ b/docs/benchmark/action_space.md
@@ -6,12 +6,13 @@ firstpage:
 
 # Action Space
 
-In Meta-World benchmark the agent needs to solve multiple tasks simultaneously that could individually defined by their own Markov decision process.
-As this is solved by current approaches using a single policy/model, it requires the action space for for all tasks to have a constant size, hence sharing a common structure.
+In the Meta-World benchmark, the agent must simultaneously solve multiple tasks that could be individually defined by their own Markov decision processes.
+As this is solved by current approaches using a single policy/model, it requires the action space for all tasks to have a constant size, hence sharing a common structure.
 
 The action space of the Sawyer robot is a ```Box(-1.0, 1.0, (4,), float32)```.
 An action represents the Cartesian displacement `dx`, `dy`, and `dz` of the end-effector, and an additional action for gripper control.
-For tasks that do not require the gripper, actions along those dimensions could be masked or ignored and set to a constant value that permanently closes the fingers.
+
+For tasks that do not require the gripper, actions along those dimensions can be masked or ignored and set to a constant value that permanently closes the fingers.
 
 | Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
 |-----|--------|-------------|-------------|---------------------|-------|------|
diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
index 0a4a21340..39ca0be83 100644
--- a/docs/benchmark/benchmark_descriptions.md
+++ b/docs/benchmark/benchmark_descriptions.md
@@ -8,18 +8,18 @@ firstpage:
 
 The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
 Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
-Unlike usual RL benchmarks, the training of the agent is strictly split into a training and testing phase.
+Unlike usual RL benchmarks, the training of the agent is strictly split into training and testing phases.
 
 ## Task Configuration
 
 Meta-World distinguishes between parametric and non-parametric variations.
-Parametric variations concern the configuration of the goal or object position, hence changing the location of the puck in the `push` task.
+Parametric variations concern the configuration of the goal or object position, such as changing the location of the puck in the `push` task.
 
 ```
 TODO: Add code snippets
 ```
 
-Non-parametric are implemented by the settings containing multiple task, hence where the agent is faced with challenges like `push` and `open window` that necessitate a different set of skills.
+Non-parametric variations are realized by the settings that contain multiple tasks, where the agent faces challenges such as `push` and `open window` that require different sets of skills.
 
 ## Multi-Task Problems
 
@@ -27,9 +27,10 @@
 The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
 Below, different levels of difficulty are described.
+
 
 ### Multi-Task (MT1)
 
-In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g, *reach*, *push*, or *pick place* a goal object.
+In the easiest setting, **MT1**, a single task needs to be learned where the agent must, e.g., *reach*, *push*, or *pick place* a goal object.
 There is no testing of generalization involved in this setting.
 
 ```{figure} ../_static/mt1.gif
 :alt: Multi-Task 1
 :width: 500
 ```
 
 ### Multi-Task (MT10)
 
-The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*,
-*open door*, *open drawer*, *close drawer*, *press button top-down*,
-*insert peg side*, *open window*, and *open box*. The policy is provided with a
-one-hot vector indicating the current task. The positions of objects and goal
-positions are fixed in all tasks to focus solely on the skill acquisition.
+The **MT10** evaluation uses 10 tasks: *reach*, *push*, *pick and place*, *open door*, *open drawer*, *close drawer*, *press button top-down*, *insert peg side*, *open window*, and *open box*.
+The policy should be provided with a one-hot vector indicating the current task.
+The positions of objects and goal positions are fixed in all tasks to focus solely on skill acquisition.
 
 ```{figure} ../_static/mt10.gif
 :alt: Multi-Task 10
 :width: 500
 ```
 
 ### Multi-Task (MT50)
 
-The **MT50** evaluation uses all 50 Meta-World tasks. This is the most
-challenging multi-task setting and involves no evaluation on test tasks.
-As with **MT10**, the policy is provided with a one-hot vector indicating
-the current task, and object and goal positions are fixed.
+The **MT50** evaluation uses all 50 Meta-World tasks.
+This is the most challenging multi-task setting and involves no evaluation on test tasks.
+As with **MT10**, the policy is provided with a one-hot vector indicating the current task, and object and goal positions are fixed.
 
 See [Task Descriptions](task_descriptions) for more details.
 
diff --git a/docs/benchmark/reward_functions.md b/docs/benchmark/reward_functions.md
index 5e17477f4..3161a1bed 100644
--- a/docs/benchmark/reward_functions.md
+++ b/docs/benchmark/reward_functions.md
@@ -18,7 +18,7 @@
 by passing the `reward_func_version` keyword argument to `gym.make(...)`.
 
 ### Version 1
 
 Passing `reward_func_version=v1` configures the benchmark with the primary
-reward function of Metaworld, which is actually a version of the
+reward function of Meta-World, which is actually a version of the
 `pick-place-wall` task that is modified to also work for the other tasks.
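The `TODO: Add code snippets` placeholder above and the `reward_func_version` keyword described in `reward_functions.md` could be illustrated with a short example. The following is only a sketch: `import gymnasium as gym`, the environment id `"Meta-World/MT1"`, the `env_name` keyword, the task name `"pick-place-v3"`, the string form `"v1"`, and the seed are assumptions for illustration and are not taken from this patch; only the `reward_func_version` argument to `gym.make(...)` comes from the documentation text.

```python
# Illustrative sketch only: the environment id, env_name, task name, string
# form "v1", and seed are assumptions; only the reward_func_version keyword
# itself is described in the documentation above.
import gymnasium as gym

# Select the primary ("Version 1") reward function described above.
env = gym.make(
    "Meta-World/MT1",
    env_name="pick-place-v3",
    reward_func_version="v1",
)

obs, info = env.reset(seed=42)

# The Sawyer action space is Box(-1.0, 1.0, (4,), float32):
# Cartesian displacements dx, dy, dz plus one gripper command.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```

Because the 4-dimensional action layout is shared by every task, the same sampling and stepping code works unchanged across MT1, MT10, and MT50.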
diff --git a/docs/benchmark/state_space.md b/docs/benchmark/state_space.md
index 7d75992be..c1c4de985 100644
--- a/docs/benchmark/state_space.md
+++ b/docs/benchmark/state_space.md
@@ -7,11 +7,11 @@ firstpage:
 
 # State Space
 
-Likewise the [action space](action_space), the state space among the task requires to maintain the same structure that allows current approaches to employ a single policy/model.
-Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal postition (e.g., `reach`, `push`,`pick place`) or two objects with a fixed goal postition (e.g., `hammer`, `soccer`, `shelf place`).
-To account for such a variability, large parts of the observation space are kept as placeholders, e.g., for the second object, if only one object is avaiable.
+Like the [action space](action_space), the state space must keep the same structure across all tasks so that current approaches can employ a single policy/model.
+Meta-World contains tasks that either require manipulation of a single object with a potentially variable goal position (e.g., `reach`, `push`, `pick place`) or two objects with a fixed goal position (e.g., `hammer`, `soccer`, `shelf place`).
+To account for such variability, large parts of the observation space are kept as placeholders, e.g., for the second object if only one object is available.
 
-The observation array consists of the end-effector's 3D Cartesian position and the compisition of a single object with its goal coordinates or the positons of the first and second object.
+The observation array consists of the end-effector's 3D Cartesian position and either a single object's position combined with its goal coordinates or the positions of the first and second object.
 This always results in a 9D state vector.
 
 TODO: Provide table
diff --git a/docs/benchmark/task_descriptions.md b/docs/benchmark/task_descriptions.md
index 109dd2ac9..41eeb019e 100644
--- a/docs/benchmark/task_descriptions.md
+++ b/docs/benchmark/task_descriptions.md
@@ -6,7 +6,7 @@ firstpage:
 
 # Task Descriptions
 
-In the following, we list all of the 50 tasks contained in Metaworld with a small verbal description.
+In the following, we list all of the 50 tasks contained in Meta-World with a short description.
 
 ## Turn on faucet
 Rotate the faucet counter-clockwise. Randomize faucet positions
diff --git a/docs/citation.md b/docs/citation.md
index c276c9c84..f021e4559 100644
--- a/docs/citation.md
+++ b/docs/citation.md
@@ -1,6 +1,6 @@
 # Citation
 
-You can cite Metaworld as follows:
+You can cite Meta-World as follows:
 
 ```bibtex
 @inproceedings{yu2019meta,
diff --git a/docs/index.md b/docs/index.md
index 9bc3694c5..95de096f8 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -50,8 +50,8 @@ usage/basic_usage
 ```{toctree}
 :hidden:
 :caption: Benchmark Information
-benchmark/state_space
 benchmark/action_space
+benchmark/state_space
 benchmark/benchmark_descriptions
 benchmark/task_descriptions.md
 benchmark/reward_functions
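As a companion to the `TODO: Provide table` placeholder in `state_space.md` above, a small helper can make the described 9D layout concrete. This is a sketch under stated assumptions: the 9D length and the named components come from the text above, while the exact index order (end-effector, first object, second object or goal) and the function name are illustrative and not taken from this patch.

```python
# Sketch of the 9D observation layout described in state_space.md; the exact
# index order is an assumption for illustration, not taken from the patch.
import numpy as np


def split_observation(obs: np.ndarray) -> dict:
    """Split a 9D observation into the components named in the docs."""
    assert obs.shape == (9,), "state_space.md describes a 9D state vector"
    return {
        "hand_pos": obs[0:3],      # end-effector xyz
        "obj_pos": obs[3:6],       # first (or only) object xyz
        "obj2_or_goal": obs[6:9],  # second object xyz, or the goal placeholder
    }


# Example with a dummy observation vector:
parts = split_observation(np.zeros(9, dtype=np.float32))
print({name: part.shape for name, part in parts.items()})
```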