From 1d03f74cc9265a5cc287b756731c7ca7f341040f Mon Sep 17 00:00:00 2001 From: Lucas Alegre Date: Sun, 18 Feb 2024 19:35:41 -0300 Subject: [PATCH] Create groups --- docs/environments/continuous-state.md | 37 +++++++++++++++++++++++++++ docs/environments/discrete-state.md | 36 ++++++++++++++++++++++++++ docs/environments/image-state.md | 23 +++++++++++++++++ docs/environments/mujoco.md | 4 ++- 4 files changed, 99 insertions(+), 1 deletion(-) diff --git a/docs/environments/continuous-state.md b/docs/environments/continuous-state.md index e69de29b..1e348b70 100644 --- a/docs/environments/continuous-state.md +++ b/docs/environments/continuous-state.md @@ -0,0 +1,37 @@ +--- +title: "Continuous Observation" +--- + +# Continuous Observation + + +MO-Gymnasium includes environments taken from the MORL literature, as well as multi-objective version of classical environments, such as Mujoco. + +| Env | Obs/Action spaces | Objectives | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`water-reservoir-v0`](https://mo-gymnasium.farama.org/environments/water-reservoir/)
| Continuous / Continuous | `[cost_flooding, deficit_water]` | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective). | +| [`mo-mountaincar-v0`](https://mo-gymnasium.farama.org/environments/mo-mountaincar/)
| Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From [Vamplew et al. 2011](https://www.researchgate.net/publication/220343783_Empirical_evaluation_methods_for_multiobjective_reinforcement_learning_algorithms). | +| [`mo-mountaincarcontinuous-v0`](https://mo-gymnasium.farama.org/environments/mo-mountaincarcontinuous/)
| Continuous / Continuous | `[time_penalty, fuel_consumption_penalty]` | Continuous Mountain Car env, but with penalties for fuel consumption. | +| [`mo-lunar-lander-v2`](https://mo-gymnasium.farama.org/environments/mo-lunar-lander/)
| Continuous / Discrete or Continuous | `[landed, shaped_reward, main_engine_fuel, side_engine_fuel]` | MO version of the `LunarLander-v2` [environment](https://gymnasium.farama.org/environments/box2d/lunar_lander/). Objectives defined similarly as in [Hung et al. 2022](https://openreview.net/forum?id=AwWaBXLIJE). | +| [`minecart-v0`](https://mo-gymnasium.farama.org/environments/minecart/)
| Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2). | +| [`mo-highway-v0`](https://mo-gymnasium.farama.org/environments/mo-highway/) and `mo-highway-fast-v0`
| Continuous / Discrete | `[speed, right_lane, collision]` | The agent's objective is to reach a high speed while avoiding collisions with neighbouring vehicles and staying on the rightest lane. From [highway-env](https://github.com/eleurent/highway-env). | +| [`mo-reacher-v4`](https://mo-gymnasium.farama.org/environments/mo-reacher/)
| Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Mujoco version of `mo-reacher-v0`, based on `Reacher-v4` [environment](https://gymnasium.farama.org/environments/mujoco/reacher/). | +| [`mo-hopper-v4`](https://mo-gymnasium.farama.org/environments/mo-hopper/)
| Continuous / Continuous | `[velocity, height, energy]` | Multi-objective version of [Hopper-v4](https://gymnasium.farama.org/environments/mujoco/hopper/) env. | +| [`mo-halfcheetah-v4`](https://mo-gymnasium.farama.org/environments/mo-halfcheetah/)
| Continuous / Continuous | `[velocity, energy]` | Multi-objective version of [HalfCheetah-v4](https://gymnasium.farama.org/environments/mujoco/half_cheetah/) env. Similar to [Xu et al. 2020](https://github.com/mit-gfx/PGMORL). | + + +```{toctree} +:hidden: +:glob: +:caption: MO-Gymnasium Environments + +./water-reservoir +./mo-mountaincar +./mo-mountaincarcontinuous +./mo-lunar-lander +./minecart +./mo-highway +./mo-reacher +./mo-hopper +./mo-halfcheetah +``` diff --git a/docs/environments/discrete-state.md b/docs/environments/discrete-state.md index e69de29b..bce02651 100644 --- a/docs/environments/discrete-state.md +++ b/docs/environments/discrete-state.md @@ -0,0 +1,36 @@ +--- +title: "Discrete Observation" +--- + +# Discrete Observation + +Environments with discrete observation spaces. + +| Env | Obs/Action spaces | Objectives | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`deep-sea-treasure-v0`](https://mo-gymnasium.farama.org/environments/deep-sea-treasure/)
| Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | +| [`deep-sea-treasure-concave-v0`](https://mo-gymnasium.farama.org/environments/deep-sea-treasure-concave/)
| Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from [Vamplew et al. 2010](https://link.springer.com/article/10.1007/s10994-010-5232-5). | +| [`deep-sea-treasure-mirrored-v0`](https://mo-gymnasium.farama.org/environments/deep-sea-treasure-mirrored/)
| Discrete / Discrete | `[treasure, time_penalty]` | Harder version of the concave DST [Felten et al. 2022](https://www.scitepress.org/Papers/2022/109891/109891.pdf). | +| [`resource-gathering-v0`](https://mo-gymnasium.farama.org/environments/resource-gathering/)
| Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From [Barret & Narayanan 2008](https://dl.acm.org/doi/10.1145/1390156.1390162). | +| [`fishwood-v0`](https://mo-gymnasium.farama.org/environments/fishwood/)
| Discrete / Discrete | `[fish_amount, wood_amount]` | ESR environment, the agent must collect fish and wood to light a fire and eat. From [Roijers et al. 2018](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return). | +| [`breakable-bottles-v0`](https://mo-gymnasium.farama.org/environments/breakable-bottles/)
| Discrete (Dictionary) / Discrete | `[time_penalty, bottles_delivered, potential]` | Gridworld with 5 cells. The agents must collect bottles from the source location and deliver to the destination. From [Vamplew et al. 2021](https://www.sciencedirect.com/science/article/pii/S0952197621000336). | +| [`fruit-tree-v0`](https://mo-gymnasium.farama.org/environments/fruit-tree/)
| Discrete / Discrete | `[nutri1, ..., nutri6]` | Full binary tree of depth d=5,6 or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | +| [`four-room-v0`](https://mo-gymnasium.farama.org/environments/four-room/)
| Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). | + + +```{toctree} +:hidden: +:glob: +:caption: MO-Gymnasium Environments + +./deep-sea-treasure +./deep-sea-treasure-concave +./deep-sea-treasure-mirrored +./resource-gathering +./four-room +./fruit-tree +./breakable-bottles +./fishwood + + +``` diff --git a/docs/environments/image-state.md b/docs/environments/image-state.md index e69de29b..a79f2c87 100644 --- a/docs/environments/image-state.md +++ b/docs/environments/image-state.md @@ -0,0 +1,23 @@ +--- +title: "Image Observation" +--- + +# Image Observation + + +MO-Gymnasium environments with image observation spaces. + +| Env | Obs/Action spaces | Objectives | Description | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`minecart-rgb-v0`](https://mo-gymnasium.farama.org/environments/minecart/)
| Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2). | +| [`mo-supermario-v0`](https://mo-gymnasium.farama.org/environments/mo-supermario/)
| Image / Discrete | `[x_pos, time, death, coin, enemy]` | [:warning: SuperMarioBrosEnv support is limited.] Multi-objective version of [SuperMarioBrosEnv](https://github.com/Kautenja/gym-super-mario-bros). Objectives are defined similarly as in [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf). | + +```{toctree} +:hidden: +:glob: +:caption: MO-Gymnasium Environments + +./minecart-rgb +./mo-supermario + +``` diff --git a/docs/environments/mujoco.md b/docs/environments/mujoco.md index 08396ee8..fe0ccc5e 100644 --- a/docs/environments/mujoco.md +++ b/docs/environments/mujoco.md @@ -18,6 +18,8 @@ Multi-objective versions of Mujoco environments. :glob: :caption: MO-Gymnasium Environments -./mujoco/* +./mo-reacher +./mo-hopper +./mo-halfcheetah ```