
Commit

Merge branch 'refs/heads/main' into autoreset-mode
pseudo-rnd-thoughts committed Nov 27, 2024
2 parents efb23ba + 13230f4 commit 606bfaf
Showing 52 changed files with 347 additions and 97 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -15,6 +15,7 @@ __pycache__/
# Virtualenv
/env
/venv
/.venv

# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -65,6 +65,6 @@ repos:
language: node
pass_filenames: false
types: [python]
additional_dependencies: ["pyright@1.1.347"]
additional_dependencies: ["pyright@1.1.383"]
args:
- --project=pyproject.toml
4 changes: 0 additions & 4 deletions README.md
@@ -60,10 +60,6 @@ Please note that this is an incomplete list, and just includes libraries that th

Gymnasium keeps strict versioning for reproducibility reasons. All environments end in a suffix like "-v0". When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion. These were inherited from Gym.

## Development Roadmap

We have a roadmap for future development work for Gymnasium available here: https://github.com/Farama-Foundation/Gymnasium/issues/12

## Support Gymnasium's Development

If you are financially able to do so and would like to support the development of Gymnasium, please join others in the community in [donating to us](https://github.com/sponsors/Farama-Foundation).
8 changes: 5 additions & 3 deletions bin/all-py.Dockerfile
@@ -19,19 +19,21 @@ RUN apt-get -y update \

ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/root/.mujoco/mujoco210/bin"

RUN pip install uv

# Build mujoco-py from source. Pypi installs wheel packages and Cython won't recompile old file versions in the Github Actions CI.
# Thus generating the following error https://github.com/cython/cython/pull/4428
RUN git clone https://github.com/openai/mujoco-py.git\
&& cd mujoco-py \
&& pip install -e .
&& uv pip install --system -e .

COPY . /usr/local/gymnasium/
WORKDIR /usr/local/gymnasium/

# Specify the numpy version to cover both 1.x and 2.x
RUN pip install --upgrade "numpy$NUMPY_VERSION"
RUN uv pip install --system --upgrade "numpy$NUMPY_VERSION"

# Test with PyTorch CPU build, since CUDA is not available in CI anyway
RUN pip install .[all,testing] --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --system .[all,testing] --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu

ENTRYPOINT ["/usr/local/gymnasium/bin/docker_entrypoint"]
5 changes: 3 additions & 2 deletions bin/necessary-py.Dockerfile
@@ -20,7 +20,8 @@ RUN apt-get -y update \
COPY . /usr/local/gymnasium/
WORKDIR /usr/local/gymnasium/

RUN pip install --upgrade "numpy>=1.21,<2.0"
RUN pip install .[testing] --no-cache-dir
RUN pip install uv
RUN uv pip install --system --upgrade "numpy>=1.21,<2.0"
RUN uv pip install --system .[testing] --no-cache-dir

ENTRYPOINT ["/usr/local/gymnasium/bin/docker_entrypoint"]
2 changes: 1 addition & 1 deletion docs/api/wrappers/table.md
@@ -47,7 +47,7 @@ wrapper in the page on the wrapper type
* - :class:`NumpyToTorch`
- Wraps a NumPy-based environment such that it can be interacted with PyTorch Tensors.
* - :class:`OrderEnforcing`
- Will produce an error if ``step`` or ``render`` is called before ``render``.
- Will produce an error if ``step`` or ``render`` is called before ``reset``.
* - :class:`PassiveEnvChecker`
- A passive environment checker wrapper that surrounds the ``step``, ``reset`` and ``render`` functions to check they follow gymnasium's API.
* - :class:`RecordEpisodeStatistics`
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -14,6 +14,7 @@
import os
import re
import sys
import time

import sphinx_gallery.gen_rst
from furo.gen_tutorials import generate_tutorials
@@ -27,7 +28,7 @@


project = "Gymnasium"
copyright = "2023 Farama Foundation"
copyright = f"{time.localtime().tm_year} Farama Foundation"
author = "Farama Foundation"

# The full version, including alpha/beta/rc tags
11 changes: 6 additions & 5 deletions docs/environments/mujoco.md
@@ -106,6 +106,7 @@ env = gymnasium.make("Ant-v5", render_mode="rgb_array", width=1280, height=720)

| Parameter | Type | Default | Description |
|-------------------------|-------------------------------------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `render_mode` | **str** | `None` | The modality of the render result. Must be one of `human`, `rgb_array`, `depth_array`, or `rgbd_tuple`. Note that `human` does not return a rendered image, but renders directly to the window |
| `width` | **int** | `480` | The width of the render window |
| `height` | **int** | `480` | The height of the render window |
| `camera_id` | **int \| None** | `None` | The camera ID used for the render window |
@@ -117,11 +118,11 @@ env = gymnasium.make("Ant-v5", render_mode="rgb_array", width=1280, height=720)
### Rendering Backend
The MuJoCo simulator renders images with OpenGL and can use three different backends, "glfw" (default), "egl", and "osmesa", which can be selected by setting an [environment variable](https://en.wikipedia.org/wiki/Environment_variable).

| Backend | Environment Variable | Description |
|---------|----------------------------|-----------------------------------|
| `glfw` | `MUJOCO_GL=glfw` (default) | Renders with window System on GPU |
| `egl` | `MUJOCO_GL=egl` | Renders headless on GPU |
| `omesa` | `MUJOCO_GL=omesa` | Renders headless on CPU |
| Backend | Environment Variable | Description |
|----------|----------------------------|-----------------------------------|
| `GLFW` | `MUJOCO_GL=glfw` (default) | Renders with Window System on GPU |
| `EGL` | `MUJOCO_GL=egl` | Renders headless on GPU |
| `OSMESA` | `MUJOCO_GL=osmesa` | Renders headless on CPU |

More information is available in the [MuJoCo/OpenGL documentation](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl).
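A minimal sketch of selecting the headless ``egl`` backend from the table above; the environment id is just an example and ``gymnasium[mujoco]`` is assumed to be installed:

```python
# Sketch: render a MuJoCo environment off-screen on the GPU via EGL.
import os

os.environ["MUJOCO_GL"] = "egl"  # must be set before MuJoCo creates its OpenGL context

import gymnasium as gym

env = gym.make("Ant-v5", render_mode="rgb_array")
env.reset(seed=0)
frame = env.render()  # uint8 RGB array of shape (height, width, 3)
env.close()
```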
<!--
26 changes: 24 additions & 2 deletions docs/environments/third_party_environments.md
@@ -25,17 +25,25 @@ goal-RL ([Gymnasium-Robotics](https://robotics.farama.org/)).
## Third-party environments with Gymnasium
*This page contains environments which are not maintained by Farama Foundation and, as such, cannot be guaranteed to function as intended.*

*If you'd like to contribute an environment, please reach out on [Discord](https://discord.gg/MHCFauP67z), then submit a PR by editing this [file](https://github.com/Farama-Foundation/Gymnasium/blob/main/docs/environments/third_party_environments.md).*
*If you'd like to contribute an environment, please reach out on [Discord](https://discord.gg/MHCFauP67z), then submit a PR by editing this [file](https://github.com/Farama-Foundation/Gymnasium/blob/main/docs/environments/third_party_environments.md); additional instructions can be found inside that file.*

<!-- Template
- [NAME: SUB_NAME_IF_PRESENT](LINK)
![Gymnasium version dependency](ADD YOUR BADGE HERE)
![GitHub stars](ADD YOUR BADGE HERE OPTIONAL)
A short 2 sentence description.
A short 2-5 sentence description.
-->

<!-- Instructions
- Follow the template in the file
- Environments and environment categories are alphabetically sorted
- You are responsible for picking the environment category, if you would like to add a category please ask
- Name your PR something like "Add external environment X"
-->



### Autonomous Driving environments
*Autonomous Vehicle and traffic management.*
@@ -118,6 +126,13 @@ goal-RL ([Gymnasium-Robotics](https://robotics.farama.org/)).

A simple environment for single-agent reinforcement learning algorithms on a clone of [Flappy Bird](https://en.wikipedia.org/wiki/Flappy_Bird), the hugely popular arcade-style mobile game. Both state and pixel observation environments are available.

- [Generals.io bots: Develop your agent for generals.io!](https://github.com/strakam/generals-bots)

![Gymnasium version dependency](https://img.shields.io/badge/Gymnasium-v1.0.0-blue)
![GitHub stars](https://img.shields.io/github/stars/strakam/generals-bots)

Generals.io is a fast-paced strategy game on a 2D grid. We make bot development accessible via the Gymnasium/PettingZoo API.

- [pystk2-gymnasium: SuperTuxKart races gymnasium wrapper](https://github.com/bpiwowar/pystk2-gymnasium)

![Gymnasium version dependency](https://img.shields.io/badge/Gymnasium-v0.29.1-blue)
@@ -204,6 +219,13 @@ goal-RL ([Gymnasium-Robotics](https://robotics.farama.org/)).

A simple environment using [PyBullet](https://github.com/bulletphysics/bullet3) to simulate the dynamics of a [Bitcraze Crazyflie 2.x](https://www.bitcraze.io/documentation/hardware/crazyflie_2_1/crazyflie_2_1-datasheet.pdf) nanoquadrotor.

- [Itomori: UAV Risk-aware Flight Environment](https://github.com/gustavo-moura/itomori)

![Gymnasium version dependency](https://img.shields.io/badge/Gymnasium-v0.29.1-blue)
![GitHub stars](https://img.shields.io/github/stars/gustavo-moura/itomori)

Itomori is an environment for risk-aware UAV flight; it provides tools for solving Chance-Constrained Markov Decision Processes (CCMDPs). The environment allows users to simulate, visualize, and evaluate UAV navigation in complex and risky settings, incorporating variables such as GPS uncertainty, collision risk, and adaptive flight planning. Itomori is intended to support UAV path-planning research by offering adjustable parameters, detailed visualizations, and insights into agent behavior in uncertain environments.

- [OmniIsaacGymEnvs: Gym environments for NVIDIA Omniverse Isaac ](https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/)

Reinforcement Learning Environments for [Omniverse Isaac simulator](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/overview.html).
2 changes: 1 addition & 1 deletion docs/introduction/basic_usage.md
@@ -49,7 +49,7 @@ In reinforcement learning, the classic "agent-environment loop" pictured below i
:class: only-dark
```

For Gymnasium, the "agent-environment-loop" is implemented below for a single episode (until the environment ends). See the next section for a line-by-line explanation. Note that running this code requires installing swig (`pip install swig` or [download](https://www.swig.org/download.html)) along with `pip install gymnasium[box2d]`.
For Gymnasium, the "agent-environment-loop" is implemented below for a single episode (until the environment ends). See the next section for a line-by-line explanation. Note that running this code requires installing swig (`pip install swig` or [download](https://www.swig.org/download.html)) along with `pip install "gymnasium[box2d]"`.

```python
import gymnasium as gym
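# The lines below are a sketch of the episode loop described above; the
# environment id and the random action choice are assumptions standing in
# for the original listing, which is truncated here.
env = gym.make("LunarLander-v3", render_mode="human")
observation, info = env.reset()

episode_over = False
while not episode_over:
    action = env.action_space.sample()  # a random action in place of a learned agent
    observation, reward, terminated, truncated, info = env.step(action)
    episode_over = terminated or truncated

env.close()
```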
6 changes: 3 additions & 3 deletions docs/introduction/create_custom_env.md
@@ -106,7 +106,7 @@ Oftentimes, info will also contain some data that is only available inside the :
```{eval-rst}
.. py:currentmodule:: gymnasium.Env
As the purpose of :meth:`reset` is to initiate a new episode for an environment and has two parameters: ``seed`` and ``options``. The seed can be used to initialize the random number generator to a deterministic state and options can be used to specify values used within reset. On the first line of the reset, you need to call ``super().reset(seed=seed)`` which will initialize the random number generate (:attr:`np_random`) to use through the rest of the :meth:`reset`.
The purpose of :meth:`reset` is to initiate a new episode for an environment and has two parameters: ``seed`` and ``options``. The seed can be used to initialize the random number generator to a deterministic state and options can be used to specify values used within reset. On the first line of the reset, you need to call ``super().reset(seed=seed)`` which will initialize the random number generate (:attr:`np_random`) to use through the rest of the :meth:`reset`.
Within our custom environment, the :meth:`reset` needs to randomly choose the agent and target's positions (we repeat this if they have the same position). The return type of :meth:`reset` is a tuple of the initial observation and any auxiliary information. Therefore, we can use the methods ``_get_obs`` and ``_get_info`` that we implemented earlier for that:
```
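As a hedged sketch of the pattern described above, a grid-world ``reset`` might look like the following (``self.size``, ``self._agent_location``, ``self._target_location``, ``_get_obs`` and ``_get_info`` are the tutorial's names; ``np`` is NumPy):

```python
def reset(self, seed=None, options=None):
    # Seed self.np_random so everything below is reproducible
    super().reset(seed=seed)

    # Place the agent uniformly at random on the grid
    self._agent_location = self.np_random.integers(0, self.size, size=2, dtype=int)

    # Re-sample the target until it differs from the agent's position
    self._target_location = self._agent_location
    while np.array_equal(self._target_location, self._agent_location):
        self._target_location = self.np_random.integers(0, self.size, size=2, dtype=int)

    return self._get_obs(), self._get_info()
```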
@@ -144,9 +144,9 @@ The :meth:`step` method usually contains most of the logic for your environment,
For our environment, several things need to happen during the step function:
- We use the self._action_to_direction to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grd, we clip the agen't location to stay within bounds.
- We use the self._action_to_direction to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grid, we clip the agent's location to stay within bounds.
- We compute the agent's reward by checking if the agent's current position is equal to the target's location.
- Since the environment doesn't truncate internally (we can apply a time limit wrapper to the environment during :meth:make), we permanently set truncated to False.
- Since the environment doesn't truncate internally (we can apply a time limit wrapper to the environment during :meth:`make`), we permanently set truncated to False.
- We once again use _get_obs and _get_info to obtain the agent's observation and auxiliary information.
```
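A matching sketch of ``step``, under the same assumptions about the grid-world attributes:

```python
def step(self, action):
    # Map the discrete action (0-3) to a movement direction on the grid
    direction = self._action_to_direction[action]
    # Clip so the agent cannot leave the grid
    self._agent_location = np.clip(self._agent_location + direction, 0, self.size - 1)

    # The episode ends when the agent reaches the target
    terminated = np.array_equal(self._agent_location, self._target_location)
    reward = 1 if terminated else 0  # sparse binary reward
    truncated = False  # time limits are handled by a TimeLimit wrapper, not here

    return self._get_obs(), reward, terminated, truncated, self._get_info()
```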

4 changes: 2 additions & 2 deletions docs/introduction/record_agent.md
@@ -10,7 +10,7 @@ title: Recording Agents
During training or when evaluating an agent, it may be interesting to record agent behaviour over an episode and log the total reward accumulated. This can be achieved through two wrappers: :class:`RecordEpisodeStatistics` and :class:`RecordVideo`, the first tracks episode data such as the total rewards, episode length and time taken and the second generates mp4 videos of the agents using the environment renderings.
We show how to apply these wrappers for two types of problems; the first for recording data for every episode (normally evaluation) and second for recording data periodiclly (for normal training).
We show how to apply these wrappers for two types of problems; the first for recording data for every episode (normally evaluation) and second for recording data periodically (for normal training).
```

## Recording Every Episode
@@ -55,7 +55,7 @@ In the script above, for the :class:`RecordVideo` wrapper, we specify three diff
For the :class:`RecordEpisodeStatistics`, we only need to specify the buffer length; this is the max length of the internal ``time_queue``, ``return_queue`` and ``length_queue``. Rather than collect the data for each episode individually, we can use the data queues to print the information at the end of the evaluation.
For speed ups in evaluating environments, it is possible to implement this with vector environments to in order to evaluate ``N`` episodes at the same time in parallel rather than series.
For speed ups in evaluating environments, it is possible to implement this with vector environments in order to evaluate ``N`` episodes at the same time in parallel rather than series.
```
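A hedged sketch of the evaluation set-up these paragraphs describe; the environment id, folder name and episode count are placeholder assumptions:

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

num_eval_episodes = 4  # placeholder

env = gym.make("CartPole-v1", render_mode="rgb_array")  # frames are needed for the video
env = RecordVideo(env, video_folder="eval-videos", name_prefix="eval",
                  episode_trigger=lambda episode_id: True)  # record every episode
env = RecordEpisodeStatistics(env, buffer_length=num_eval_episodes)

for _ in range(num_eval_episodes):
    obs, info = env.reset()
    episode_over = False
    while not episode_over:
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        episode_over = terminated or truncated
env.close()

print("Episode returns:", list(env.return_queue))
print("Episode lengths:", list(env.length_queue))
```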

## Recording the Agent during Training