
Commit

Merge branch 'refs/heads/main' into autoreset-mode
pseudo-rnd-thoughts committed Nov 27, 2024
2 parents efb23ba + 13230f4 commit 606bfaf
Showing 52 changed files with 347 additions and 97 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -15,6 +15,7 @@ __pycache__/
# Virtualenv
/env
/venv
/.venv

# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -65,6 +65,6 @@ repos:
language: node
pass_filenames: false
types: [python]
additional_dependencies: ["pyright@1.1.347"]
additional_dependencies: ["pyright@1.1.383"]
args:
- --project=pyproject.toml
4 changes: 0 additions & 4 deletions README.md
@@ -60,10 +60,6 @@ Please note that this is an incomplete list, and just includes libraries that th

Gymnasium keeps strict versioning for reproducibility reasons. All environments end in a suffix like "-v0". When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion. These were inherited from Gym.

## Development Roadmap

We have a roadmap for future development work for Gymnasium available here: https://github.com/Farama-Foundation/Gymnasium/issues/12

## Support Gymnasium's Development

If you are financially able to do so and would like to support the development of Gymnasium, please join others in the community in [donating to us](https://github.com/sponsors/Farama-Foundation).
8 changes: 5 additions & 3 deletions bin/all-py.Dockerfile
@@ -19,19 +19,21 @@ RUN apt-get -y update \

ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/root/.mujoco/mujoco210/bin"

RUN pip install uv

# Build mujoco-py from source. Pypi installs wheel packages and Cython won't recompile old file versions in the Github Actions CI.
# Thus generating the following error https://github.com/cython/cython/pull/4428
RUN git clone https://github.com/openai/mujoco-py.git\
&& cd mujoco-py \
&& pip install -e .
&& uv pip install --system -e .

COPY . /usr/local/gymnasium/
WORKDIR /usr/local/gymnasium/

# Specify the numpy version to cover both 1.x and 2.x
RUN pip install --upgrade "numpy$NUMPY_VERSION"
RUN uv pip install --system --upgrade "numpy$NUMPY_VERSION"

# Test with PyTorch CPU build, since CUDA is not available in CI anyway
RUN pip install .[all,testing] --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --system .[all,testing] --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu

ENTRYPOINT ["/usr/local/gymnasium/bin/docker_entrypoint"]
5 changes: 3 additions & 2 deletions bin/necessary-py.Dockerfile
@@ -20,7 +20,8 @@ RUN apt-get -y update \
COPY . /usr/local/gymnasium/
WORKDIR /usr/local/gymnasium/

RUN pip install --upgrade "numpy>=1.21,<2.0"
RUN pip install .[testing] --no-cache-dir
RUN pip install uv
RUN uv pip install --system --upgrade "numpy>=1.21,<2.0"
RUN uv pip install --system .[testing] --no-cache-dir

ENTRYPOINT ["/usr/local/gymnasium/bin/docker_entrypoint"]
2 changes: 1 addition & 1 deletion docs/api/wrappers/table.md
@@ -47,7 +47,7 @@ wrapper in the page on the wrapper type
* - :class:`NumpyToTorch`
- Wraps a NumPy-based environment such that it can be interacted with PyTorch Tensors.
* - :class:`OrderEnforcing`
- Will produce an error if ``step`` or ``render`` is called before ``render``.
- Will produce an error if ``step`` or ``render`` is called before ``reset``.
* - :class:`PassiveEnvChecker`
- A passive environment checker wrapper that surrounds the ``step``, ``reset`` and ``render`` functions to check they follow gymnasium's API.
* - :class:`RecordEpisodeStatistics`
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -14,6 +14,7 @@
import os
import re
import sys
import time

import sphinx_gallery.gen_rst
from furo.gen_tutorials import generate_tutorials
@@ -27,7 +28,7 @@


project = "Gymnasium"
copyright = "2023 Farama Foundation"
copyright = f"{time.localtime().tm_year} Farama Foundation"
author = "Farama Foundation"

# The full version, including alpha/beta/rc tags
11 changes: 6 additions & 5 deletions docs/environments/mujoco.md
@@ -106,6 +106,7 @@ env = gymnasium.make("Ant-v5", render_mode="rgb_array", width=1280, height=720)

| Parameter | Type | Default | Description |
|-------------------------|-------------------------------------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `render_mode` | **str** | `None` | The modality of the render result. Must be one of `human`, `rgb_array`, `depth_array`, or `rgbd_tuple`. Note that `human` does not return a rendered image, but renders directly to the window |
| `width` | **int** | `480` | The width of the render window |
| `height` | **int** | `480` | The height of the render window |
| `camera_id` | **int \| None** | `None` | The camera ID used for the render window |
@@ -117,11 +118,11 @@ env = gymnasium.make("Ant-v5", render_mode="rgb_array", width=1280, height=720)
### Rendering Backend
The MuJoCo simulator renders images with OpenGL and can use three different backends, "glfw" (default), "egl", and "osmesa", which can be selected by setting an [environment variable](https://en.wikipedia.org/wiki/Environment_variable).

| Backend | Environment Variable | Description |
|---------|----------------------------|-----------------------------------|
| `glfw` | `MUJOCO_GL=glfw` (default) | Renders with window System on GPU |
| `egl` | `MUJOCO_GL=egl` | Renders headless on GPU |
| `omesa` | `MUJOCO_GL=omesa` | Renders headless on CPU |
| Backend | Environment Variable | Description |
|----------|----------------------------|-----------------------------------|
| `GLFW` | `MUJOCO_GL=glfw` (default) | Renders with Window System on GPU |
| `EGL` | `MUJOCO_GL=egl` | Renders headless on GPU |
| `OSMESA` | `MUJOCO_GL=osmesa` | Renders headless on CPU |

More information is available in the [MuJoCo/OpenGL documentation](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl).
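A minimal sketch of selecting the headless ``egl`` backend from the table above; the environment id is just an example and ``gymnasium[mujoco]`` is assumed to be installed:

```python
# Sketch: render a MuJoCo environment off-screen on the GPU via EGL.
import os

os.environ["MUJOCO_GL"] = "egl"  # must be set before MuJoCo creates its OpenGL context

import gymnasium as gym

env = gym.make("Ant-v5", render_mode="rgb_array")
env.reset(seed=0)
frame = env.render()  # uint8 RGB array of shape (height, width, 3)
env.close()
```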
<!--
26 changes: 24 additions & 2 deletions docs/environments/third_party_environments.md
@@ -25,17 +25,25 @@ goal-RL ([Gymnasium-Robotics](https://robotics.farama.org/)).
## Third-party environments with Gymnasium
*This page contains environments which are not maintained by Farama Foundation and, as such, cannot be guaranteed to function as intended.*

*If you'd like to contribute an environment, please reach out on [Discord](https://discord.gg/MHCFauP67z), then submit a PR by editing this [file](https://github.com/Farama-Foundation/Gymnasium/blob/main/docs/environments/third_party_environments.md).*
*If you'd like to contribute an environment, please reach out on [Discord](https://discord.gg/MHCFauP67z), then submit a PR by editing this [file](https://github.com/Farama-Foundation/Gymnasium/blob/main/docs/environments/third_party_environments.md); additional instructions can be found inside that file.*

<!-- Template
- [NAME: SUB_NAME_IF_PRESENT](LINK)
![Gymnasium version dependency](ADD YOUR BADGE HERE)
![GitHub stars](ADD YOUR BADGE HERE OPTIONAL)
A short 2 sentence description.
A short 2-5 sentence description.
-->

<!-- Instructions
- Follow the template in the file
- Environments and environment categories are alphabetically sorted
- You are responsible for picking the environment category, if you would like to add a category please ask
- Name your PR something like "Add external environment X"
-->



### Autonomous Driving environments
*Autonomous Vehicle and traffic management.*
@@ -118,6 +126,13 @@ goal-RL ([Gymnasium-Robotics](https://robotics.farama.org/)).

A simple environment for single-agent reinforcement learning algorithms on a clone of [Flappy Bird](https://en.wikipedia.org/wiki/Flappy_Bird), the hugely popular arcade-style mobile game. Both state and pixel observation environments are available.

- [Generals.io bots: Develop your agent for generals.io!](https://github.com/strakam/generals-bots)

![Gymnasium version dependency](https://img.shields.io/badge/Gymnasium-v1.0.0-blue)
![GitHub stars](https://img.shields.io/github/stars/strakam/generals-bots)

Generals.io is a fast-paced strategy game on a 2D grid. We make bot development accessible via the Gymnasium/PettingZoo API.

- [pystk2-gymnasium: SuperTuxKart races gymnasium wrapper](https://github.com/bpiwowar/pystk2-gymnasium)

![Gymnasium version dependency](https://img.shields.io/badge/Gymnasium-v0.29.1-blue)
@@ -204,6 +219,13 @@ goal-RL ([Gymnasium-Robotics](https://robotics.farama.org/)).

A simple environment using [PyBullet](https://github.com/bulletphysics/bullet3) to simulate the dynamics of a [Bitcraze Crazyflie 2.x](https://www.bitcraze.io/documentation/hardware/crazyflie_2_1/crazyflie_2_1-datasheet.pdf) nanoquadrotor.

- [Itomori: UAV Risk-aware Flight Environment](https://github.com/gustavo-moura/itomori)

![Gymnasium version dependency](https://img.shields.io/badge/Gymnasium-v0.29.1-blue)
![GitHub stars](https://img.shields.io/github/stars/gustavo-moura/itomori)

Itomori is an environment for risk-aware UAV flight; it provides tools for solving Chance-Constrained Markov Decision Processes (CCMDPs). The environment allows users to simulate, visualize, and evaluate UAV navigation in complex and risky settings, incorporating variables such as GPS uncertainty, collision risk, and adaptive flight planning. Itomori is intended to support UAV path-planning research by offering adjustable parameters, detailed visualizations, and insights into agent behavior in uncertain environments.

- [OmniIsaacGymEnvs: Gym environments for NVIDIA Omniverse Isaac ](https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/)

Reinforcement Learning Environments for [Omniverse Isaac simulator](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/overview.html).
2 changes: 1 addition & 1 deletion docs/introduction/basic_usage.md
@@ -49,7 +49,7 @@ In reinforcement learning, the classic "agent-environment loop" pictured below i
:class: only-dark
```

For Gymnasium, the "agent-environment-loop" is implemented below for a single episode (until the environment ends). See the next section for a line-by-line explanation. Note that running this code requires installing swig (`pip install swig` or [download](https://www.swig.org/download.html)) along with `pip install gymnasium[box2d]`.
For Gymnasium, the "agent-environment-loop" is implemented below for a single episode (until the environment ends). See the next section for a line-by-line explanation. Note that running this code requires installing swig (`pip install swig` or [download](https://www.swig.org/download.html)) along with `pip install "gymnasium[box2d]"`.

```python
import gymnasium as gym
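# The lines below are a sketch of the episode loop described above; the
# environment id and the random action choice are assumptions standing in
# for the original listing, which is truncated here.
env = gym.make("LunarLander-v3", render_mode="human")
observation, info = env.reset()

episode_over = False
while not episode_over:
    action = env.action_space.sample()  # a random action in place of a learned agent
    observation, reward, terminated, truncated, info = env.step(action)
    episode_over = terminated or truncated

env.close()
```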
6 changes: 3 additions & 3 deletions docs/introduction/create_custom_env.md
@@ -106,7 +106,7 @@ Oftentimes, info will also contain some data that is only available inside the :
```{eval-rst}
.. py:currentmodule:: gymnasium.Env
As the purpose of :meth:`reset` is to initiate a new episode for an environment and has two parameters: ``seed`` and ``options``. The seed can be used to initialize the random number generator to a deterministic state and options can be used to specify values used within reset. On the first line of the reset, you need to call ``super().reset(seed=seed)`` which will initialize the random number generate (:attr:`np_random`) to use through the rest of the :meth:`reset`.
The purpose of :meth:`reset` is to initiate a new episode for an environment and has two parameters: ``seed`` and ``options``. The seed can be used to initialize the random number generator to a deterministic state and options can be used to specify values used within reset. On the first line of the reset, you need to call ``super().reset(seed=seed)`` which will initialize the random number generate (:attr:`np_random`) to use through the rest of the :meth:`reset`.
Within our custom environment, the :meth:`reset` needs to randomly choose the agent and target's positions (we repeat this if they have the same position). The return type of :meth:`reset` is a tuple of the initial observation and any auxiliary information. Therefore, we can use the methods ``_get_obs`` and ``_get_info`` that we implemented earlier for that:
```
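As a hedged sketch of the pattern described above, a grid-world ``reset`` might look like the following (``self.size``, ``self._agent_location``, ``self._target_location``, ``_get_obs`` and ``_get_info`` are the tutorial's names; ``np`` is NumPy):

```python
def reset(self, seed=None, options=None):
    # Seed self.np_random so everything below is reproducible
    super().reset(seed=seed)

    # Place the agent uniformly at random on the grid
    self._agent_location = self.np_random.integers(0, self.size, size=2, dtype=int)

    # Re-sample the target until it differs from the agent's position
    self._target_location = self._agent_location
    while np.array_equal(self._target_location, self._agent_location):
        self._target_location = self.np_random.integers(0, self.size, size=2, dtype=int)

    return self._get_obs(), self._get_info()
```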
@@ -144,9 +144,9 @@ The :meth:`step` method usually contains most of the logic for your environment,
For our environment, several things need to happen during the step function:
- We use the self._action_to_direction to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grd, we clip the agen't location to stay within bounds.
- We use the self._action_to_direction to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grid, we clip the agent's location to stay within bounds.
- We compute the agent's reward by checking if the agent's current position is equal to the target's location.
- Since the environment doesn't truncate internally (we can apply a time limit wrapper to the environment during :meth:make), we permanently set truncated to False.
- Since the environment doesn't truncate internally (we can apply a time limit wrapper to the environment during :meth:`make`), we permanently set truncated to False.
- We once again use _get_obs and _get_info to obtain the agent's observation and auxiliary information.
```
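A matching sketch of ``step``, under the same assumptions about the grid-world attributes:

```python
def step(self, action):
    # Map the discrete action (0-3) to a movement direction on the grid
    direction = self._action_to_direction[action]
    # Clip so the agent cannot leave the grid
    self._agent_location = np.clip(self._agent_location + direction, 0, self.size - 1)

    # The episode ends when the agent reaches the target
    terminated = np.array_equal(self._agent_location, self._target_location)
    reward = 1 if terminated else 0  # sparse binary reward
    truncated = False  # time limits are handled by a TimeLimit wrapper, not here

    return self._get_obs(), reward, terminated, truncated, self._get_info()
```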

4 changes: 2 additions & 2 deletions docs/introduction/record_agent.md
@@ -10,7 +10,7 @@ title: Recording Agents
During training or when evaluating an agent, it may be interesting to record agent behaviour over an episode and log the total reward accumulated. This can be achieved through two wrappers: :class:`RecordEpisodeStatistics` and :class:`RecordVideo`, the first tracks episode data such as the total rewards, episode length and time taken and the second generates mp4 videos of the agents using the environment renderings.
We show how to apply these wrappers for two types of problems; the first for recording data for every episode (normally evaluation) and second for recording data periodiclly (for normal training).
We show how to apply these wrappers for two types of problems; the first for recording data for every episode (normally evaluation) and second for recording data periodically (for normal training).
```

## Recording Every Episode
@@ -55,7 +55,7 @@ In the script above, for the :class:`RecordVideo` wrapper, we specify three diff
For the :class:`RecordEpisodeStatistics`, we only need to specify the buffer length; this is the max length of the internal ``time_queue``, ``return_queue`` and ``length_queue``. Rather than collect the data for each episode individually, we can use the data queues to print the information at the end of the evaluation.
For speed ups in evaluating environments, it is possible to implement this with vector environments to in order to evaluate ``N`` episodes at the same time in parallel rather than series.
For speed ups in evaluating environments, it is possible to implement this with vector environments in order to evaluate ``N`` episodes at the same time in parallel rather than series.
```
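A hedged sketch of the evaluation set-up these paragraphs describe; the environment id, folder name and episode count are placeholder assumptions:

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

num_eval_episodes = 4  # placeholder

env = gym.make("CartPole-v1", render_mode="rgb_array")  # frames are needed for the video
env = RecordVideo(env, video_folder="eval-videos", name_prefix="eval",
                  episode_trigger=lambda episode_id: True)  # record every episode
env = RecordEpisodeStatistics(env, buffer_length=num_eval_episodes)

for _ in range(num_eval_episodes):
    obs, info = env.reset()
    episode_over = False
    while not episode_over:
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        episode_over = terminated or truncated
env.close()

print("Episode returns:", list(env.return_queue))
print("Episode lengths:", list(env.length_queue))
```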

## Recording the Agent during Training