Second code review by Jet
pseudo-rnd-thoughts committed Dec 7, 2023
1 parent c1543ff commit f4338e1
Showing 3 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/introduction/create_custom_env.md
@@ -174,7 +174,7 @@ For our environment, several things need to happen during the step function:
```{eval-rst}
While it is possible to use your new custom environment immediately, it is more common for environments to be initialized using :meth:`gymnasium.make`. In this section, we explain how to register a custom environment and then initialize it.
The environment ID consists of three components, two of which are optional: an optional namespace (here: ``gymnasium_env``), a mandatory name (here: ``GridWorld``) and an optional but recommended version (here: v0). It might have also been registered as ``GridWorld-v0`` (the recommended approach), ``GridWorld`` or ``gymnasium_env/GridWorld``, and the appropriate ID should then be used during environment creation.
The environment ID consists of three components, two of which are optional: an optional namespace (here: ``gymnasium_env``), a mandatory name (here: ``GridWorld``) and an optional but recommended version (here: v0). It may also have been registered as ``GridWorld-v0`` (the recommended approach), ``GridWorld`` or ``gymnasium_env/GridWorld``, and the appropriate ID should then be used during environment creation.
The entry point can be a string or function; as this tutorial isn't part of a Python project, we cannot use a string, but for most environments this is the normal way of specifying the entry point.
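For illustration, a minimal sketch of the registration and creation described above might look like the following (assuming the `GridWorldEnv` class from earlier in the tutorial is in scope):

```python
import gymnasium as gym

# Register the environment under a namespaced, versioned ID. Since this
# tutorial is not an installable Python package, the class itself is passed
# as the entry point instead of a "module:ClassName" string.
gym.register(
    id="gymnasium_env/GridWorld-v0",
    entry_point=GridWorldEnv,  # the environment class defined earlier in the tutorial
)

# The ID used at registration time is the ID used for creation.
env = gym.make("gymnasium_env/GridWorld-v0")
```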
12 changes: 6 additions & 6 deletions docs/introduction/record_agent.md
@@ -8,17 +8,17 @@ title: Recording Agents
```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers
During training or evaluating an agent, developers often record how the agent acts over an episode, in particular, video the agent behaviour and log the total reward for each episode. Gymnasium simplifies this process with two wrappers: :class:`RecordEpisodeStatistics` and :class:`RecordVideo`, the first tracks episode data such as the total rewards, episode length and time taken and the second recording environment renderings as mp4 videos.
During training or when evaluating an agent, it may be interesting to record agent behaviour over an episode and log the total reward accumulated. This can be achieved through two wrappers: :class:`RecordEpisodeStatistics` and :class:`RecordVideo`; the first tracks episode data such as the total reward, episode length and time taken, and the second generates mp4 videos of the agent using the environment renderings.
We consider how to apply these wrappers for two types of problems; the first for recording data for every episode (normally evaluation) and second for recording data periodiclly (for training).
We show how to apply these wrappers for two types of problems: the first is recording data for every episode (normally during evaluation) and the second is recording data periodically (during training).
```

## Recording Every Episode

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers
When evaluating a trained agent, developers normally wish to record several episode to see how the agent acted. Below we provide an example script to do this with the :class:`RecordEpisodeStatistics` and :class:`RecordVideo`.
Given a trained agent, you may wish to record several episodes during evaluation to see how the agent acts. Below we provide an example script that does this with the :class:`RecordEpisodeStatistics` and :class:`RecordVideo` wrappers.
```

```python
@@ -51,16 +51,16 @@ print(f'Episode lengths: {env.length_queue}')
```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers
In the script above, for the :class:`RecordVideo` wrapper, we specify three different variables ``video_folder`` to specify the folder that the videos should be saved (change for your problem), ``name_prefix`` for the prefix of videos themselves and finally an ``episode_trigger`` such that every episode is recorded. This means that for every episodic of the environment, a video will be recorded and saved in the style "cartpole-agent/eval-episode-x.mp4".
In the script above, for the :class:`RecordVideo` wrapper, we specify three different variables: ``video_folder`` to specify the folder in which the videos should be saved (change this for your problem), ``name_prefix`` for the prefix of the video filenames and finally an ``episode_trigger`` such that every episode is recorded. This means that for every episode of the environment, a video will be recorded and saved in the style "cartpole-agent/eval-episode-x.mp4".
For the :class:`RecordEpisodicStatistics`, we only need to specify the buffer lengths, this is the max length of the internal ``time_queue``, ``return_queue`` and ``length_queue``. Rather than collect the data for each episode individually, we can use the data queues can print the information at the end of the evaluation.
For the :class:`RecordEpisodeStatistics` wrapper, we only need to specify the buffer lengths; this is the max length of the internal ``time_queue``, ``return_queue`` and ``length_queue``. Rather than collecting the data for each episode individually, we can use the data queues to print the information at the end of the evaluation.
To speed up evaluation, it is possible to implement this with vector environments in order to evaluate ``N`` episodes at the same time in parallel rather than in series.
```
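A minimal, self-contained version of the recording setup described above might look like this (using a random policy in place of a trained agent; parameter names follow recent Gymnasium versions):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

num_eval_episodes = 4

# render_mode="rgb_array" is required so RecordVideo can capture frames.
env = gym.make("CartPole-v1", render_mode="rgb_array")
env = RecordVideo(env, video_folder="cartpole-agent", name_prefix="eval",
                  episode_trigger=lambda episode_id: True)  # record every episode
env = RecordEpisodeStatistics(env, buffer_length=num_eval_episodes)

for _ in range(num_eval_episodes):
    obs, info = env.reset()
    episode_over = False
    while not episode_over:
        action = env.action_space.sample()  # a random policy stands in for a trained agent
        obs, reward, terminated, truncated, info = env.step(action)
        episode_over = terminated or truncated
env.close()

print(f"Episode returns: {list(env.return_queue)}")
print(f"Episode lengths: {list(env.length_queue)}")
```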

## Recording the Agent during Training

During training, an agent will act in hundreds or thousands of episodes, therefore, we can't record a video for each episode, but developers still might want to know how the agent acts at different points in the training, recording episodes periodically during training. While for the episode statistics, it is more helpful to know this data for every episode. The following script provides an example of how to periodically record episodes of an agent while recording every episode's statistics (we use the python's logger but [tensorboard](https://www.tensorflow.org/tensorboard), [wandb](https://docs.wandb.ai/guides/track) and other modules are available).
During training, an agent will act in hundreds or thousands of episodes, so you can't record a video for each episode; however, developers might still want to know how the agent acts at different points in training, so episodes can be recorded periodically. For the episode statistics, it is more helpful to know this data for every episode. The following script provides an example of how to periodically record episodes of an agent while recording every episode's statistics (we use Python's logger, but [tensorboard](https://www.tensorflow.org/tensorboard), [wandb](https://docs.wandb.ai/guides/track) and other modules are available).

```python
import logging
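
# What follows is a minimal sketch of the periodic-recording setup described
# above, not the original script: the environment id, training_period value
# and the random-action policy are illustrative assumptions.
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

logging.basicConfig(level=logging.INFO)

training_period = 250            # record a video every 250 episodes (assumed value)
num_training_episodes = 10_000   # assumed total number of training episodes

env = gym.make("CartPole-v1", render_mode="rgb_array")
env = RecordVideo(env, video_folder="cartpole-agent", name_prefix="training",
                  episode_trigger=lambda x: x % training_period == 0)
env = RecordEpisodeStatistics(env)  # statistics are recorded for every episode

for episode_num in range(num_training_episodes):
    obs, info = env.reset()
    episode_over = False
    while not episode_over:
        action = env.action_space.sample()  # replace with your agent's policy
        obs, reward, terminated, truncated, info = env.step(action)
        episode_over = terminated or truncated
    # RecordEpisodeStatistics stores the episode's return, length and time in info["episode"]
    logging.info(f"episode-{episode_num}, statistics: {info['episode']}")
env.close()
```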
4 changes: 2 additions & 2 deletions docs/introduction/train_agent.md
@@ -9,7 +9,7 @@ This page provides a short outline of how to train an agent for a gymnasium environment

Blackjack is one of the most popular casino card games that is also infamous for being beatable under certain conditions. This version of the game uses an infinite deck (we draw the cards with replacement), so counting cards won't be a viable strategy in our simulated game. The observation is a tuple of the player's current sum, the value of the dealer's face-up card and a boolean value on whether the player holds a usable ace. The agent can pick between two actions: stand (0) such that the player takes no more cards and hit (1) such that the player takes another card. To win, your card sum should be greater than the dealer's without exceeding 21. The game ends if the player selects stand or if the card sum is greater than 21. Full documentation can be found at [https://gymnasium.farama.org/environments/toy_text/blackjack](https://gymnasium.farama.org/environments/toy_text/blackjack).
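For a quick illustration of the observation and action spaces described above, a minimal interaction might look like this (the printed values are just one possible outcome):

```python
import gymnasium as gym

env = gym.make("Blackjack-v1")

obs, info = env.reset()
# The observation is a 3-tuple: (player's current sum, dealer's face-up card, usable ace?)
print(obs)  # e.g. (14, 10, 0)

# Action 0 = stand (take no more cards), action 1 = hit (take another card)
obs, reward, terminated, truncated, info = env.step(0)
print(reward, terminated)  # reward is +1 for a win, -1 for a loss, 0 for a draw
```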

Q-learning is a model-free off-policy training algorithm by Watkins, 1989 for environments with discrete action spaces and was famous for being the first reinforcement learning algorithm to prove convergence to the optimal policy under certain conditions.
Q-learning is a model-free off-policy learning algorithm (Watkins, 1989) for environments with discrete action spaces, and is famous as the first reinforcement learning algorithm proven to converge to an optimal policy under certain conditions.
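As a sketch of the idea, the tabular Q-learning update for this environment might look like the following (the learning rate and discount factor values are arbitrary choices):

```python
from collections import defaultdict

# Q(s, a) for the two actions (0 = stand, 1 = hit), initialised to zero.
q_values = defaultdict(lambda: [0.0, 0.0])
learning_rate = 0.01    # alpha, arbitrary choice
discount_factor = 0.95  # gamma, arbitrary choice

def update(obs, action, reward, terminated, next_obs):
    """Temporal-difference update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    future_q = (not terminated) * max(q_values[next_obs])
    td_error = reward + discount_factor * future_q - q_values[obs][action]
    q_values[obs][action] += learning_rate * td_error
```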

## Executing an action

@@ -162,4 +162,4 @@ Hopefully this Tutorial helped you get a grip of how to interact with Gymnasium

It is recommended that you solve this environment by yourself (project based learning is really effective!). You can apply your favorite discrete RL algorithm or give Monte Carlo ES a try (covered in `Sutton & Barto <http://incompleteideas.net/book/the-book-2nd.html>`_, section 5.3) - this way you can compare your results directly to the book.

Best of fun!
Best of luck!
