Skip to content

Commit

Permalink
Merge branch 'develop' into toni/call_step_preprocessor_once
Browse files Browse the repository at this point in the history
  • Loading branch information
Toni-SM committed Jan 18, 2025
2 parents c9993a9 + d57c8ea commit e872aa2
Show file tree
Hide file tree
Showing 309 changed files with 25,810 additions and 8,344 deletions.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ body:
description: The skrl version can be obtained with the command `pip show skrl`.
options:
- ---
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.0
Expand Down
8 changes: 4 additions & 4 deletions .github/workflows/python-publish-manual.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ jobs:

pypi:
name: Publish package to PyPI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ github.event.inputs.job == 'pypi'}}

steps:
Expand All @@ -24,7 +24,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.10.16'

- name: Install dependencies
run: |
Expand All @@ -43,7 +43,7 @@ jobs:

test-pypi:
name: Publish package to TestPyPI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ github.event.inputs.job == 'test-pypi'}}

steps:
Expand All @@ -52,7 +52,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.10.16'

- name: Install dependencies
run: |
Expand Down
48 changes: 37 additions & 11 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -1,15 +1,41 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-merge-conflict
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
rev: v4.6.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-json
- id: check-merge-conflict
- id: check-toml
- id: check-yaml
- id: debug-statements
- id: detect-private-key
- id: end-of-file-fixer
- id: name-tests-test
args: ["--pytest-test-first"]
exclude: ^(tests/strategies.py|tests/utils.py)
- id: trailing-whitespace
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
- id: codespell
exclude: ^(docs/source/_static|docs/_build|pyproject.toml)
additional_dependencies:
- tomli
- repo: https://github.com/python/black
rev: 24.8.0
hooks:
- id: black
args: ["--line-length=120"]
exclude: ^(docs/)
- repo: https://github.com/pycqa/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/pre-commit/pygrep-hooks
rev: v1.10.0
hooks:
- id: isort
- id: rst-backticks
- id: rst-directive-colons
- id: rst-inline-touching-normal
89 changes: 76 additions & 13 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,75 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.3.0] - Unreleased
## [1.4.0] - 2025-01-16
### Added
- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`)
- `parse_device` static method in ML framework configuration (used in library components to set up the device)
- Model instantiator support for different shared model structures in PyTorch
- Support for automatic mixed precision training in PyTorch
- `init_state_dict` method to initialize model's lazy modules in PyTorch
- Model instantiators `fixed_log_std` parameter to define immutable log standard deviations
- Define the `stochastic_evaluation` trainer config to allow the use of the actions returned by the agent's model
as-is instead of deterministic actions (mean-actions in Gaussian-based models) during evaluation.
Make the return of deterministic actions the default behavior.

### Changed
- Call agent's `pre_interaction` method during evaluation
- Use spaces utilities to process states, observations and actions for all the library components
- Update model instantiators definitions to process supported fundamental and composite Gymnasium spaces
- Make flattened tensor storage in memory the default option (revert changed introduced in version 1.3.0)
- Drop support for PyTorch versions prior to 1.10 (the previous supported version was 1.9)
- Update KL Adaptive learning rate scheduler implementation to match Optax's behavior in JAX
- Update AMP agent to use the environment's terminated and truncated data, and the KL Adaptive learning rate scheduler
- Update runner implementations to support definition of arbitrary agents and their models
- Speed up PyTorch implementation:
- Disable argument checking when instantiating distributions
- Replace PyTorch's `BatchSampler` by Python slice when sampling data from memory

### Changed (breaking changes: style)
- Format code using Black code formatter (it's ugly, yes, but it does its job)

### Fixed
- Move the batch sampling inside gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)
- Model state dictionary initialization for composite Gymnasium spaces in JAX
- Add missing `reduction` parameter to Gaussian model instantiator
- Optax's learning rate schedulers integration in JAX implementation
- Isaac Lab wrapper's multi-agent state retrieval with gymnasium 1.0
- Treat truncation signal when computing 'done' (environment reset)

### Removed
- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments,
it is just not installed as part of the library. If it is needed, it needs to be installed manually.
Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate

## [1.3.0] - 2024-09-11
### Added
- Distributed multi-GPU and multi-node learning (JAX implementation)
- Utilities to start multiple processes from a single program invocation for distributed learning using JAX
- Model instantiators `return_source` parameter to get the source class definition used to instantiate the models
- `Runner` utility to run training/evaluation workflows in a few lines of code
- Wrapper for Isaac Lab multi-agent environments
- Wrapper for Google Brax environments

### Changed
- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent using it in distributed runs
- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent that uses it in distributed runs
- Move the PyTorch distributed initialization from the agent base class to the ML framework configuration
- Upgrade model instantiator implementations to support CNN layers and complex network definitions,
and implement them using dynamic execution of Python code
- Update Isaac Lab environment loader argument parser options to match Isaac Lab version
- Allow to store tensors/arrays with their original dimensions in memory and make it the default option

### Changed (breaking changes)
- Decouple the observation and state spaces in single and multi-agent environment wrappers and add the `state`
method to get the state of the environment
- Simplify multi-agent environment wrapper API by removing shared space properties and methods

### Fixed
- Catch TensorBoard summary iterator exceptions in `TensorboardFileIterator` postprocessing utils
- Fix automatic wrapper detection issue (introduced in previous version) for Isaac Gym (previews),
DeepMind and vectorized Gymnasium environments
- Fix vectorized/parallel environments `reset` method return values when called more than once
- Fix IPPO and MAPPO `act` method return values when JAX-NumPy backend is enabled

## [1.2.0] - 2024-06-23
### Added
Expand Down Expand Up @@ -39,7 +100,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

Transition from pre-release versions (`1.0.0-rc.1` and`1.0.0-rc.2`) to a stable version.

This release also announces the publication of the **skrl** paper in the Journal of Machine Learning Research (JMLR): https://www.jmlr.org/papers/v24/23-0112.html
This release also announces the publication of the **skrl** paper in the Journal of
Machine Learning Research (JMLR): https://www.jmlr.org/papers/v24/23-0112.html

Summary of the most relevant features:
- JAX support
Expand All @@ -49,11 +111,11 @@ Summary of the most relevant features:
## [1.0.0-rc.2] - 2023-08-11
### Added
- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
- Time-limit (truncation) boostrapping in on-policy actor-critic agents
- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
- Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value

### Changed (breaking changes)
- Structure environment loaders and wrappers file hierarchy coherently
- Structure environment loaders and wrappers file hierarchy coherently.
Import statements now follow the next convention:
- Wrappers (e.g.):
- `from skrl.envs.wrappers.torch import wrap_env`
Expand All @@ -63,7 +125,7 @@ Summary of the most relevant features:
- `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`

### Changed
- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
- Drop support for PyTorch versions prior to 1.9 (the previous supported version was 1.8)

## [1.0.0-rc.1] - 2023-07-25
### Added
Expand All @@ -72,9 +134,10 @@ Summary of the most relevant features:
- IPPO and MAPPO multi-agent
- Multi-agent base class
- Bi-DexHands environment loader
- Wrapper for PettingZoo and Bi-DexHands environments
- Wrapper for Bi-DexHands environments
- Wrapper for PettingZoo environments
- Parameters `num_envs`, `headless` and `cli_args` for configuring Isaac Gym, Isaac Orbit
and Omniverse Isaac Gym environments when they are loaded
and Omniverse Isaac Gym environments when they are loaded

### Changed
- Migrate to `pyproject.toml` Python package development
Expand All @@ -89,7 +152,7 @@ and Omniverse Isaac Gym environments when they are loaded
- Disable PyTorch gradient computation during the environment stepping
- Get categorical models' entropy
- Typo in `KLAdaptiveLR` learning rate scheduler
(keep the old name for compatibility with the examples of previous versions.
(Keep the old name for compatibility with the examples of previous versions.
The old name will be removed in future releases)

## [0.10.2] - 2023-03-23
Expand All @@ -99,7 +162,7 @@ and Omniverse Isaac Gym environments when they are loaded

## [0.10.1] - 2023-01-26
### Fixed
- Tensorboard writer instantiation when `write_interval` is zero
- TensorBoard writer instantiation when `write_interval` is zero

## [0.10.0] - 2023-01-22
### Added
Expand Down Expand Up @@ -155,7 +218,7 @@ to allow storing samples in memories during evaluation
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility
Expand Down Expand Up @@ -198,7 +261,7 @@ to allow storing samples in memories during evaluation
## [0.5.0] - 2022-05-18
### Added
- TRPO agent
- DeepMind environment wrapper
- Wrapper for DeepMind environments
- KL Adaptive learning rate scheduler
- Handle `gym.spaces.Dict` observation spaces (OpenAI Gym and DeepMind environments)
- Forward environment info to agent `record_transition` method
Expand All @@ -218,7 +281,7 @@ to allow storing samples in memories during evaluation
## [0.4.1] - 2022-03-22
### Added
- Examples of all Isaac Gym environments (preview 3)
- Tensorboard file iterator for data post-processing
- TensorBoard file iterator for data post-processing

### Fixed
- Init and evaluate agents in ParallelTrainer
Expand Down
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
```ini
function annotation (e.g. typing)
# insert an empty line
python libraries and other libraries (e.g. gym, numpy, time, etc.)
python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
<h2 align="center" style="border-bottom: 0 !important;">SKRL - Reinforcement Learning library</h2>
<br>

**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev) / Farama [Gymnasium](https://gymnasium.farama.org) and [DeepMind](https://github.com/deepmind/dm_env) and other environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/), [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/isaacsim/latest/tutorial_gym_isaac_gym.html) and [NVIDIA Isaac Lab](https://isaac-sim.github.io/IsaacLab/index.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev), Farama [Gymnasium](https://gymnasium.farama.org) and [PettingZoo](https://pettingzoo.farama.org), Google [DeepMind](https://github.com/deepmind/dm_env) and [Brax](https://github.com/google/brax), among other environment interfaces, it allows loading and configuring NVIDIA [Isaac Lab](https://isaac-sim.github.io/IsaacLab/index.html) (as well as [Isaac Gym](https://developer.nvidia.com/isaac-gym/) and [Omniverse Isaac Gym](https://github.com/isaac-sim/OmniIsaacGymEnvs)) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.

<br>

Expand Down
3 changes: 2 additions & 1 deletion docs/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
furo==2023.7.26
furo==2024.8.6
sphinx
sphinx-tabs
sphinx-autobuild
sphinx-copybutton
sphinx-notfound-page
decorator
numpy
1,084 changes: 1,083 additions & 1 deletion docs/source/_static/imgs/multi_agent_wrapping-dark.svg
100755 → 100644
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1,102 changes: 1,101 additions & 1 deletion docs/source/_static/imgs/multi_agent_wrapping-light.svg
100755 → 100644
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
742 changes: 741 additions & 1 deletion docs/source/_static/imgs/wrapping-dark.svg
100755 → 100644
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
754 changes: 753 additions & 1 deletion docs/source/_static/imgs/wrapping-light.svg
100755 → 100644
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 0 additions & 2 deletions docs/source/api/agents.rst
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,6 @@ API (PyTorch)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

.. automethod:: __init__
.. automethod:: __str__

.. raw:: html
Expand All @@ -136,5 +135,4 @@ API (JAX)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

.. automethod:: __init__
.. automethod:: __str__
14 changes: 6 additions & 8 deletions docs/source/api/agents/a2c.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`), value function approximator (:math:`V_\phi`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
Expand Down Expand Up @@ -59,7 +59,7 @@ Learning algorithm
| :literal:`_update(...)`
| :green:`# compute returns and advantages`
| :math:`V_{_{last}}' \leftarrow V_\phi(s')`
| :math:`R, A \leftarrow f_{GAE}(r, d, V, V_{_{last}}')`
| :math:`R, A \leftarrow f_{GAE}(r, d_{_{end}} \lor d_{_{timeout}}, V, V_{_{last}}')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages
| :green:`# mini-batches loop`
Expand Down Expand Up @@ -232,6 +232,10 @@ Support for advanced features is described in the next table
- RNN, LSTM, GRU and any other variant
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
Expand All @@ -252,16 +256,12 @@ API (PyTorch)
:private-members: _update
:members:

.. automethod:: __init__

.. autoclass:: skrl.agents.torch.a2c.A2C_RNN
:undoc-members:
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__

.. raw:: html

<br>
Expand All @@ -276,5 +276,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__
10 changes: 6 additions & 4 deletions docs/source/api/agents/amp.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy (:math:`\pi_\theta`), value (:math:`V_\phi`) and discriminator (:math:`D_\psi`) function approximators
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), next values (:math:`V'`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
Expand Down Expand Up @@ -57,7 +57,7 @@ Learning algorithm
| :math:`r_D \leftarrow -log(\text{max}( 1 - \hat{y}(D_\psi(s_{_{AMP}})), \, 10^{-4})) \qquad` with :math:`\; \hat{y}(x) = \dfrac{1}{1 + e^{-x}}`
| :math:`r' \leftarrow` :guilabel:`task_reward_weight` :math:`r \, +` :guilabel:`style_reward_weight` :guilabel:`discriminator_reward_scale` :math:`r_D`
| :green:`# compute returns and advantages`
| :math:`R, A \leftarrow f_{GAE}(r', d, V, V')`
| :math:`R, A \leftarrow f_{GAE}(r', d_{_{end}} \lor d_{_{timeout}}, V, V')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A, s_{_{AMP}}`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages, AMP states
| [[:math:`s_{_{AMP}}^{^M}`]] :math:`\leftarrow` AMP states from :math:`M`
Expand Down Expand Up @@ -237,6 +237,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
Expand All @@ -256,5 +260,3 @@ API (PyTorch)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__
Loading

0 comments on commit e872aa2

Please sign in to comment.