Merge branch 'develop' into toni/call_step_preprocessor_once

Toni-SM · Jan 18, 2025 · e872aa2
2 parents c9993a9 + d57c8ea
commit e872aa2

Showing 309 changed files with 25,810 additions and 8,344 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yaml b/.github/ISSUE_TEMPLATE/bug_report.yaml
@@ -30,6 +30,8 @@ body:
     description: The skrl version can be obtained with the command `pip show skrl`.
     options:
       - ---
+      - 1.4.0
+      - 1.3.0
       - 1.2.0
       - 1.1.0
       - 1.0.0

diff --git a/.github/workflows/python-publish-manual.yml b/.github/workflows/python-publish-manual.yml
@@ -15,7 +15,7 @@ jobs:
 
   pypi:
     name: Publish package to PyPI
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     if: ${{ github.event.inputs.job == 'pypi'}}
 
     steps:
@@ -24,7 +24,7 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v3
       with:
-        python-version: '3.7'
+        python-version: '3.10.16'
 
     - name: Install dependencies
       run: |
@@ -43,7 +43,7 @@ jobs:
 
   test-pypi:
     name: Publish package to TestPyPI
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     if: ${{ github.event.inputs.job == 'test-pypi'}}
 
     steps:
@@ -52,7 +52,7 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v3
       with:
-        python-version: '3.7'
+        python-version: '3.10.16'
 
     - name: Install dependencies
       run: |

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,15 +1,41 @@
 repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.4.0
-  hooks:
-  - id: check-ast
-  - id: check-case-conflict
-  - id: check-docstring-first
-  - id: check-merge-conflict
-  - id: check-yaml
-  - id: end-of-file-fixer
-  - id: trailing-whitespace
+  rev: v4.6.0
+  hooks:
+    - id: check-ast
+    - id: check-case-conflict
+    - id: check-docstring-first
+    - id: check-json
+    - id: check-merge-conflict
+    - id: check-toml
+    - id: check-yaml
+    - id: debug-statements
+    - id: detect-private-key
+    - id: end-of-file-fixer
+    - id: name-tests-test
+      args: ["--pytest-test-first"]
+      exclude: ^(tests/strategies.py|tests/utils.py)
+    - id: trailing-whitespace
+- repo: https://github.com/codespell-project/codespell
+  rev: v2.3.0
+  hooks:
+    - id: codespell
+      exclude: ^(docs/source/_static|docs/_build|pyproject.toml)
+      additional_dependencies:
+        - tomli
+- repo: https://github.com/python/black
+  rev: 24.8.0
+  hooks:
+    - id: black
+      args: ["--line-length=120"]
+      exclude: ^(docs/)
 - repo: https://github.com/pycqa/isort
-  rev: 5.12.0
+  rev: 5.13.2
+  hooks:
+    - id: isort
+- repo: https://github.com/pre-commit/pygrep-hooks
+  rev: v1.10.0
   hooks:
-  - id: isort
+    - id: rst-backticks
+    - id: rst-directive-colons
+    - id: rst-inline-touching-normal
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,14 +2,75 @@
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
-## [1.3.0] - Unreleased
+## [1.4.0] - 2025-01-16
+### Added
+- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`)
+- `parse_device` static method in ML framework configuration (used in library components to set up the device)
+- Model instantiator support for different shared model structures in PyTorch
+- Support for automatic mixed precision training in PyTorch
+- `init_state_dict` method to initialize model's lazy modules in PyTorch
+- Model instantiators `fixed_log_std` parameter to define immutable log standard deviations
+- Define the `stochastic_evaluation` trainer config to allow the use of the actions returned by the agent's model
+  as-is instead of deterministic actions (mean-actions in Gaussian-based models) during evaluation.
+  Make the return of deterministic actions the default behavior.
+
+### Changed
+- Call agent's `pre_interaction` method during evaluation
+- Use spaces utilities to process states, observations and actions for all the library components
+- Update model instantiators definitions to process supported fundamental and composite Gymnasium spaces
+- Make flattened tensor storage in memory the default option (revert changed introduced in version 1.3.0)
+- Drop support for PyTorch versions prior to 1.10 (the previous supported version was 1.9)
+- Update KL Adaptive learning rate scheduler implementation to match Optax's behavior in JAX
+- Update AMP agent to use the environment's terminated and truncated data, and the KL Adaptive learning rate scheduler
+- Update runner implementations to support definition of arbitrary agents and their models
+- Speed up PyTorch implementation:
+  - Disable argument checking when instantiating distributions
+  - Replace PyTorch's `BatchSampler` by Python slice when sampling data from memory
+
+### Changed (breaking changes: style)
+- Format code using Black code formatter (it's ugly, yes, but it does its job)
+
+### Fixed
+- Move the batch sampling inside gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)
+- Model state dictionary initialization for composite Gymnasium spaces in JAX
+- Add missing `reduction` parameter to Gaussian model instantiator
+- Optax's learning rate schedulers integration in JAX implementation
+- Isaac Lab wrapper's multi-agent state retrieval with gymnasium 1.0
+- Treat truncation signal when computing 'done' (environment reset)
+
+### Removed
+- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments,
+  it is just not installed as part of the library. If it is needed, it needs to be installed manually.
+  Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate
+
+## [1.3.0] - 2024-09-11
 ### Added
 - Distributed multi-GPU and multi-node learning (JAX implementation)
 - Utilities to start multiple processes from a single program invocation for distributed learning using JAX
+- Model instantiators `return_source` parameter to get the source class definition used to instantiate the models
+- `Runner` utility to run training/evaluation workflows in a few lines of code
+- Wrapper for Isaac Lab multi-agent environments
+- Wrapper for Google Brax environments
 
 ### Changed
-- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent using it in distributed runs
+- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent that uses it in distributed runs
 - Move the PyTorch distributed initialization from the agent base class to the ML framework configuration
+- Upgrade model instantiator implementations to support CNN layers and complex network definitions,
+  and implement them using dynamic execution of Python code
+- Update Isaac Lab environment loader argument parser options to match Isaac Lab version
+- Allow to store tensors/arrays with their original dimensions in memory and make it the default option
+
+### Changed (breaking changes)
+- Decouple the observation and state spaces in single and multi-agent environment wrappers and add the `state`
+  method to get the state of the environment
+- Simplify multi-agent environment wrapper API by removing shared space properties and methods
+
+### Fixed
+- Catch TensorBoard summary iterator exceptions in `TensorboardFileIterator` postprocessing utils
+- Fix automatic wrapper detection issue (introduced in previous version) for Isaac Gym (previews),
+  DeepMind and vectorized Gymnasium environments
+- Fix vectorized/parallel environments `reset` method return values when called more than once
+- Fix IPPO and MAPPO `act` method return values when JAX-NumPy backend is enabled
 
 ## [1.2.0] - 2024-06-23
 ### Added
@@ -39,7 +100,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 Transition from pre-release versions (`1.0.0-rc.1` and`1.0.0-rc.2`) to a stable version.
 
-This release also announces the publication of the **skrl** paper in the Journal of Machine Learning Research (JMLR): https://www.jmlr.org/papers/v24/23-0112.html
+This release also announces the publication of the **skrl** paper in the Journal of
+Machine Learning Research (JMLR): https://www.jmlr.org/papers/v24/23-0112.html
 
 Summary of the most relevant features:
 - JAX support
@@ -49,11 +111,11 @@ Summary of the most relevant features:
 ## [1.0.0-rc.2] - 2023-08-11
 ### Added
 - Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
-- Time-limit (truncation) boostrapping in on-policy actor-critic agents
+- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
 - Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value
 
 ### Changed (breaking changes)
-- Structure environment loaders and wrappers file hierarchy coherently
+- Structure environment loaders and wrappers file hierarchy coherently.
   Import statements now follow the next convention:
   - Wrappers (e.g.):
     - `from skrl.envs.wrappers.torch import wrap_env`
@@ -63,7 +125,7 @@ Summary of the most relevant features:
     - `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`
 
 ### Changed
-- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
+- Drop support for PyTorch versions prior to 1.9 (the previous supported version was 1.8)
 
 ## [1.0.0-rc.1] - 2023-07-25
 ### Added
@@ -72,9 +134,10 @@ Summary of the most relevant features:
 - IPPO and MAPPO multi-agent
 - Multi-agent base class
 - Bi-DexHands environment loader
-- Wrapper for PettingZoo and Bi-DexHands environments
+- Wrapper for Bi-DexHands environments
+- Wrapper for PettingZoo environments
 - Parameters `num_envs`, `headless` and `cli_args` for configuring Isaac Gym, Isaac Orbit
-and Omniverse Isaac Gym environments when they are loaded
+  and Omniverse Isaac Gym environments when they are loaded
 
 ### Changed
 - Migrate to `pyproject.toml` Python package development
@@ -89,7 +152,7 @@ and Omniverse Isaac Gym environments when they are loaded
 - Disable PyTorch gradient computation during the environment stepping
 - Get categorical models' entropy
 - Typo in `KLAdaptiveLR` learning rate scheduler
-  (keep the old name for compatibility with the examples of previous versions.
+  (Keep the old name for compatibility with the examples of previous versions.
   The old name will be removed in future releases)
 
 ## [0.10.2] - 2023-03-23
@@ -99,7 +162,7 @@ and Omniverse Isaac Gym environments when they are loaded
 
 ## [0.10.1] - 2023-01-26
 ### Fixed
-- Tensorboard writer instantiation when `write_interval` is zero
+- TensorBoard writer instantiation when `write_interval` is zero
 
 ## [0.10.0] - 2023-01-22
 ### Added
@@ -155,7 +218,7 @@ to allow storing samples in memories during evaluation
 - Parameter `role` to model methods
 - Wrapper compatibility with the new OpenAI Gym environment API
 - Internal library colored logger
-- Migrate checkpoints/models from other RL libraries to skrl models/agents
+- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
 - Configuration parameter `store_separately` to agent configuration dict
 - Save/load agent modules (models, optimizers, preprocessors)
 - Set random seed and configure deterministic behavior for reproducibility
@@ -198,7 +261,7 @@ to allow storing samples in memories during evaluation
 ## [0.5.0] - 2022-05-18
 ### Added
 - TRPO agent
-- DeepMind environment wrapper
+- Wrapper for DeepMind environments
 - KL Adaptive learning rate scheduler
 - Handle `gym.spaces.Dict` observation spaces (OpenAI Gym and DeepMind environments)
 - Forward environment info to agent `record_transition` method
@@ -218,7 +281,7 @@ to allow storing samples in memories during evaluation
 ## [0.4.1] - 2022-03-22
 ### Added
 - Examples of all Isaac Gym environments (preview 3)
-- Tensorboard file iterator for data post-processing
+- TensorBoard file iterator for data post-processing
 
 ### Fixed
 - Init and evaluate agents in ParallelTrainer
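
Note on the 1.4.0 entry "Replace PyTorch's `BatchSampler` by Python slice when sampling data from memory": the idea can be illustrated with the minimal sketch below. It is not the library's code. It uses NumPy instead of PyTorch so it stays dependency-light, and the function name `slice_mini_batches` and its parameters are hypothetical.

```python
import numpy as np


def slice_mini_batches(num_samples, batch_size, rng=None):
    """Split shuffled sample indexes into mini-batches using plain slices.

    Hypothetical helper for illustration only (not part of skrl).
    """
    rng = np.random.default_rng() if rng is None else rng
    # shuffle all the indexes once
    indexes = rng.permutation(num_samples)
    # slice the shuffled indexes; the last batch may be smaller
    return [indexes[i : i + batch_size] for i in range(0, num_samples, batch_size)]


batches = slice_mini_batches(10, 4, rng=np.random.default_rng(0))
```

Each sample index appears in exactly one mini-batch, and no sampler object has to be built per call.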

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
   ```ini
   function annotation (e.g. typing)
   # insert an empty line
-  python libraries and other libraries (e.g. gym, numpy, time, etc.)
+  python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
   # insert an empty line
   machine learning framework modules (e.g. torch, torch.nn)
   # insert an empty line

diff --git a/README.md b/README.md
@@ -17,7 +17,7 @@
 <h2 align="center" style="border-bottom: 0 !important;">SKRL - Reinforcement Learning library</h2>
 <br>
 
-**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev) / Farama [Gymnasium](https://gymnasium.farama.org) and [DeepMind](https://github.com/deepmind/dm_env) and other environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/), [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/isaacsim/latest/tutorial_gym_isaac_gym.html) and [NVIDIA Isaac Lab](https://isaac-sim.github.io/IsaacLab/index.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
+**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev), Farama [Gymnasium](https://gymnasium.farama.org) and [PettingZoo](https://pettingzoo.farama.org), Google [DeepMind](https://github.com/deepmind/dm_env) and [Brax](https://github.com/google/brax), among other environment interfaces, it allows loading and configuring NVIDIA [Isaac Lab](https://isaac-sim.github.io/IsaacLab/index.html) (as well as [Isaac Gym](https://developer.nvidia.com/isaac-gym/) and [Omniverse Isaac Gym](https://github.com/isaac-sim/OmniIsaacGymEnvs)) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
 
 <br>
 

diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1,7 +1,8 @@
-furo==2023.7.26
+furo==2024.8.6
 sphinx
 sphinx-tabs
 sphinx-autobuild
 sphinx-copybutton
 sphinx-notfound-page
+decorator
 numpy
diff --git a/docs/source/_static/imgs/multi_agent_wrapping-dark.svg b/docs/source/_static/imgs/multi_agent_wrapping-dark.svg
diff --git a/docs/source/_static/imgs/multi_agent_wrapping-light.svg b/docs/source/_static/imgs/multi_agent_wrapping-light.svg
diff --git a/docs/source/_static/imgs/wrapping-dark.svg b/docs/source/_static/imgs/wrapping-dark.svg
diff --git a/docs/source/_static/imgs/wrapping-light.svg b/docs/source/_static/imgs/wrapping-light.svg
diff --git a/docs/source/api/agents.rst b/docs/source/api/agents.rst
@@ -119,7 +119,6 @@ API (PyTorch)
     :private-members: _update, _empty_preprocessor, _get_internal_value
     :members:
 
-    .. automethod:: __init__
     .. automethod:: __str__
 
 .. raw:: html
@@ -136,5 +135,4 @@ API (JAX)
     :private-members: _update, _empty_preprocessor, _get_internal_value
     :members:
 
-    .. automethod:: __init__
     .. automethod:: __str__
diff --git a/docs/source/api/agents/a2c.rst b/docs/source/api/agents/a2c.rst
@@ -25,7 +25,7 @@ Algorithm implementation
 
 | Main notation/symbols:
 |   - policy function approximator (:math:`\pi_\theta`), value function approximator (:math:`V_\phi`)
-|   - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
+|   - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
 |   - values (:math:`V`), advantages (:math:`A`), returns (:math:`R`)
 |   - log probabilities (:math:`logp`)
 |   - loss (:math:`L`)
@@ -59,7 +59,7 @@ Learning algorithm
 | :literal:`_update(...)`
 | :green:`# compute returns and advantages`
 | :math:`V_{_{last}}' \leftarrow V_\phi(s')`
-| :math:`R, A \leftarrow f_{GAE}(r, d, V, V_{_{last}}')`
+| :math:`R, A \leftarrow f_{GAE}(r, d_{_{end}} \lor d_{_{timeout}}, V, V_{_{last}}')`
 | :green:`# sample mini-batches from memory`
 | [[:math:`s, a, logp, V, R, A`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages
 | :green:`# mini-batches loop`
@@ -232,6 +232,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Mixed precision
+      - Automatic mixed precision
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
     * - Distributed
       - Single Program Multi Data (SPMD) multi-GPU
       - .. centered:: :math:`\blacksquare`
@@ -252,16 +256,12 @@ API (PyTorch)
     :private-members: _update
     :members:
 
-    .. automethod:: __init__
-
 .. autoclass:: skrl.agents.torch.a2c.A2C_RNN
     :undoc-members:
     :show-inheritance:
     :private-members: _update
     :members:
 
-    .. automethod:: __init__
-
 .. raw:: html
 
     <br>
@@ -276,5 +276,3 @@ API (JAX)
     :show-inheritance:
     :private-members: _update
     :members:
-
-    .. automethod:: __init__
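
Note on the notation change in `a2c.rst`: the done flag passed to the GAE function is now written as the logical OR of the terminated and truncated signals. The sketch below shows one way to express that, assuming a single environment and 1-D arrays. It is an illustration, not the library's implementation. It uses NumPy, omits advantage normalization, and its function and parameter names are assumptions.

```python
import numpy as np


def compute_gae(rewards, terminated, truncated, values, last_value,
                discount_factor=0.99, lambda_coefficient=0.95):
    """Generalized Advantage Estimation over one rollout (illustrative sketch)."""
    # a step ends the episode if it is terminated OR truncated
    dones = np.logical_or(terminated, truncated)
    not_dones = 1.0 - dones.astype(np.float64)

    steps = len(rewards)
    advantages = np.zeros(steps, dtype=np.float64)
    advantage = 0.0
    # iterate backwards so each step can reuse the advantage of the next one
    for i in reversed(range(steps)):
        next_value = values[i + 1] if i < steps - 1 else last_value
        advantage = (
            rewards[i]
            - values[i]
            + discount_factor * not_dones[i] * (next_value + lambda_coefficient * advantage)
        )
        advantages[i] = advantage
    returns = advantages + np.asarray(values, dtype=np.float64)
    return returns, advantages


returns, advantages = compute_gae(
    rewards=np.array([1.0, 1.0, 1.0]),
    terminated=np.array([False, False, False]),
    truncated=np.array([False, True, False]),
    values=np.array([0.0, 0.0, 0.0]),
    last_value=0.0,
    discount_factor=1.0,
    lambda_coefficient=1.0,
)
```

In this usage example the truncation at the second step stops the accumulation there, so the first step only sees the reward of the second step and not that of the third.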
diff --git a/docs/source/api/agents/amp.rst b/docs/source/api/agents/amp.rst
@@ -21,7 +21,7 @@ Algorithm implementation
 
 | Main notation/symbols:
 |   - policy (:math:`\pi_\theta`), value (:math:`V_\phi`) and discriminator (:math:`D_\psi`) function approximators
-|   - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
+|   - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
 |   - values (:math:`V`), next values (:math:`V'`), advantages (:math:`A`), returns (:math:`R`)
 |   - log probabilities (:math:`logp`)
 |   - loss (:math:`L`)
@@ -57,7 +57,7 @@ Learning algorithm
 | :math:`r_D \leftarrow -log(\text{max}( 1 - \hat{y}(D_\psi(s_{_{AMP}})), \, 10^{-4})) \qquad` with :math:`\; \hat{y}(x) = \dfrac{1}{1 + e^{-x}}`
 | :math:`r' \leftarrow` :guilabel:`task_reward_weight` :math:`r \, +` :guilabel:`style_reward_weight` :guilabel:`discriminator_reward_scale` :math:`r_D`
 | :green:`# compute returns and advantages`
-| :math:`R, A \leftarrow f_{GAE}(r', d, V, V')`
+| :math:`R, A \leftarrow f_{GAE}(r', d_{_{end}} \lor d_{_{timeout}}, V, V')`
 | :green:`# sample mini-batches from memory`
 | [[:math:`s, a, logp, V, R, A, s_{_{AMP}}`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages, AMP states
 | [[:math:`s_{_{AMP}}^{^M}`]] :math:`\leftarrow` AMP states from :math:`M`
@@ -237,6 +237,10 @@ Support for advanced features is described in the next table
       - \-
       - .. centered:: :math:`\square`
       - .. centered:: :math:`\square`
+    * - Mixed precision
+      - Automatic mixed precision
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
     * - Distributed
       - Single Program Multi Data (SPMD) multi-GPU
       - .. centered:: :math:`\blacksquare`
@@ -256,5 +260,3 @@ API (PyTorch)
     :show-inheritance:
     :private-members: _update
     :members:
-
-    .. automethod:: __init__