Synthetic gymnax provides meta-learned, synthetic versions of gymnax environments that train agents within 10k time steps.
Simply replace

```python
import gymnax
env, params = gymnax.make("CartPole-v1")
...  # your training code
```

by

```python
import gymnax, synthetic_gymnax
env, params = gymnax.make("Synthetic-CartPole-v1")
#                          ^^^^^^^^^^ add 'Synthetic-' to the environment name
...  # your training code
```
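If you don't already have a gymnax training loop at hand, the snippet below sketches a minimal random-policy rollout against the synthetic environment. It uses only the standard gymnax reset/step interface; the number of steps is arbitrary.

```python
import jax
import gymnax
import synthetic_gymnax  # importing registers the Synthetic-* environments

rng = jax.random.PRNGKey(0)
env, env_params = gymnax.make("Synthetic-CartPole-v1")

rng, rng_reset = jax.random.split(rng)
obs, state = env.reset(rng_reset, env_params)

# Random-policy rollout; replace the sampled action with your agent's action.
for _ in range(10):
    rng, rng_act, rng_step = jax.random.split(rng, 3)
    action = env.action_space(env_params).sample(rng_act)
    obs, state, reward, done, info = env.step(rng_step, state, action, env_params)
```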
The synthetic environments are meta-learned to train agents within 10k time steps. This can be much faster than training in the real environment, even when using tuned hyperparameters!
- 🟩 Real-environment training with tuned hyperparameters (IQM of 5 training runs)
- 🟦 Synthetic-environment training with any reasonable hyperparameters (IQM of 20 training runs with random hyperparameter configurations)
- Install via pip: `pip install synthetic-gymnax`
- Install from source: `pip install git+https://github.com/keraJLi/synthetic-gymnax`
**Classic control: 10k synthetic steps** 🦶

| Environment | PPO | SAC | DQN | DDPG | TD3 |
| --- | --- | --- | --- | --- | --- |
| Synthetic-Acrobot-v1 | -84.1 | -85.3 | -82.6 | - | - |
| Synthetic-CartPole-v1 | 500.0 | 500.0 | 500.0 | - | - |
| Synthetic-MountainCar-v0 | -181.8 | -170.1 | -118.4 | - | - |
| Synthetic-ContinuousMountainCar-v0 | 66.9 | 91.1 | - | 97.6 | 97.5 |
| Synthetic-Pendulum-v1 | -205.4 | -188.3 | - | -164.3 | -168.5 |
**Brax: 10k synthetic vs. 5M real steps** 🦶

| Environment | PPO (synthetic) | PPO (real) | SAC (synthetic) | SAC (real) | DDPG (synthetic) | DDPG (real) | TD3 (synthetic) | TD3 (real) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| halfcheetah | 1657.4 | 3487.1 | 5810.4 | 7735.5 | 6162.4 | 3263.3 | 6555.8 | 13213.5 |
| hopper | 853.5 | 2521.9 | 2738.8 | 3119.4 | 3012.4 | 1536.0 | 2985.3 | 3325.8 |
| humanoidstandup | 13356.1 | 17243.5 | 21105.2 | 23808.1 | 21039.0 | 24944.8 | 20372.0 | 28376.2 |
| swimmer | 348.5 | 83.6 | 361.6 | 124.8 | 365.1 | 348.5 | 365.4 | 232.2 |
| walker2d | 858.3 | 2039.6 | 1323.1 | 4140.1 | 1304.3 | 698.3 | 1321.8 | 4605.8 |
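Returns like the ones above are obtained by rolling out the trained agent. Below is a minimal evaluation sketch for a real classic-control environment, assuming a hypothetical `policy_fn(obs) -> action` produced by your training code; the loop itself is plain gymnax usage, not an API of this package.

```python
import jax
import gymnax


def evaluate(policy_fn, env_name="CartPole-v1", n_episodes=10, seed=0):
    """Roll out a policy in the *real* gymnax environment and return the
    mean undiscounted episode return. `policy_fn` is a stand-in for
    whatever your training code produced."""
    env, env_params = gymnax.make(env_name)
    rng = jax.random.PRNGKey(seed)
    returns = []
    for _ in range(n_episodes):
        rng, rng_reset = jax.random.split(rng)
        obs, state = env.reset(rng_reset, env_params)
        done, episode_return = False, 0.0
        while not done:
            rng, rng_step = jax.random.split(rng)
            action = policy_fn(obs)
            obs, state, reward, done, _ = env.step(rng_step, state, action, env_params)
            episode_return += float(reward)
        returns.append(episode_return)
    return sum(returns) / len(returns)
```

For readability this uses a plain Python loop; in practice you would typically jit or scan the rollout.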
The environments in this package are the result of our paper, Discovering Minimal Reinforcement Learning Environments (citation below). They are optimized using evolutionary meta-learning, such that they maximize the performance of an agent after training in the synthetic environment. In the paper, we find that
- The synthetic environments don't need episodes longer than a single time step. Instead, synthetic contextual bandits are enough to train good policies (see the sketch after this list).
- The synthetic contextual bandits generalize to unseen network architectures and optimization schemes. Although gradient-based optimization was used during meta-learning, evolutionary methods also work at evaluation time.
- We can speed up downstream meta-learning applications, such as Discovered Policy Optimization. For more info, have a look at the paper!
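To make the first two points concrete, here is a minimal sketch (not the paper's meta-learning or evaluation code) that trains an arbitrary small MLP on a synthetic environment with plain REINFORCE and Adam. Because each synthetic episode lasts a single step, the immediate reward already is the full return. The network size, batch size, learning rate, and number of updates are arbitrary illustrative choices, and the observation/action spaces are assumed to match the real CartPole.

```python
import jax
import jax.numpy as jnp
import optax
import gymnax
import synthetic_gymnax  # registers the Synthetic-* environments

env, env_params = gymnax.make("Synthetic-CartPole-v1")
# Assumes the synthetic env exposes the same spaces as the real CartPole.
obs_dim = env.observation_space(env_params).shape[0]
n_actions = env.action_space(env_params).n


def init_mlp(rng, sizes):
    # Plain MLP parameters; the architecture is arbitrary.
    keys = jax.random.split(rng, len(sizes) - 1)
    return [
        (0.1 * jax.random.normal(k, (n_in, n_out)), jnp.zeros(n_out))
        for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:])
    ]


def logits_fn(params, obs):
    x = obs
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def reinforce_loss(params, rng, batch_size=256):
    rng_act, rng_env = jax.random.split(rng)
    keys = jax.random.split(rng_env, batch_size)
    # Each reset samples a "context"; each episode is a single step.
    obs, state = jax.vmap(env.reset, in_axes=(0, None))(keys, env_params)
    logits = logits_fn(params, obs)
    actions = jax.random.categorical(rng_act, logits)
    logp = jax.nn.log_softmax(logits)[jnp.arange(batch_size), actions]
    _, _, reward, _, _ = jax.vmap(env.step, in_axes=(0, 0, 0, None))(
        keys, state, actions, env_params
    )
    # One-step episodes: the immediate reward is the full return.
    return -(logp * reward).mean()


rng = jax.random.PRNGKey(0)
policy = init_mlp(rng, (obs_dim, 32, n_actions))
optimizer = optax.adam(1e-3)
opt_state = optimizer.init(policy)


@jax.jit
def update(policy, opt_state, rng):
    loss, grads = jax.value_and_grad(reinforce_loss)(policy, rng)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(policy, updates), opt_state, loss


for _ in range(40):  # 40 updates of 256 transitions each ≈ 10k synthetic steps
    rng, rng_update = jax.random.split(rng)
    policy, opt_state, loss = update(policy, opt_state, rng_update)
```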
We provide the configurations used for meta-learning the synthetic-environment checkpoints in synthetic_gymnax/checkpoints/*environment*/config.yaml. They can be passed to the meta-learning script, e.g.
```
python examples/metalearn_synthenv.py --config synthetic_gymnax/checkpoints/hopper/config.yaml
```
Note that the configs are not bundled with the package when installing via pip; clone the repository to get them.
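For example, with the package installed as above (a sketch of the overall workflow; the clone step is simply the standard way to obtain the configs mentioned here):

```
git clone https://github.com/keraJLi/synthetic-gymnax
cd synthetic-gymnax
python examples/metalearn_synthenv.py --config synthetic_gymnax/checkpoints/hopper/config.yaml
```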
If you use the provided synthetic environments in your work, please cite our paper as
```bibtex
@article{liesen2024discovering,
  title={Discovering Minimal Reinforcement Learning Environments},
  author={Jarek Liesen and Chris Lu and Andrei Lupu and Jakob N. Foerster and Henning Sprekeler and Robert T. Lange},
  year={2024},
  eprint={2406.12589},
  archivePrefix={arXiv}
}
```