Synthetic gymnax provides meta-learned, synthetic versions of gymnax environments that train agents within 10k time steps.
Simply replace

```python
import gymnax
env, params = gymnax.make("CartPole-v1")
...  # your training code
```

by

```python
import gymnax, synthetic_gymnax
env, params = gymnax.make("Synthetic-CartPole-v1")
#                          ^^^^^^^^^^ add 'Synthetic-' to the environment name
...  # your training code
```
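If you don't already have a gymnax training loop at hand, the snippet below sketches a minimal random-policy rollout against the synthetic environment. It uses only the standard gymnax reset/step interface; the number of steps is arbitrary.

```python
import jax
import gymnax
import synthetic_gymnax  # importing registers the Synthetic-* environments

rng = jax.random.PRNGKey(0)
env, env_params = gymnax.make("Synthetic-CartPole-v1")

rng, rng_reset = jax.random.split(rng)
obs, state = env.reset(rng_reset, env_params)

# Random-policy rollout; replace the sampled action with your agent's action.
for _ in range(10):
    rng, rng_act, rng_step = jax.random.split(rng, 3)
    action = env.action_space(env_params).sample(rng_act)
    obs, state, reward, done, info = env.step(rng_step, state, action, env_params)
```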
The synthetic environments are meta-learned to train agents within 10k time steps. This can be much faster than training in the real environment, even when using tuned hyperparameters!
- 🟩 Real-environment training with tuned hyperparameters (IQM of 5 training runs)
- 🟦 Synthetic-environment training with any reasonable hyperparameters (IQM of 20 training runs with random hyperparameter configurations)
- Install via pip: `pip install synthetic-gymnax`
- Install from source: `pip install git+https://github.com/keraJLi/synthetic-gymnax`
**Classic control: 10k synthetic steps** 🦶

| Environment | PPO | SAC | DQN | DDPG | TD3 |
| --- | --- | --- | --- | --- | --- |
| Synthetic-Acrobot-v1 | -84.1 | -85.3 | -82.6 | - | - |
| Synthetic-CartPole-v1 | 500.0 | 500.0 | 500.0 | - | - |
| Synthetic-MountainCar-v0 | -181.8 | -170.1 | -118.4 | - | - |
| Synthetic-ContinuousMountainCar-v0 | 66.9 | 91.1 | - | 97.6 | 97.5 |
| Synthetic-Pendulum-v1 | -205.4 | -188.3 | - | -164.3 | -168.5 |
**Brax: 10k synthetic vs. 5M real steps** 🦶

| Environment | PPO (synthetic) | PPO (real) | SAC (synthetic) | SAC (real) | DDPG (synthetic) | DDPG (real) | TD3 (synthetic) | TD3 (real) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| halfcheetah | 1657.4 | 3487.1 | 5810.4 | 7735.5 | 6162.4 | 3263.3 | 6555.8 | 13213.5 |
| hopper | 853.5 | 2521.9 | 2738.8 | 3119.4 | 3012.4 | 1536.0 | 2985.3 | 3325.8 |
| humanoidstandup | 13356.1 | 17243.5 | 21105.2 | 23808.1 | 21039.0 | 24944.8 | 20372.0 | 28376.2 |
| swimmer | 348.5 | 83.6 | 361.6 | 124.8 | 365.1 | 348.5 | 365.4 | 232.2 |
| walker2d | 858.3 | 2039.6 | 1323.1 | 4140.1 | 1304.3 | 698.3 | 1321.8 | 4605.8 |
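Returns like the ones above are obtained by rolling out the trained agent. Below is a minimal evaluation sketch for a real classic-control environment, assuming a hypothetical `policy_fn(obs) -> action` produced by your training code; the loop itself is plain gymnax usage, not an API of this package.

```python
import jax
import gymnax


def evaluate(policy_fn, env_name="CartPole-v1", n_episodes=10, seed=0):
    """Roll out a policy in the *real* gymnax environment and return the
    mean undiscounted episode return. `policy_fn` is a stand-in for
    whatever your training code produced."""
    env, env_params = gymnax.make(env_name)
    rng = jax.random.PRNGKey(seed)
    returns = []
    for _ in range(n_episodes):
        rng, rng_reset = jax.random.split(rng)
        obs, state = env.reset(rng_reset, env_params)
        done, episode_return = False, 0.0
        while not done:
            rng, rng_step = jax.random.split(rng)
            action = policy_fn(obs)
            obs, state, reward, done, _ = env.step(rng_step, state, action, env_params)
            episode_return += float(reward)
        returns.append(episode_return)
    return sum(returns) / len(returns)
```

For readability this uses a plain Python loop; in practice you would typically jit or scan the rollout.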
The environments in this package are the result of our paper, Discovering Minimal Reinforcement Learning Environments (citation below). They are optimized using evolutionary meta-learning, such that they maximize the performance of an agent after training in the synthetic environment. In the paper, we find that
- The synthetic environments don't need episodes longer than a single time step. Instead, synthetic contextual bandits are enough to train good policies (see the sketch after this list).
- The synthetic contextual bandits generalize to unseen network architectures and optimization schemes. Although gradient-based optimization was used during meta-learning, evolutionary methods also work at evaluation time.
- We can speed up downstream meta-learning applications, such as Discovered Policy Optimization. For more info, have a look at the paper!
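To make the first two points concrete, here is a minimal sketch (not the paper's meta-learning or evaluation code) that trains an arbitrary small MLP on a synthetic environment with plain REINFORCE and Adam. Because each synthetic episode lasts a single step, the immediate reward already is the full return. The network size, batch size, learning rate, and number of updates are arbitrary illustrative choices, and the observation/action spaces are assumed to match the real CartPole.

```python
import jax
import jax.numpy as jnp
import optax
import gymnax
import synthetic_gymnax  # registers the Synthetic-* environments

env, env_params = gymnax.make("Synthetic-CartPole-v1")
# Assumes the synthetic env exposes the same spaces as the real CartPole.
obs_dim = env.observation_space(env_params).shape[0]
n_actions = env.action_space(env_params).n


def init_mlp(rng, sizes):
    # Plain MLP parameters; the architecture is arbitrary.
    keys = jax.random.split(rng, len(sizes) - 1)
    return [
        (0.1 * jax.random.normal(k, (n_in, n_out)), jnp.zeros(n_out))
        for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:])
    ]


def logits_fn(params, obs):
    x = obs
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def reinforce_loss(params, rng, batch_size=256):
    rng_act, rng_env = jax.random.split(rng)
    keys = jax.random.split(rng_env, batch_size)
    # Each reset samples a "context"; each episode is a single step.
    obs, state = jax.vmap(env.reset, in_axes=(0, None))(keys, env_params)
    logits = logits_fn(params, obs)
    actions = jax.random.categorical(rng_act, logits)
    logp = jax.nn.log_softmax(logits)[jnp.arange(batch_size), actions]
    _, _, reward, _, _ = jax.vmap(env.step, in_axes=(0, 0, 0, None))(
        keys, state, actions, env_params
    )
    # One-step episodes: the immediate reward is the full return.
    return -(logp * reward).mean()


rng = jax.random.PRNGKey(0)
policy = init_mlp(rng, (obs_dim, 32, n_actions))
optimizer = optax.adam(1e-3)
opt_state = optimizer.init(policy)


@jax.jit
def update(policy, opt_state, rng):
    loss, grads = jax.value_and_grad(reinforce_loss)(policy, rng)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(policy, updates), opt_state, loss


for _ in range(40):  # 40 updates of 256 transitions each ≈ 10k synthetic steps
    rng, rng_update = jax.random.split(rng)
    policy, opt_state, loss = update(policy, opt_state, rng_update)
```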
We provide the configurations used for meta-learning the synthetic-environment checkpoints in synthetic_gymnax/checkpoints/*environment*/config.yaml. They can be passed to the meta-learning script, e.g.
```
python examples/metalearn_synthenv.py --config synthetic_gymnax/checkpoints/hopper/config.yaml
```
Note that the configs are not bundled with the package when installing via pip; clone the repository to get them.
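For example, with the package installed as above (a sketch of the overall workflow; the clone step is simply the standard way to obtain the configs mentioned here):

```
git clone https://github.com/keraJLi/synthetic-gymnax
cd synthetic-gymnax
python examples/metalearn_synthenv.py --config synthetic_gymnax/checkpoints/hopper/config.yaml
```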
If you use the provided synthetic environments in your work, please cite our paper as
```bibtex
@article{liesen2024discovering,
  title={Discovering Minimal Reinforcement Learning Environments},
  author={Jarek Liesen and Chris Lu and Andrei Lupu and Jakob N. Foerster and Henning Sprekeler and Robert T. Lange},
  year={2024},
  eprint={2406.12589},
  archivePrefix={arXiv}
}
```