diff --git a/README.md b/README.md
index 9edd302..b059b3e 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ Synthetic gymnax contains Gymnax environments that train agents within 10k time steps.
-## 💡 Make a one-line change ...
+## 🔄 Make a one-line change ...
@@ -61,6 +61,9 @@ This can be much faster than training in the real environment, even when using t
 - 🟩 **Real environment** training, using tuned hyperparameters (IQM of 5 training runs)
 - 🟦 **Synthetic environment** training, using any reasonable hyperparameters (IQM performance of 20 training runs with random HP configurations)
+## 🏗 Installing synthetic-gymnax
+1. Install via pip: `pip install synthetic-gymnax`
+2. Install from source: `pip install git+https://github.com/keraJLi/synthetic-gymnax`
 ## 🏅 Performance of agents after training for 10k synthetic steps
Simply replace
@@ -202,6 +205,14 @@ This can be much faster than training in the real environment, even when using t
+## 💡 Background
+The environments in this package are the result of our paper, [Discovering Minimal Reinforcement Learning Environments](https://arxiv.org/abs/2406.12589) (citation below).
+They are optimized using evolutionary meta-learning, such that they maximize the performance of an agent after training in the synthetic environment.
+In the paper, we find that
+1. The synthetic environments don't need to have episodes that exceed a single time step. Instead, **synthetic contextual bandits** are enough to train good policies.
+2. The synthetic contextual bandits generalize to unseen network architectures and optimization schemes. While gradient-based optimization was used during meta-learning, evolutionary methods work in evaluation, too.
+
+![Conceptual algorithm overview](img/conceptual.png)
 ## 💫Replicating our results
 We provide the configurations used in meta-training the checkpoints for synthetic environments in `synthetic_gymnax/checkpoints/*environment*/config.yaml`. They can be used with the meta-learning script by calling e.g.
diff --git a/img/conceptual.png b/img/conceptual.png
new file mode 100755
index 0000000..45272fc
Binary files /dev/null and b/img/conceptual.png differ
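The Background text in this diff states that the synthetic environments need no episode longer than a single time step, i.e. a synthetic contextual bandit is enough. As a minimal sketch of what such a one-step environment looks like, here is a toy tabular version in plain Python. `ToyContextualBandit` and `train_greedy_policy` are hypothetical names, not part of the synthetic-gymnax API. The fixed tables stand in for whatever the meta-learned environment parameterizes (assumed here to be a context distribution and a per-context reward function), and an epsilon-greedy sample-average learner is swapped in for the RL algorithms used in the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyContextualBandit:
    """Hypothetical one-step environment (NOT the synthetic-gymnax API).

    context_probs: assumed stand-in for a learned distribution over contexts.
    rewards[c][a]: assumed stand-in for a learned reward of action a in context c.
    """
    context_probs: list
    rewards: list
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def reset(self):
        # Sample a context (the initial and only state of the episode).
        contexts = range(len(self.context_probs))
        return self.rng.choices(contexts, weights=self.context_probs)[0]

    def step(self, context, action):
        # Every episode terminates after this single step.
        return self.rewards[context][action], True


def train_greedy_policy(env, num_actions, steps=2000, epsilon=0.1):
    """Epsilon-greedy sample-average learner; returns the greedy action per context."""
    n_ctx = len(env.context_probs)
    q = [[0.0] * num_actions for _ in range(n_ctx)]
    counts = [[0] * num_actions for _ in range(n_ctx)]
    for _ in range(steps):
        c = env.reset()
        if env.rng.random() < epsilon:
            a = env.rng.randrange(num_actions)  # explore
        else:
            a = max(range(num_actions), key=lambda i: q[c][i])  # exploit
        r, done = env.step(c, a)
        assert done  # episodes never exceed one time step
        counts[c][a] += 1
        q[c][a] += (r - q[c][a]) / counts[c][a]  # incremental sample mean
    return [max(range(num_actions), key=lambda i: q[c][i]) for c in range(n_ctx)]


env = ToyContextualBandit(context_probs=[0.5, 0.5], rewards=[[1.0, 0.0], [0.0, 1.0]])
policy = train_greedy_policy(env, num_actions=2)
print("greedy action per context:", policy)
```

Because every episode ends after one step, there is no bootstrapping or credit assignment over time: the learner only has to map each sampled context to a high-reward action.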