diff --git a/README.md b/README.md
index e200963..b650b79 100644
--- a/README.md
+++ b/README.md
@@ -64,18 +64,40 @@ For the exact formula of the reward please refer to `KeplerEnv._reward()` in `gy
 
 # Preliminary Training Results
 
-We perofrmed a bunch of a trainings using the [Stable-baselines3](https://github.com/DLR-RM/stable-baselines3) software, in particular the [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
+We performed a number of training runs using the [Stable-baselines3](https://github.com/DLR-RM/stable-baselines3) software, in particular the [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo). We used the default hyperparameters for TD3 and SAC; PPO performed significantly worse. A preliminary hyperparameter optimization showed no significant improvement over the defaults.
 
 ### GoalEnv
 
-![Space Gym goal 2 planets](imgs/GoalContinuous2P.png)
+#### 2 Planets
+
+
+Human baseline score (mean ± std. dev. over 5 episodes), measured using `keyboard_agent.py`: 4715 ± 799
+
+#### 3 Planets
+
+
+Human baseline score (mean ± std. dev. over 5 episodes), measured using `keyboard_agent.py`: 4659 ± 747
+
+#### 4 Planets
+
+
 
 ### Kepler Orbit Env
 
+#### Circle Orbit
+
+
+#### Ellipsoidal Orbit
+
+
+
 # Conclusions and Future Work
 
+There is still significant room for improving the performance of RL agents in the presented environments. One particularly promising direction is to try a safe RL method. We expect that better-shaped reward functions and extended observation vectors may result in significant performance improvements as well.
+
+We could not explain the dramatic performance drop when increasing the number of planets from 2 to 3. The measured human baseline score is similar for the 2 planets env. and is significantly smaller for the case of 3 planets.
 
-# Implementation
+# Implementation Remarks
 
 ### Environments
 
@@ -111,7 +133,7 @@ Related code is in `hexagonal_tiling.py`.
 To make sense of it, please refer to `
 
 # Stable-Baselines 3 starting agents
-We provide
+TBA
 
 ### License
 Copyright 2013 Jacek Cyranka & Kajetan Janiak (University of Warsaw)