This repo contains an optimised implementation of PPO that uses techniques such as Generalised Advantage Estimation (GAE) and entropy regularisation, in an attempt to match the performance of Stable-Baselines3's PPO.
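For readers unfamiliar with the two techniques named above, here is a minimal, dependency-free sketch of how they are commonly written. It is an illustration only, not the code in this repo: the function names `compute_gae`, `gaussian_entropy` and `ppo_loss`, and the default values for `gamma`, `lam`, `clip_eps` and `ent_coef`, are assumptions chosen for the example.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation over one rollout (hypothetical helper).

    rewards[t], values[t], dones[t] describe step t; dones[t] is True when the
    episode ended after step t. last_value is the value estimate for the state
    that follows the final step. gamma and lam are assumed defaults.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets are the advantages added back onto the value estimates.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian policy, summed over action dimensions."""
    return sum(0.5 + 0.5 * math.log(2.0 * math.pi) + ls for ls in log_std)


def ppo_loss(ratio, advantage, entropy, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO policy loss for one sample, with an entropy bonus.

    ratio is pi_new(a|s) / pi_old(a|s). clip_eps and ent_coef are assumed
    values, not the ones used in this repo.
    """
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    surrogate = min(ratio * advantage, clipped * advantage)
    # Negated because optimisers minimise; entropy is rewarded to keep exploring.
    return -(surrogate + ent_coef * entropy)
```

In a typical setup, `compute_gae` would be called once per rollout and the resulting advantages fed, sample by sample or in batches, into the clipped loss. A larger `ent_coef` pushes the policy towards more exploration.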
- To train the agent, run `train.py`.
- To visualise the data in your browser, run `tensorboard --logdir runs`.
- To test the trained policy, run `test.py`.

[Images: PPO Continuous LunarLander-v2 (two panels)]

[Images: PPO Continuous BipedalWalker-v3 (two panels)]