PyTorch implementations of Reinforcement Learning algorithms in fewer than 200 lines.
Deep Reinforcement Learning
Bandits
- Epsilon Greedy
- Softmax action selection
- UCB-1
- REINFORCE
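The Bandits list above only names the methods. As a rough illustration (not the repository's code), the three action-selection rules can be sketched in NumPy. The function names, the Bernoulli test bed, the sample-average value estimates and the hyperparameters (epsilon = 0.1, temperature = 0.1) are all hypothetical choices made for this sketch.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    # With probability epsilon explore a uniformly random arm, otherwise exploit.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def softmax_action(q_values, temperature, rng):
    # Boltzmann exploration: P(a) is proportional to exp(Q(a) / temperature).
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()  # shift by the max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def ucb1(q_values, counts):
    # UCB-1: play every arm once, then maximise Q(a) + sqrt(2 ln n / n_a),
    # where n is the total number of plays so far and n_a the plays of arm a.
    counts = np.asarray(counts)
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    bonus = np.sqrt(2.0 * np.log(counts.sum()) / counts)
    return int(np.argmax(np.asarray(q_values, dtype=float) + bonus))


def run_bandit(select, true_means, steps=2000, seed=0):
    # Hypothetical test bed: a Bernoulli bandit with sample-average estimates.
    rng = np.random.default_rng(seed)
    q = np.zeros(len(true_means))
    n = np.zeros(len(true_means), dtype=int)
    total_reward = 0.0
    for _ in range(steps):
        a = select(q, n, rng)
        r = float(rng.random() < true_means[a])  # reward 1 with prob. true_means[a]
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental sample-average update
        total_reward += r
    return q, n, total_reward / steps


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]
    agents = {
        "epsilon-greedy": lambda q, n, rng: epsilon_greedy(q, 0.1, rng),
        "softmax": lambda q, n, rng: softmax_action(q, 0.1, rng),
        "UCB-1": lambda q, n, rng: ucb1(q, n),
    }
    for name, select in agents.items():
        q, n, avg = run_bandit(select, means)
        print(f"{name}: pulls={n.tolist()} average reward={avg:.3f}")
```

Each selector shares the same value-update loop, so the only difference between the agents is how they trade exploration against exploitation: a fixed random rate, a temperature-scaled preference, or a confidence bonus that shrinks as an arm is played more.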
Classical MDP Control
- SARSA
- Q-learning
- SARSA(lambda)
- Vanilla Policy Gradient
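SARSA, Q-learning and SARSA(lambda) differ mainly in how they form the TD target. A minimal tabular sketch, assuming a NumPy array `Q` indexed as `Q[state, action]` (the function names and argument order are hypothetical, not the repository's API):

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    # On-policy TD(0): bootstrap from the action actually chosen in s_next.
    target = r if done else r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma, done):
    # Off-policy TD(0): bootstrap from the greedy action in s_next.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam, done):
    # SARSA(lambda) with accumulating eligibility traces E (same shape as Q).
    target = r if done else r + gamma * Q[s_next, a_next]
    delta = target - Q[s, a]
    E[s, a] += 1.0            # bump the trace of the visited pair
    Q += alpha * delta * E    # update every pair in proportion to its trace
    E *= gamma * lam          # decay all traces
    if done:
        E[:] = 0.0            # reset traces at the end of an episode


if __name__ == "__main__":
    Q = np.array([[0.0, 0.0], [0.2, 0.4]])
    q_learning_update(Q, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9, done=False)
    print(Q[0, 0])  # target = 1 + 0.9 * max(0.2, 0.4) = 1.36, so Q[0, 0] = 0.68
```

With the same transition, SARSA would bootstrap from `Q[1, a_next]` instead of the maximum, so choosing `a_next = 0` gives a target of 1 + 0.9 * 0.2 = 1.18 and an updated value of 0.59.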
Additional Resources
- Report on Bandit algorithms
- Report on Classical MDP control algorithms
- Contour environment - gym-contour
- Puddle world - gym-puddle

Requirements
- PyTorch
- TensorBoard
- OpenAI Gym
- NumPy

Usage
- Clone the repository.
- Run experiments for an algorithm by running either .py or main.py within its directory.
- TensorBoard logs of my experiments can be viewed through the 'Result' links given above.

References
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (2018) by Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine
- Proximal Policy Optimization Algorithms (2017) by John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov
- Benchmarking Deep Reinforcement Learning for Continuous Control (2016) by Yan Duan, Xi Chen, Rein Houthooft, John Schulman and Pieter Abbeel
- Deterministic Policy Gradient Algorithms (2014) by David Silver, Guy Lever, Nicolas Manfred Otto Heess, Thomas Degris, Daan Wierstra and Martin A. Riedmiller
- Playing Atari with Deep Reinforcement Learning (2013) by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin A. Riedmiller
- Using Confidence Bounds for Exploitation-Exploration Trade-offs (2002) by Peter Auer
- Eligibility Traces for Off-Policy Policy Evaluation (2000) by Doina Precup, Richard S. Sutton and Satinder P. Singh
- Policy Gradient Methods for Reinforcement Learning with Function Approximation (1999) by Richard S. Sutton, David A. McAllester, Satinder P. Singh and Yishay Mansour
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (1992) by Ronald J. Williams
- Q-learning (1992) by Chris Watkins and Peter Dayan