arya-ebrahimi/rl-playground: Tabular (main branch)

Dynamic Programming: Policy iteration and value iteration algorithms, implemented and tested on two Gym environments.
Monte Carlo: Monte Carlo prediction and control for Blackjack.
10-Armed Bandits: A 10-armed bandit problem, used to test different exploration approaches.
MDP, Bellman equations and DP: Chapters 3 & 4 of RLBook2018; MDPs, Bellman equations, and dynamic programming on a custom gridworld environment.
MCTS, FA and Policy Gradients: Chapters 8, 9 & 13 of RLBook2018; Monte Carlo Tree Search, function approximation, and policy gradients. Homework for rl-course-spring2023 @ Ferdowsi University of Mashhad.
DP on FrozenLake: Chapter 4 of RLBook2018; dynamic programming (policy iteration and value iteration) on the FrozenLake environment. Mini-project for rl-course-2023 @ Ferdowsi University of Mashhad.
Sample-based methods: Chapters 5 & 6 of RLBook2018; a comparison of Monte Carlo and Temporal-Difference control methods (SARSA & Q-learning).
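For the Dynamic Programming entries, here is a minimal sketch of value iteration. It is not the repository's code. The 3-state chain MDP, the function names `q_value` and `value_iteration`, and the `(probability, next_state, reward, done)` transition layout are illustrative assumptions. That layout is meant to resemble the transition table of Gym's toy-text environments such as FrozenLake.

```python
from typing import Dict, List, Tuple

# P[s][a] is a list of (probability, next_state, reward, done) tuples.
# Assumption: this mirrors the transition-table layout of Gym's toy-text envs.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float, bool]]]]


def q_value(P: Transitions, V: List[float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(
        p * (r + (0.0 if done else gamma * V[s_next]))
        for p, s_next, r, done in P[s][a]
    )


def value_iteration(P: Transitions, n_states: int, n_actions: int,
                    gamma: float = 0.9, theta: float = 1e-8):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup, applied in place (Gauss-Seidel style).
            best = max(q_value(P, V, s, a, gamma) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once no state value changed by more than theta
            break
    # Extract a greedy policy from the converged values (ties go to the lowest action).
    policy = [max(range(n_actions), key=lambda a: q_value(P, V, s, a, gamma))
              for s in range(n_states)]
    return V, policy


# Hypothetical 3-state chain: 0 -> 1 -> 2 (terminal). Action 0 = left, 1 = right.
# Reaching state 2 from state 1 pays +1; every other transition pays 0.
P: Transitions = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy = value_iteration(P, n_states=3, n_actions=2)
print(V, policy)  # expected: [0.9, 1.0, 0.0] [1, 1, 0]
```

Updating `V` in place lets later states in a sweep use values that were already refreshed in that sweep. This usually converges in fewer sweeps than keeping a separate copy of the old values.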
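The Monte Carlo entry covers prediction for Blackjack. Below is a hedged sketch of first-visit Monte Carlo prediction that works on already-collected episodes, so it does not depend on Gym. The episode format, the function name, and the Blackjack-style `(player_sum, dealer_card, usable_ace)` states in the usage example are hypothetical, hand-written assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# An episode is a list of (S_t, R_{t+1}) pairs in time order.
Episode = List[Tuple[Hashable, float]]


def first_visit_mc_prediction(episodes: List[Episode],
                              gamma: float = 1.0) -> Dict[Hashable, float]:
    returns_sum: Dict[Hashable, float] = defaultdict(float)
    returns_count: Dict[Hashable, int] = defaultdict(int)
    for episode in episodes:
        # Time step of the first visit to each state in this episode.
        first_visit: Dict[Hashable, int] = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards, accumulating G_t = R_{t+1} + gamma * G_{t+1}.
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:  # only the first visit contributes
                returns_sum[state] += G
                returns_count[state] += 1
    # The value estimate is the average of the first-visit returns.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical Blackjack-style episodes with (player_sum, dealer_card, usable_ace) states.
episodes = [
    [((13, 10, 0), 0.0), ((19, 10, 0), 1.0)],  # hit, then stick and win
    [((13, 10, 0), -1.0)],                     # hit and bust
]
V = first_visit_mc_prediction(episodes)
print(V)
```

Accumulating the return backwards gives every G_t in a single pass over the episode. The `first_visit` lookup is the only difference from every-visit Monte Carlo.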
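The bandit entry compares exploration approaches. One common approach is epsilon-greedy action selection with incremental sample-average estimates, sketched below. The project itself may use other strategies or settings. The unit-variance Gaussian rewards, the seed, and the name `run_epsilon_greedy` are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def run_epsilon_greedy(true_means: List[float], epsilon: float = 0.1,
                       steps: int = 2000, seed: int = 0) -> Tuple[List[float], List[int], float]:
    """Epsilon-greedy action selection with incremental sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # estimated action values
    N = [0] * k    # number of pulls per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: uniformly random arm
        else:
            best = max(Q)         # exploit, breaking ties at random
            a = rng.choice([i for i, q in enumerate(Q) if q == best])
        reward = rng.gauss(true_means[a], 1.0)  # assumed unit-variance Gaussian reward
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]  # Q_{n+1} = Q_n + (R_n - Q_n) / n
        total_reward += reward
    return Q, N, total_reward / steps


# One hypothetical 10-armed testbed with true means drawn from N(0, 1).
testbed_rng = random.Random(42)
true_means = [testbed_rng.gauss(0.0, 1.0) for _ in range(10)]
for eps in (0.0, 0.01, 0.1):
    _, _, avg = run_epsilon_greedy(true_means, epsilon=eps)
    print(f"epsilon={eps}: average reward {avg:.3f}")
```

The incremental update keeps one number and one counter per arm. It gives the same estimate as averaging all observed rewards without storing them.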
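For the MDP and Bellman-equation entry, the two central equations are written below for reference. The notation follows Sutton and Barto's *Reinforcement Learning: An Introduction*, which "RLBook2018" presumably refers to, and is not copied from the repository. The first line is the Bellman expectation equation for a fixed policy $\pi$. The second is the Bellman optimality equation that value iteration turns into an update rule.

$$
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]
$$

$$
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]
$$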
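The policy-gradient part of the MCTS/FA/PG homework is illustrated below only at the smallest possible scale. This is REINFORCE without a baseline, using a softmax policy over action preferences on a one-step, two-action problem. It is a hypothetical toy, not the homework's setup, and it does not cover MCTS or function approximation. The deterministic rewards and the names `softmax` and `reinforce_one_step` are assumptions.

```python
import math
import random
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_one_step(rewards: List[float], alpha: float = 0.1,
                       episodes: int = 1000, seed: int = 0) -> List[float]:
    """REINFORCE (no baseline) with a softmax policy over preferences theta.

    Each episode is a single step, so the return G equals the immediate reward.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(episodes):
        pi = softmax(theta)
        a = rng.choices(range(len(theta)), weights=pi)[0]  # sample A ~ pi
        G = rewards[a]
        for b in range(len(theta)):
            # d ln pi(a) / d theta_b for a softmax policy is 1[a == b] - pi(b).
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[b] += alpha * G * grad_log_pi
    return softmax(theta)


# Hypothetical deterministic rewards: action 0 pays 1, action 1 pays 0.
probs = reinforce_one_step([1.0, 0.0])
print(probs)
```

Without a baseline, an action that returns 0 produces no update at all, so learning here is driven only by the rewarded action. Subtracting a baseline from G is the usual way to reduce the variance of this estimator.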
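The FrozenLake mini-project uses policy iteration as well as value iteration. The sketch below alternates iterative policy evaluation with greedy policy improvement. It runs on a hypothetical "slippery" 3-state chain, where moving right succeeds with probability 0.8, as a crude stand-in for FrozenLake's stochastic moves. The `(probability, next_state, reward, done)` tuple layout is assumed to resemble Gym's toy-text transition tables, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

Transitions = Dict[int, Dict[int, List[Tuple[float, int, float, bool]]]]


def q_value(P: Transitions, V: List[float], s: int, a: int, gamma: float) -> float:
    return sum(p * (r + (0.0 if done else gamma * V[s_next]))
               for p, s_next, r, done in P[s][a])


def policy_evaluation(P: Transitions, policy: List[int], n_states: int,
                      gamma: float, theta: float = 1e-10) -> List[float]:
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            v = q_value(P, V, s, policy[s], gamma)  # Bellman expectation backup
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration(P: Transitions, n_states: int, n_actions: int, gamma: float = 0.9):
    policy = [0] * n_states  # start from "always go left"
    while True:
        V = policy_evaluation(P, policy, n_states, gamma)
        stable = True
        for s in range(n_states):
            q = [q_value(P, V, s, a, gamma) for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            # Switch only on a strict improvement so ties cannot cause endless flipping.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical slippery chain: "right" (action 1) succeeds with probability 0.8,
# otherwise the agent stays put. State 2 is terminal; reaching it pays +1.
P: Transitions = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(0.8, 2, 1.0, True), (0.2, 1, 0.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
policy, V = policy_iteration(P, n_states=3, n_actions=2)
print(policy, V)
```

The values can be checked by hand. Under the final policy, V(1) = 0.8 + 0.18 V(1), so V(1) = 40/41, and V(0) = 0.72 V(1) + 0.18 V(0).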
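The sample-based comparison contrasts SARSA and Q-learning. The two methods differ only in the TD target. The sketch below shows both update rules on one hypothetical transition. The dictionary-of-lists Q-table, the step size, and the discount are assumptions for illustration, not the project's settings.

```python
import copy
from typing import Dict, List

QTable = Dict[int, List[float]]


def sarsa_update(Q: QTable, s: int, a: int, r: float, s_next: int, a_next: int,
                 alpha: float, gamma: float, done: bool) -> None:
    """On-policy TD control: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q: QTable, s: int, a: int, r: float, s_next: int,
                      alpha: float, gamma: float, done: bool) -> None:
    """Off-policy TD control: bootstrap from the greedy action in s_next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# One hypothetical transition (s=0, a=0, r=1, s'=1), after which an epsilon-greedy
# behaviour policy happened to pick the exploratory action a'=0 in s'.
Q0: QTable = {0: [0.0, 0.0], 1: [2.0, 5.0]}
Q_sarsa, Q_qlearn = copy.deepcopy(Q0), copy.deepcopy(Q0)
sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.9, done=False)
q_learning_update(Q_qlearn, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9, done=False)
# SARSA target: 1 + 0.9 * 2 = 2.8; Q-learning target: 1 + 0.9 * 5 = 5.5
print(Q_sarsa[0][0], Q_qlearn[0][0])
```

Because SARSA bootstraps from the exploratory action, its estimates reflect the epsilon-greedy behaviour policy. Q-learning always bootstraps from the greedy action and so estimates the greedy policy's values.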