- Dynamic Programming: Policy iteration and value iteration, implemented and tested on two Gym environments.
- Monte Carlo : Monte Carlo prediction and control for Blackjack.
- 10 Armed Bandits: 10-armed bandit, testing different exploration approaches.
- MDP, Bellman equations and DP: Chapters 3 & 4, RLBook2018. MDPs and Bellman equations; Dynamic Programming on a custom gridworld environment.
- MCTS, FA and Policy Gradients: Chapters 8, 9 & 13, RLBook2018. MCTS, Function Approximation and Policy Gradients; homework of rl-course-spring2023 @ Ferdowsi University of Mashhad.
- DP on Frozenlake: Chapter 4, RLBook2018. Dynamic Programming (policy iteration and value iteration) on the FrozenLake environment; mini-project of rl-course-2023 @ Ferdowsi University of Mashhad.
- Sample based methods: Chapters 5 & 6, RLBook2018. A comparison of Monte Carlo and Temporal Difference control methods (SARSA & Q-Learning).
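
As a rough illustration of the difference between the two Temporal Difference control methods named in the last item, the sketch below shows the standard tabular SARSA and Q-Learning update rules. It is not taken from the notebooks in this folder: the function names, the `Q` array layout (states by actions) and the default `alpha` and `gamma` values are assumptions made for the example.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken in s_next.

    Q is assumed to be a 2-D array indexed as Q[state, action] (hypothetical layout).
    """
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the greedy (max-valued) action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Tiny usage example on a 2-state, 2-action table.
Q_sarsa = np.zeros((2, 2))
Q_sarsa[1] = [1.0, 3.0]
Q_qlearn = Q_sarsa.copy()

# Same transition (s=0, a=0, r=1, s_next=1); SARSA's next action is the non-greedy one.
sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=1.0)
```

The only difference is the bootstrap term: SARSA uses the value of the next action that the behaviour policy selected, while Q-Learning uses the maximum over next actions regardless of what is actually taken.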
Tabular