The content of this repository was developed as an assignment for the course Probabilistic Graphical Models at INAOE, taken as part of the Master of Science in Computer Science program. All the resources presented in the versions of this code were obtained from the class book listed in the references. This application of the algorithm and accompanying information is for educational purposes only.
Implement the value iteration algorithm to solve discrete Markov Decision Processes.

Professor:
- PhD Enrique Sucar.
Student Involved:
- Mario De Los Santos. GitHub: MarSH-Up. Email: [email protected]
# Instructions
- Download the repository's files
- Verify that your compiler supports at least C++14
- Call the functions marked in the documentation
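With GCC, for example, the sources can be compiled with `g++ -std=c++14`; the exact file names to pass depend on the version of the repository you downloaded.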
The following algorithms are based on the documentation provided by the professor. The book used as a reference is at the end of this file.
- The value iteration algorithm consists of iteratively estimating the value of each state, s, based on Bellman's equation. The next image shows the pseudocode used to create this project (a minimal C++ sketch also follows this list).
- The policy iteration algorithm likewise iteratively estimates the value of each state, s, based on Bellman's equation. The main difference is that we store the policy at each iteration, which allows us to compare iteration (t) with iteration (t-1); if the policy is unchanged, the process finishes. This gives a computational speed advantage at a storage cost. Image 2 shows the pseudocode used to create this project (see the second sketch after this list).
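To make the first bullet concrete, here is a minimal, self-contained sketch of value iteration in C++14. The `MDP` struct, function names, and default `epsilon` are illustrative assumptions, not the repository's actual API; the update loop implements the Bellman backup described above.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical MDP representation (not the repository's actual class):
// transition probabilities P[s][a][s'] and one reward R[s] per state.
struct MDP {
    int nStates = 0;
    int nActions = 0;
    std::vector<std::vector<std::vector<double>>> P; // P[s][a][s']
    std::vector<double> R;                           // R[s]
    double gamma = 0.9;                              // discount factor
};

// Value iteration: repeat the Bellman backup
//   V(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) * V(s')
// until the largest per-state change drops below epsilon.
std::vector<double> valueIteration(const MDP& mdp, double epsilon = 1e-6) {
    std::vector<double> V(mdp.nStates, 0.0);
    double delta = 0.0;
    do {
        delta = 0.0;
        for (int s = 0; s < mdp.nStates; ++s) {
            double best = -std::numeric_limits<double>::infinity();
            for (int a = 0; a < mdp.nActions; ++a) {
                double expected = 0.0;
                for (int sp = 0; sp < mdp.nStates; ++sp)
                    expected += mdp.P[s][a][sp] * V[sp];
                best = std::max(best, expected);
            }
            double updated = mdp.R[s] + mdp.gamma * best;
            delta = std::max(delta, std::abs(updated - V[s]));
            V[s] = updated;
        }
    } while (delta > epsilon);
    return V;
}
```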
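And a sketch of the policy-comparison stopping criterion from the second bullet, reusing the hypothetical `MDP` struct above: after each sweep of Bellman backups, the greedy policy is extracted and compared with the one from the previous iteration, and the loop stops as soon as the two match.

```cpp
// Greedy policy extraction: pi(s) = argmax_a sum_{s'} P(s'|s,a) * V(s').
std::vector<int> greedyPolicy(const MDP& mdp, const std::vector<double>& V) {
    std::vector<int> pi(mdp.nStates, 0);
    for (int s = 0; s < mdp.nStates; ++s) {
        double best = -std::numeric_limits<double>::infinity();
        for (int a = 0; a < mdp.nActions; ++a) {
            double expected = 0.0;
            for (int sp = 0; sp < mdp.nStates; ++sp)
                expected += mdp.P[s][a][sp] * V[sp];
            if (expected > best) { best = expected; pi[s] = a; }
        }
    }
    return pi;
}

// Iterate Bellman backups, but stop as soon as the greedy policy at
// iteration (t) is identical to the one at iteration (t-1).
std::vector<int> iterateUntilPolicyStable(const MDP& mdp) {
    std::vector<double> V(mdp.nStates, 0.0);
    std::vector<int> previous = greedyPolicy(mdp, V);
    while (true) {
        for (int s = 0; s < mdp.nStates; ++s) {
            double best = -std::numeric_limits<double>::infinity();
            for (int a = 0; a < mdp.nActions; ++a) {
                double expected = 0.0;
                for (int sp = 0; sp < mdp.nStates; ++sp)
                    expected += mdp.P[s][a][sp] * V[sp];
                best = std::max(best, expected);
            }
            V[s] = mdp.R[s] + mdp.gamma * best;
        }
        std::vector<int> current = greedyPolicy(mdp, V);
        if (current == previous) return current; // policy unchanged: done
        previous = current;
    }
}
```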
# Examples

The class needs to be called as the figure indicates; a hypothetical call sequence is also sketched below.
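Since the figure itself is not reproduced in this text, the following shows a hypothetical call sequence built on the definitions from the sketches above; the repository's real interface is the one shown in the figure, and the numbers here are purely illustrative.

```cpp
#include <iostream>
#include <vector>

int main() {
    // Tiny 2-state, 2-action MDP with made-up transition probabilities.
    MDP mdp;
    mdp.nStates = 2;
    mdp.nActions = 2;
    mdp.gamma = 0.9;
    mdp.R = {0.0, 1.0};
    mdp.P = {
        { {0.8, 0.2}, {0.2, 0.8} },  // transitions from state 0
        { {0.5, 0.5}, {0.1, 0.9} }   // transitions from state 1
    };

    std::vector<double> V = valueIteration(mdp);
    std::vector<int> policy = iterateUntilPolicyStable(mdp);

    for (int s = 0; s < mdp.nStates; ++s)
        std::cout << "V(" << s << ") = " << V[s]
                  << ", action = " << policy[s] << '\n';
    return 0;
}
```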
We used two examples to confirm the algorithm's functionality: "The robot path" from the book, and "The bear travel" from the Towards Data Science blog (link in the references).
Let's start with the robot path example: consider figure 1 as the grid to complete. Our code needs some parameters defined in the description, and the next image shows what we mean.
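The parameter figures are likewise not shown in this text, but to illustrate the kind of setup they describe, a grid like the robot path can be encoded with one state per cell and one reward per state. The grid size and reward values below are placeholders, not the book's actual figures.

```cpp
#include <vector>

// Hypothetical 3x4 grid encoded as states 0..11 in row-major order.
// Placeholder rewards: a goal cell, a penalty cell, zero elsewhere.
std::vector<double> gridRewards() {
    const int rows = 3, cols = 4;
    std::vector<double> R(rows * cols, 0.0);
    R[0 * cols + 3] = +1.0;  // goal cell at row 0, column 3
    R[1 * cols + 3] = -1.0;  // penalty cell at row 1, column 3
    return R;
}
```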
Now let's solve the second example; here we just show the images of each function used and the results:
# References
- Sucar, L. E. (2020). Probabilistic Graphical Models. Advances in Computer Vision and Pattern Recognition. London: Springer. Chapter 11.
- Ashraf, M. (2018). Reinforcement Learning Demystified: Solving MDPs with Dynamic Programming. Towards Data Science. Retrieved April 2021, from https://towardsdatascience.com/reinforcement-learning-demystified-solving-mdps-with-dynamic-programming-b52c8093c919