The content of this repository was developed as an assignment for the course Probabilistic Graphical Models at INAOE, taken as part of the Master of Science in Computer Science program. All the resources presented in the versions of this code were obtained from the class book listed in the references. This application of the algorithm and accompanying information is for educational purposes only.
Implement the value iteration algorithm to solve discrete Markov Decision Processes.

Professor:
- PhD Enrique Sucar.
Student Involved:
- Mario De Los Santos. GitHub: MarSH-Up. Email: [email protected]
# Instructions
- Download the repository's files
- Verify that your compiler supports at least C++14
- Call the functions marked in the documentation
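With GCC, for example, the sources can be compiled with `g++ -std=c++14`; the exact file names to pass depend on the version of the repository you downloaded.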
The following algorithms are based on the documentation provided by the professor. The book used as a reference is at the end of this file.
- The value iteration algorithm consists of iteratively estimating the value of each state, s, based on Bellman's equation. The next image shows the pseudocode used to create this project (a minimal C++ sketch also follows this list).
- The policy iteration algorithm likewise iteratively estimates the value of each state, s, based on Bellman's equation. The main difference is that we store the policy at each iteration, which allows us to compare iteration (t) with iteration (t-1); if the policy is unchanged, the process finishes. This gives a computational speed advantage at a storage cost. Image 2 shows the pseudocode used to create this project (see the second sketch after this list).
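To make the first bullet concrete, here is a minimal, self-contained sketch of value iteration in C++14. The `MDP` struct, function names, and default `epsilon` are illustrative assumptions, not the repository's actual API; the update loop implements the Bellman backup described above.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical MDP representation (not the repository's actual class):
// transition probabilities P[s][a][s'] and one reward R[s] per state.
struct MDP {
    int nStates = 0;
    int nActions = 0;
    std::vector<std::vector<std::vector<double>>> P; // P[s][a][s']
    std::vector<double> R;                           // R[s]
    double gamma = 0.9;                              // discount factor
};

// Value iteration: repeat the Bellman backup
//   V(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) * V(s')
// until the largest per-state change drops below epsilon.
std::vector<double> valueIteration(const MDP& mdp, double epsilon = 1e-6) {
    std::vector<double> V(mdp.nStates, 0.0);
    double delta = 0.0;
    do {
        delta = 0.0;
        for (int s = 0; s < mdp.nStates; ++s) {
            double best = -std::numeric_limits<double>::infinity();
            for (int a = 0; a < mdp.nActions; ++a) {
                double expected = 0.0;
                for (int sp = 0; sp < mdp.nStates; ++sp)
                    expected += mdp.P[s][a][sp] * V[sp];
                best = std::max(best, expected);
            }
            double updated = mdp.R[s] + mdp.gamma * best;
            delta = std::max(delta, std::abs(updated - V[s]));
            V[s] = updated;
        }
    } while (delta > epsilon);
    return V;
}
```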
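And a sketch of the policy-comparison stopping criterion from the second bullet, reusing the hypothetical `MDP` struct above: after each sweep of Bellman backups, the greedy policy is extracted and compared with the one from the previous iteration, and the loop stops as soon as the two match.

```cpp
// Greedy policy extraction: pi(s) = argmax_a sum_{s'} P(s'|s,a) * V(s').
std::vector<int> greedyPolicy(const MDP& mdp, const std::vector<double>& V) {
    std::vector<int> pi(mdp.nStates, 0);
    for (int s = 0; s < mdp.nStates; ++s) {
        double best = -std::numeric_limits<double>::infinity();
        for (int a = 0; a < mdp.nActions; ++a) {
            double expected = 0.0;
            for (int sp = 0; sp < mdp.nStates; ++sp)
                expected += mdp.P[s][a][sp] * V[sp];
            if (expected > best) { best = expected; pi[s] = a; }
        }
    }
    return pi;
}

// Iterate Bellman backups, but stop as soon as the greedy policy at
// iteration (t) is identical to the one at iteration (t-1).
std::vector<int> iterateUntilPolicyStable(const MDP& mdp) {
    std::vector<double> V(mdp.nStates, 0.0);
    std::vector<int> previous = greedyPolicy(mdp, V);
    while (true) {
        for (int s = 0; s < mdp.nStates; ++s) {
            double best = -std::numeric_limits<double>::infinity();
            for (int a = 0; a < mdp.nActions; ++a) {
                double expected = 0.0;
                for (int sp = 0; sp < mdp.nStates; ++sp)
                    expected += mdp.P[s][a][sp] * V[sp];
                best = std::max(best, expected);
            }
            V[s] = mdp.R[s] + mdp.gamma * best;
        }
        std::vector<int> current = greedyPolicy(mdp, V);
        if (current == previous) return current; // policy unchanged: done
        previous = current;
    }
}
```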
# Examples

The class needs to be called as the figure indicates; a hypothetical call sequence is also sketched below.
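Since the figure itself is not reproduced in this text, the following shows a hypothetical call sequence built on the definitions from the sketches above; the repository's real interface is the one shown in the figure, and the numbers here are purely illustrative.

```cpp
#include <iostream>
#include <vector>

int main() {
    // Tiny 2-state, 2-action MDP with made-up transition probabilities.
    MDP mdp;
    mdp.nStates = 2;
    mdp.nActions = 2;
    mdp.gamma = 0.9;
    mdp.R = {0.0, 1.0};
    mdp.P = {
        { {0.8, 0.2}, {0.2, 0.8} },  // transitions from state 0
        { {0.5, 0.5}, {0.1, 0.9} }   // transitions from state 1
    };

    std::vector<double> V = valueIteration(mdp);
    std::vector<int> policy = iterateUntilPolicyStable(mdp);

    for (int s = 0; s < mdp.nStates; ++s)
        std::cout << "V(" << s << ") = " << V[s]
                  << ", action = " << policy[s] << '\n';
    return 0;
}
```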
We used two examples to confirm the algorithm's functionality: "The robot path" from the book, and "The bear travel" from the Towards Data Science blog (link in the references).
Let's start with the robot path example: consider figure 1 as the grid to complete. Our code needs some parameters defined in the description, and the next image shows what we mean.
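The parameter figures are likewise not shown in this text, but to illustrate the kind of setup they describe, a grid like the robot path can be encoded with one state per cell and one reward per state. The grid size and reward values below are placeholders, not the book's actual figures.

```cpp
#include <vector>

// Hypothetical 3x4 grid encoded as states 0..11 in row-major order.
// Placeholder rewards: a goal cell, a penalty cell, zero elsewhere.
std::vector<double> gridRewards() {
    const int rows = 3, cols = 4;
    std::vector<double> R(rows * cols, 0.0);
    R[0 * cols + 3] = +1.0;  // goal cell at row 0, column 3
    R[1 * cols + 3] = -1.0;  // penalty cell at row 1, column 3
    return R;
}
```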
Now let's solve the second example; here we just show the images of each function used and the results:
# References
- Sucar, L. E. (2020). Probabilistic Graphical Models. Advances in Computer Vision and Pattern Recognition. London: Springer. Chapter 11.
- Ashraf, M. (2018). Reinforcement Learning Demystified: Solving MDPs with Dynamic Programming. Towards Data Science. Retrieved April 2021, from https://towardsdatascience.com/reinforcement-learning-demystified-solving-mdps-with-dynamic-programming-b52c8093c919