⛰ Markov Decision Process: Value Iteration Algorithm

The content of this repository served as an assignment project for the course Probabilistic Graphical Models at INAOE, taken as a student of the Master of Science in Computer Science. All the resources presented in the versions of this code were obtained from the class book that you can find in the references section.

This implementation of the algorithm is for educational purposes only.

Description:

Implement the value iteration algorithm to solve discrete Markov Decision Processes.

Professor:

Student Involved:

Instructions

  1. Download the repository's files
  2. Verify that your compiler supports at least C++14
  3. Call the functions marked in the documentation

The following algorithms are based on the documentation provided by the professor. The book used as a reference is at the end of this file.

  1. The value iteration algorithm iteratively estimates the value of each state, s, based on Bellman's equation. The next image shows the pseudocode used to create this project.
  2. The policy iteration algorithm also iteratively estimates the value of each state, s, based on Bellman's equation. The main difference is that we store the policy at each iteration, which allows us to compare iteration (t) with iteration (t-1); if the policy is unchanged, the process stops. This gives a computational speed advantage at a storage cost. Image 2 shows the pseudocode used to create this project.

Examples

The class needs to be called as the figures indicate:

  • Object Creation

We used two examples to confirm the algorithm's functionality: "The robot path", from the book, and "The bear travel", from an online blog called Towards Data Science (link in the references).

  • The robot path

  • The bear travel

  • Let's start with the robot path example. Consider figure 1 as the grid to solve; our code needs some parameters defined in the description, so the next image shows what we mean.

    • First, consider that the enumeration of the states follows:
      • States
    • We pass the parameters as matrices, as you can see in the next figure:
      • Parameters Descriptions
    • Then, running the functions, you would see the following results:
    • Policy_Iteration
    • Policy
    • Then you would follow the path below:
    • Policy
  • Let's now solve the second example; this time we just show the images of each function used and the results:

    • States
    • Parameters Descriptions
    • Policy_Iteration
    • Policy_Iteration

References