
Version 1.2.3 (Check the “Patch Note.md” file in the “Doc” folder for more information about the update)
VOCdevShy committed Apr 2, 2024
1 parent 271ece8 commit 2cabd77
Showing 3 changed files with 12 additions and 15 deletions.
2 changes: 1 addition & 1 deletion Doc/Patch Note.md
@@ -22,7 +22,7 @@ Version 1.2.0 (major update) date: 03/03/2024 at: 2:20 p.m (CET(UTC+1)):
- Correction of the launch of the test of the updated Q-Table

Version 1.2.1 (minor update) date: 14/03/2024 at: 5:50 p.m (CET(UTC+1)):
- Calculation of epsilon decay (calculation detail: 1/episodes)
- Calculation of epsilon decay (calculation detail: `1/episodes`)
- Input value of epsilon in the console
- Bug fix (not listed)
- Q-Table training success rate calculation
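
The epsilon decay referred to above is linear, matching the `epsilon = max(epsilon - epsilon_decay, 0)` line visible in the main.py diff further down. A minimal sketch of such a schedule, assuming a starting epsilon of 1.0 and illustrative variable names (not taken from the repository):

```python
episodes = 1000                  # assumed number of training episodes
epsilon = 1.0                    # assumed starting exploration rate
epsilon_decay = 1 / episodes     # the "1/episodes" detail from the note above

for episode in range(episodes):
    # ... one training episode: explore with probability epsilon, exploit otherwise ...
    epsilon = max(epsilon - epsilon_decay, 0)   # linear decay, floored at 0
```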
4 changes: 1 addition & 3 deletions README.md
@@ -7,8 +7,6 @@ The list of predefined maps are in the map files in the tools folder. Here you c

If you want more information about Q-Learning and the Frozen Lake game, please read this article on Medium; it helped me a lot to understand what to do in the code: https://medium.com/towards-data-science/q-learning-for-beginners-2837b777741

Do your own test by moving values if you want!

For those who are interested in the calculation of the Q-Table, here is an explanation:

`qtable[state, action] = qtable[state, action] + alpha * (reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action])`
@@ -17,7 +15,7 @@ For those who are interested by the calculation of the Q-Table here is an explic

- `alpha`: This is the learning rate. It controls the extent to which new information will be integrated into the old values of the Q-table. A high value means that new information will have a greater impact on existing values, while a low value means it will have a smaller impact.

- `reward`: This is the immediate reward obtained after taking action 'action' in state 'state'. This reward can be positive, negative, or zero.
- `reward`: This is the immediate reward obtained after taking the action in the current state. This reward is equal to a positive float.

- `gamma`: This is the discount factor. It represents the importance of future rewards compared to immediate rewards. A gamma close to 1 gives great importance to future rewards, while a gamma close to 0 makes the agent focus almost entirely on immediate rewards.

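Putting the formula and these parameters together, here is a minimal, self-contained sketch of one update step (the array sizes, state/action indices, and hyperparameter values below are illustrative assumptions, not values taken from main.py):

```python
import numpy as np

n_states, n_actions = 16, 4            # assumed 4x4 Frozen Lake map: 16 states, 4 actions
qtable = np.zeros((n_states, n_actions))

alpha, gamma = 0.5, 0.9                # assumed learning rate and discount factor
state, action, next_state = 0, 2, 4    # one example transition
reward = 1.0                           # Frozen Lake returns 1.0 only when the goal is reached

# The update rule quoted above: nudge Q(state, action) toward
# reward + gamma * (best value in the next state), by a step of size alpha.
qtable[state, action] = qtable[state, action] + alpha * (
    reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action]
)
print(qtable[state, action])           # 0.5 with these example values
```
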
21 changes: 10 additions & 11 deletions main.py
@@ -1,7 +1,7 @@
import time
import warnings

import gym
from gym.envs.toy_text.frozen_lake import generate_random_map # To generate a random map if you want
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import clear_output
@@ -139,8 +139,8 @@
elif len(sequence) > len(longest_best_sequence):
longest_best_sequence = sequence

if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1
if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1

epsilon = max(epsilon - epsilon_decay, 0)
clear_output(wait=True)
@@ -200,7 +200,6 @@
print(" ")

if test == "n":
print(" ")
plt.figure(figsize=(3, 1.25))
plt.xlabel("Run number")
plt.ylabel("Outcome")
@@ -211,7 +210,6 @@

# Loop for the test of the updated Q-Table
if test == "y":
print(" ")
print("Test of the updated Q-Table")
print(" ")
#re-initialize the data
@@ -239,10 +237,9 @@
action = env.action_space.sample()
# If there's no best action (only zeros), take a random one
if np.max(qtable[state]) > 0:
if np.argmax(qtable[state]) == 0:
action = env.action_space.sample()
else:
action = np.argmax(qtable[state])
if np.argmax(qtable[state]) == 0:
action = env.action_space.sample()
sequence.append(action)

next_state, reward, done, info, _ = env.step(action)
@@ -263,6 +260,8 @@
if reward:
outcome[-1] = "Success"
reward_counter = reward_counter + 1
reward_episode.append(episode)
reward_sequence.append(sequence)
if not best_sequence:
best_sequence = sequence
elif len(sequence) < len(best_sequence):
@@ -272,8 +271,8 @@
longest_best_sequence = sequence
elif len(sequence) > len(longest_best_sequence):
longest_best_sequence = sequence
if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1
if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1
if best_sequence == []:
recurent_sequence = 0

@@ -328,7 +327,7 @@
#Success rate of the update of the Q-table
if (nb_success / int(episodes)) * 100 == 100:
print(" ")
print("The Update of the Q-Table is PERFECT!")
print("The Update of the Q-Table is PERFECT to reach the goal!")
if 80 <= (nb_success / int(episodes)) * 100 <= 99:
print(" ")
print("The Update of the Q-Table is a great success!")
