
Commit

Calculation of epsilon decay, Input value of epsilon in the console, Bug fix
VOCdevShy committed Mar 14, 2024
1 parent 2a82ffe commit 8192811
Showing 4 changed files with 42 additions and 24 deletions.
2 changes: 1 addition & 1 deletion .replit
@@ -11,4 +11,4 @@ language = "python3"
 
 [deployment]
 run = ["python3", "main.py"]
-deploymentTarget = "cloudrun"
+deploymentTarget = "cloudrun"
20 changes: 10 additions & 10 deletions Doc/Bug list.md
@@ -1,28 +1,28 @@
 This is the bug list. In this files, all the bug of the program is detailed.
 (If you know how to fix it please fix it)
 
-1-Sequence lag/Sequence recording lag (Link to problem 2 and 3) (Not fix)
+1- Sequence lag/Sequence recording lag (Link to problem 2 and 3) (Not fix)
 - explication:
 
-2-Output broadcast lag (Link to problem 1 and 3) (Not fix)
+2- Output broadcast lag (Link to problem 1 and 3) (Not fix)
 - explication:
 
-3-Agent Input start to early (Link to problem 1 and 2) (Not fix)
+3- Agent Input start to early (Link to problem 1 and 2) (Not fix)
 - explication:
 
-4-"Error" message after the presentation of the Virgin Q-Table:
+4- "Error" message after the presentation of the Virgin Q-Table:
 '/home/runner/Q-Learning-Frozen-lake/.pythonlibs/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:233: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`. (Deprecated NumPy 1.24)
 if not isinstance(terminated, (bool, np.bool8)):" (Not entierly Fix)'
 - explication: The problem is link between numpy and gymnasium package. With the Version of the 6/02/24 (1.23.4) of numpy the problem is not solve entierly and this message doesn't impact the code so i hide it with a warning.
 
-5-The agent might be doing 2 actions by input and not only 1? (link to problem 1, 2, 3) (Not fix)
+5- The agent might be doing 2 actions by input and not only 1? (link to problem 1, 2, 3) (Not fix)
 - explication:
 
-6-The agent is doing the action 0 (left) to much time when epsilon < Q-Table (Link to problem 5) (Not fix)
+6- The agent is doing the action 0 (left) to much time when epsilon < Q-Table (Link to problem 5) (Not fix)
 - explication:
 
-7-Teleportation of the agent on a a case like he is doing two actions in one and this appear in one action in the sequence (Link to problem 5)
--explication
+7- Teleportation of the agent on a a case like he is doing two actions in one and this appear in one action in the sequence (Link to problem 5) (Not fix)
+- explication
 
-8-When the test of the Q-Table is going it is possible for the agent to have a problem of going left and right infinitly on a case
--explication:
+8- When the test of the Q-Table is going it is possible for the agent to have a problem of going left and right infinitly on a case (Not fix)
+- explication:
14 changes: 10 additions & 4 deletions Doc/Patch Note.md
@@ -1,11 +1,11 @@
-Version: 1.0.1 (minor update) date: 29/02/2024 at: 10h20 a.m (CET(UTC+1)):
+Version: 1.0.1 (minor update) date: 29/02/2024 at: 10:20 a.m (CET(UTC+1)):
 - Half resolved the bug n°4 (Check the "Bug List.md files" to see the explication)
 
-Version: 1.1.0 (major update) date: 01/03/2024 at: 09h53 a.m (CET(UTC+1)):
+Version: 1.1.0 (major update) date: 01/03/2024 at: 09:53 a.m (CET(UTC+1)):
 - Implementation of the test of the Q-Table after training. For a 100 episodes you can try your Q-Table to see if the update is good or not.
 - If the results is upper or equal than 50% it is a good update between 33% and 49% it is not good at it could be, between 25% and 33% that's not a good update as well and less than 25% that's not a good update.
 
-Version 1.2.0 (major update) date: 03/03/2024 at: 2h20 p.m (CET(UTC+1)):
+Version 1.2.0 (major update) date: 03/03/2024 at: 2:20 p.m (CET(UTC+1)):
 - Adding new datas obtention:
 - longest_sequence: List of states in the longer episode that doesn't reach the goal
 - longest_best_sequence: List of states in the longest episode that reach the goal
@@ -19,4 +19,10 @@ Version 1.2.0 (major update) date: 03/03/2024 at: 2h20 p.m (CET(UTC+1)):
 - Problem not fixed yet: n°1, n°2, n°3, n°5, n°6
 - Problem fixed: n°4 (fixed in the patch 1.0.1)
 - Bug/Problem fixed (not listed):
-- Correction of the launch of the test of the updated Q-Table
+- Correction of the launch of the test of the updated Q-Table
+
+Version 1.2.1 (minor update) date: 14/03/2024 at: 5:50 p.m (CET(UTC+1)):
+- Calculation of espsilon decay (calcul detail: 1/episodes)
+- Input value of epsilon in the console
+- Bug fix(not listed)
+- Q-Table training sucess rate calculation
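
The 1.2.1 entry above computes the epsilon decay as 1/episodes and takes the episode count from the console (see the main.py diff below). A minimal sketch of that decay schedule, assuming it is applied once per training episode; variable names mirror main.py, but the loop shown here is illustrative, not the exact committed code:

# Sketch: linear epsilon decay as described in patch 1.2.1 (assumed usage)
episodes = 100                      # number of training episodes
epsilon = 1.0                       # initial amount of randomness in action selection
epsilon_decay = 1 / episodes        # fixed decrement per episode: 1/episodes

for episode in range(episodes):
    # ... one training episode runs here ...
    epsilon = max(epsilon - epsilon_decay, 0.0)   # epsilon reaches 0 by the last episode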
30 changes: 21 additions & 9 deletions main.py
@@ -23,11 +23,21 @@
 render_mode="human", is_slippery=False)
 
 # Hyperparameters
-episodes = 100 # Total of episodes
+#episodes = 100 # Total of episodes
+# Handle ValueError exception for float conversion of input
+try:
+    episodes_value = float(input("Enter the number of episodes for this training session: "))
+    if episodes_value > 0:
+        episodes = episodes_value
+    else:
+        print("Please enter a positive number:")
+except ValueError:
+    print("Please enter a number:")
+
 alpha = 0.5 # Learning Rate
 gamma = 0.9 # Discount factor
 epsilon = 1.0 # Amount of randomness in the action selection
-epsilon_decay = 0.01 # Fixed amount to decrease
+epsilon_decay = 1/int(episodes) # Fixed amount to decrease
 
 # Datas
 nb_success = 0 # Number of success
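
As committed, a non-numeric or non-positive entry only prints a message, so `episodes` would stay undefined and the later `int(episodes)` casts would fail. A hedged alternative, a sketch rather than the repository's code, is to re-prompt until a valid value is given:

# Sketch: keep asking until the user supplies a positive whole number of episodes
while True:
    try:
        episodes = int(input("Enter the number of episodes for this training session: "))
        if episodes > 0:
            break                                 # valid value: leave the loop
        print("Please enter a positive number:")
    except ValueError:
        print("Please enter a number:")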
@@ -57,12 +67,13 @@
 #Q-tbale calculation
 qtable = np.zeros((env.observation_space.n, env.action_space.n))
 # show the Q-table
+print(" ")
 print('Q-table before training: ')
 print(qtable)
 print(' ')
 
 # Learning loop
-for episode in range(episodes):
+for episode in range(int(episodes)):
     sequence = [] # List of states in the episode
     state = env.reset() # Reset the environment
     done = False
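
The body of the learning loop is collapsed in this hunk. For reference, a standard tabular Q-learning step with epsilon-greedy exploration looks like the sketch below; it is the textbook update rule, not necessarily line for line what main.py does, and it assumes `state` holds the current integer state index and `numpy` is imported as `np` as elsewhere in main.py:

# Sketch: one epsilon-greedy step and Q-table update (textbook rule, assumed to match the collapsed code)
if np.random.random() < epsilon or np.max(qtable[state]) == 0:
    action = env.action_space.sample()            # explore: random action
else:
    action = np.argmax(qtable[state])             # exploit: best known action

new_state, reward, terminated, truncated, _ = env.step(action)
done = terminated or truncated

# Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
qtable[state, action] += alpha * (reward + gamma * np.max(qtable[new_state]) - qtable[state, action])
state = new_state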
@@ -192,6 +203,7 @@
 print(" ")
 print("Test of the updated Q-Table")
 #re-initialize the data
+episodes = 100
 best_sequence = []
 longest_sequence = []
 longest_best_sequence = []
@@ -202,7 +214,7 @@
 reward_sequence = []
 nb_success = 0
 epsilon = 1.0 # same it doesn't used but we nerver know
-for episode in range(100):
+for episode in range(episodes):
     sequence = [] # List of states in the episode
     state = env.reset() # Reset the environment
     done = False
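
The comment on `epsilon = 1.0` above says epsilon is not used during the test, i.e. the test loop follows the learned table greedily and does not update it. A minimal sketch of such an evaluation step, an assumption about the collapsed loop body rather than the committed code:

# Sketch: greedy evaluation step, no exploration and no Q-table update
action = np.argmax(qtable[state])                 # always take the best known action
new_state, reward, terminated, truncated, _ = env.step(action)
done = terminated or truncated
sequence.append(new_state)                        # record the visited state
if reward:                                        # FrozenLake returns reward 1 only when the goal is reached
    nb_success += 1
state = new_state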
@@ -269,7 +281,7 @@
 print(" ")
 
 # Results of the Q-table after training and a test without update
-print("Results after " + str(episodes) + " episodes: ")
+print("Results after " + str(int(episodes)) + " episodes: ")
 print(" ")
 print(qtable)
 print(" ")
@@ -299,15 +311,15 @@
     sequence_words = [action_words[action] for action in sequence] # Convert actions input number into input words
     print(f"Sequence {episode_num}: {sequence} / {sequence_words}")
 print(" ")
-print (f"Success rate = {(nb_success/episodes)*100}%")
+print (f"Success rate = {(nb_success/int(episodes))*100}%")
 #Success rate of the update of the Q-table
-if (nb_success/episodes)*100 == 100 :
+if (nb_success/int(episodes))*100 == 100 :
     print(" ")
     print("The Update of the Q-Table is PERFECT!")
-if 80 <= (nb_success/episodes)*100 <= 99 :
+if 80 <= (nb_success/int(episodes))*100 <= 99 :
     print(" ")
     print("The Update of the Q-Table is a great success!")
-if 50 <= (nb_success/episodes)*100 <= 79:
+if 50 <= (nb_success/int(episodes))*100 <= 79:
     print(" ")
     print("The Update of the Q-Table is successful!")
 if 33 <= (nb_success/episodes)*100 <= 49:
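The `if` statements above test disjoint percentage ranges, so at most one verdict ever prints; the same rating logic can be written once as an `elif` ladder. A hedged restructuring with the same thresholds, not the committed code:

# Sketch: the success-rate verdicts as a single if/elif ladder (equivalent thresholds)
success_rate = (nb_success / int(episodes)) * 100
print(f"Success rate = {success_rate}%")
if success_rate == 100:
    print("The Update of the Q-Table is PERFECT!")
elif 80 <= success_rate <= 99:
    print("The Update of the Q-Table is a great success!")
elif 50 <= success_rate <= 79:
    print("The Update of the Q-Table is successful!")
# lower ranges (33-49% and below) continue the same pattern in the collapsed lines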
