
Commit

Calculation of epsilon decay, Input value of epsilon in the console, Bug fix
VOCdevShy committed Mar 14, 2024
1 parent 2a82ffe commit 8192811
Showing 4 changed files with 42 additions and 24 deletions.
2 changes: 1 addition & 1 deletion .replit
@@ -11,4 +11,4 @@ language = "python3"
 
 [deployment]
 run = ["python3", "main.py"]
-deploymentTarget = "cloudrun"
+deploymentTarget = "cloudrun"
20 changes: 10 additions & 10 deletions Doc/Bug list.md
@@ -1,28 +1,28 @@
 This is the bug list. In this files, all the bug of the program is detailed.
 (If you know how to fix it please fix it)
 
-1-Sequence lag/Sequence recording lag (Link to problem 2 and 3) (Not fix)
+1- Sequence lag/Sequence recording lag (Link to problem 2 and 3) (Not fix)
 - explication:
 
-2-Output broadcast lag (Link to problem 1 and 3) (Not fix)
+2- Output broadcast lag (Link to problem 1 and 3) (Not fix)
 - explication:
 
-3-Agent Input start to early (Link to problem 1 and 2) (Not fix)
+3- Agent Input start to early (Link to problem 1 and 2) (Not fix)
 - explication:
 
-4-"Error" message after the presentation of the Virgin Q-Table:
+4- "Error" message after the presentation of the Virgin Q-Table:
 '/home/runner/Q-Learning-Frozen-lake/.pythonlibs/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:233: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`. (Deprecated NumPy 1.24)
 if not isinstance(terminated, (bool, np.bool8)):" (Not entierly Fix)'
 - explication: The problem is link between numpy and gymnasium package. With the Version of the 6/02/24 (1.23.4) of numpy the problem is not solve entierly and this message doesn't impact the code so i hide it with a warning.
 
-5-The agent might be doing 2 actions by input and not only 1? (link to problem 1, 2, 3) (Not fix)
+5- The agent might be doing 2 actions by input and not only 1? (link to problem 1, 2, 3) (Not fix)
 - explication:
 
-6-The agent is doing the action 0 (left) to much time when epsilon < Q-Table (Link to problem 5) (Not fix)
+6- The agent is doing the action 0 (left) to much time when epsilon < Q-Table (Link to problem 5) (Not fix)
 - explication:
 
-7-Teleportation of the agent on a a case like he is doing two actions in one and this appear in one action in the sequence (Link to problem 5)
--explication
+7- Teleportation of the agent on a a case like he is doing two actions in one and this appear in one action in the sequence (Link to problem 5) (Not fix)
+- explication
 
-8-When the test of the Q-Table is going it is possible for the agent to have a problem of going left and right infinitly on a case
--explication:
+8- When the test of the Q-Table is going it is possible for the agent to have a problem of going left and right infinitly on a case (Not fix)
+- explication:
14 changes: 10 additions & 4 deletions Doc/Patch Note.md
@@ -1,11 +1,11 @@
-Version: 1.0.1 (minor update) date: 29/02/2024 at: 10h20 a.m (CET(UTC+1)):
+Version: 1.0.1 (minor update) date: 29/02/2024 at: 10:20 a.m (CET(UTC+1)):
 - Half resolved the bug n°4 (Check the "Bug List.md files" to see the explication)
 
-Version: 1.1.0 (major update) date: 01/03/2024 at: 09h53 a.m (CET(UTC+1)):
+Version: 1.1.0 (major update) date: 01/03/2024 at: 09:53 a.m (CET(UTC+1)):
 - Implementation of the test of the Q-Table after training. For a 100 episodes you can try your Q-Table to see if the update is good or not.
 - If the results is upper or equal than 50% it is a good update between 33% and 49% it is not good at it could be, between 25% and 33% that's not a good update as well and less than 25% that's not a good update.
 
-Version 1.2.0 (major update) date: 03/03/2024 at: 2h20 p.m (CET(UTC+1)):
+Version 1.2.0 (major update) date: 03/03/2024 at: 2:20 p.m (CET(UTC+1)):
 - Adding new datas obtention:
 - longest_sequence: List of states in the longer episode that doesn't reach the goal
 - longest_best_sequence: List of states in the longest episode that reach the goal
@@ -19,4 +19,10 @@ Version 1.2.0 (major update) date: 03/03/2024 at: 2h20 p.m (CET(UTC+1)):
 - Problem not fixed yet: n°1, n°2, n°3, n°5, n°6
 - Problem fixed: n°4 (fixed in the patch 1.0.1)
 - Bug/Problem fixed (not listed):
-- Correction of the launch of the test of the updated Q-Table
+- Correction of the launch of the test of the updated Q-Table
+
+Version 1.2.1 (minor update) date: 14/03/2024 at: 5:50 p.m (CET(UTC+1)):
+- Calculation of espsilon decay (calcul detail: 1/episodes)
+- Input value of epsilon in the console
+- Bug fix(not listed)
+- Q-Table training sucess rate calculation
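
The 1.2.1 entry above computes the epsilon decay as 1/episodes and takes the episode count from the console (see the main.py diff below). A minimal sketch of that decay schedule, assuming it is applied once per training episode; variable names mirror main.py, but the loop shown here is illustrative, not the exact committed code:

# Sketch: linear epsilon decay as described in patch 1.2.1 (assumed usage)
episodes = 100                      # number of training episodes
epsilon = 1.0                       # initial amount of randomness in action selection
epsilon_decay = 1 / episodes        # fixed decrement per episode: 1/episodes

for episode in range(episodes):
    # ... one training episode runs here ...
    epsilon = max(epsilon - epsilon_decay, 0.0)   # epsilon reaches 0 by the last episode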
30 changes: 21 additions & 9 deletions main.py
@@ -23,11 +23,21 @@
 render_mode="human", is_slippery=False)
 
 # Hyperparameters
-episodes = 100 # Total of episodes
+#episodes = 100 # Total of episodes
+# Handle ValueError exception for float conversion of input
+try:
+    episodes_value = float(input("Enter the number of episodes for this training session: "))
+    if episodes_value > 0:
+        episodes = episodes_value
+    else:
+        print("Please enter a positive number:")
+except ValueError:
+    print("Please enter a number:")
+
 alpha = 0.5 # Learning Rate
 gamma = 0.9 # Discount factor
 epsilon = 1.0 # Amount of randomness in the action selection
-epsilon_decay = 0.01 # Fixed amount to decrease
+epsilon_decay = 1/int(episodes) # Fixed amount to decrease
 
 # Datas
 nb_success = 0 # Number of success
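
As committed, a non-numeric or non-positive entry only prints a message, so `episodes` would stay undefined and the later `int(episodes)` casts would fail. A hedged alternative, a sketch rather than the repository's code, is to re-prompt until a valid value is given:

# Sketch: keep asking until the user supplies a positive whole number of episodes
while True:
    try:
        episodes = int(input("Enter the number of episodes for this training session: "))
        if episodes > 0:
            break                                 # valid value: leave the loop
        print("Please enter a positive number:")
    except ValueError:
        print("Please enter a number:")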
@@ -57,12 +67,13 @@
 #Q-tbale calculation
 qtable = np.zeros((env.observation_space.n, env.action_space.n))
 # show the Q-table
+print(" ")
 print('Q-table before training: ')
 print(qtable)
 print(' ')
 
 # Learning loop
-for episode in range(episodes):
+for episode in range(int(episodes)):
     sequence = [] # List of states in the episode
     state = env.reset() # Reset the environment
     done = False
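
The body of the learning loop is collapsed in this hunk. For reference, a standard tabular Q-learning step with epsilon-greedy exploration looks like the sketch below; it is the textbook update rule, not necessarily line for line what main.py does, and it assumes `state` holds the current integer state index and `numpy` is imported as `np` as elsewhere in main.py:

# Sketch: one epsilon-greedy step and Q-table update (textbook rule, assumed to match the collapsed code)
if np.random.random() < epsilon or np.max(qtable[state]) == 0:
    action = env.action_space.sample()            # explore: random action
else:
    action = np.argmax(qtable[state])             # exploit: best known action

new_state, reward, terminated, truncated, _ = env.step(action)
done = terminated or truncated

# Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
qtable[state, action] += alpha * (reward + gamma * np.max(qtable[new_state]) - qtable[state, action])
state = new_state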
@@ -192,6 +203,7 @@
 print(" ")
 print("Test of the updated Q-Table")
 #re-initialize the data
+episodes = 100
 best_sequence = []
 longest_sequence = []
 longest_best_sequence = []
@@ -202,7 +214,7 @@
 reward_sequence = []
 nb_success = 0
 epsilon = 1.0 # same it doesn't used but we nerver know
-for episode in range(100):
+for episode in range(episodes):
     sequence = [] # List of states in the episode
     state = env.reset() # Reset the environment
     done = False
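
The comment on `epsilon = 1.0` above says epsilon is not used during the test, i.e. the test loop follows the learned table greedily and does not update it. A minimal sketch of such an evaluation step, an assumption about the collapsed loop body rather than the committed code:

# Sketch: greedy evaluation step, no exploration and no Q-table update
action = np.argmax(qtable[state])                 # always take the best known action
new_state, reward, terminated, truncated, _ = env.step(action)
done = terminated or truncated
sequence.append(new_state)                        # record the visited state
if reward:                                        # FrozenLake returns reward 1 only when the goal is reached
    nb_success += 1
state = new_state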
@@ -269,7 +281,7 @@
 print(" ")
 
 # Results of the Q-table after training and a test without update
-print("Results after " + str(episodes) + " episodes: ")
+print("Results after " + str(int(episodes)) + " episodes: ")
 print(" ")
 print(qtable)
 print(" ")
@@ -299,15 +311,15 @@
     sequence_words = [action_words[action] for action in sequence] # Convert actions input number into input words
     print(f"Sequence {episode_num}: {sequence} / {sequence_words}")
 print(" ")
-print (f"Success rate = {(nb_success/episodes)*100}%")
+print (f"Success rate = {(nb_success/int(episodes))*100}%")
 #Success rate of the update of the Q-table
-if (nb_success/episodes)*100 == 100 :
+if (nb_success/int(episodes))*100 == 100 :
     print(" ")
     print("The Update of the Q-Table is PERFECT!")
-if 80 <= (nb_success/episodes)*100 <= 99 :
+if 80 <= (nb_success/int(episodes))*100 <= 99 :
     print(" ")
     print("The Update of the Q-Table is a great success!")
-if 50 <= (nb_success/episodes)*100 <= 79:
+if 50 <= (nb_success/int(episodes))*100 <= 79:
     print(" ")
     print("The Update of the Q-Table is successful!")
 if 33 <= (nb_success/episodes)*100 <= 49:
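The `if` statements above test disjoint percentage ranges, so at most one verdict ever prints; the same rating logic can be written once as an `elif` ladder. A hedged restructuring with the same thresholds, not the committed code:

# Sketch: the success-rate verdicts as a single if/elif ladder (equivalent thresholds)
success_rate = (nb_success / int(episodes)) * 100
print(f"Success rate = {success_rate}%")
if success_rate == 100:
    print("The Update of the Q-Table is PERFECT!")
elif 80 <= success_rate <= 99:
    print("The Update of the Q-Table is a great success!")
elif 50 <= success_rate <= 79:
    print("The Update of the Q-Table is successful!")
# lower ranges (33-49% and below) continue the same pattern in the collapsed lines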
