
Version 1.2.3 (Check the “Patch Note.md” file in the “Doc” folder for more information about the update)
VOCdevShy committed Apr 2, 2024
1 parent 271ece8 commit 2cabd77
Showing 3 changed files with 12 additions and 15 deletions.
2 changes: 1 addition & 1 deletion Doc/Patch Note.md
@@ -22,7 +22,7 @@ Version 1.2.0 (major update) date: 03/03/2024 at: 2:20 p.m (CET(UTC+1)):
- Correction of the launch of the test of the updated Q-Table

Version 1.2.1 (minor update) date: 14/03/2024 at: 5:50 p.m (CET(UTC+1)):
- Calculation of epsilon decay (calculation detail: 1/episodes)
- Calculation of epsilon decay (calculation detail: `1/episodes`)
- Input value of epsilon in the console
- Bug fix (not listed)
- Q-Table training success rate calculation
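
The epsilon decay referred to above is linear, matching the `epsilon = max(epsilon - epsilon_decay, 0)` line visible in the main.py diff further down. A minimal sketch of such a schedule, assuming a starting epsilon of 1.0 and illustrative variable names (not taken from the repository):

```python
episodes = 1000                  # assumed number of training episodes
epsilon = 1.0                    # assumed starting exploration rate
epsilon_decay = 1 / episodes     # the "1/episodes" detail from the note above

for episode in range(episodes):
    # ... one training episode: explore with probability epsilon, exploit otherwise ...
    epsilon = max(epsilon - epsilon_decay, 0)   # linear decay, floored at 0
```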
4 changes: 1 addition & 3 deletions README.md
@@ -7,8 +7,6 @@ The list of predefined maps are in the map files in the tools folder. Here you c

If you want more information about Q-Learning and the Frozen Lake game, please read this article on Medium; it helped me a lot to understand what to do in the code: https://medium.com/towards-data-science/q-learning-for-beginners-2837b777741

Do your own test by moving values if you want!

For those who are interested in the calculation of the Q-Table, here is an explanation:

`qtable[state, action] = qtable[state, action] + alpha * (reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action])`
@@ -17,7 +15,7 @@ For those who are interested by the calculation of the Q-Table here is an explic

- `alpha`: This is the learning rate. It controls the extent to which new information will be integrated into the old values of the Q-table. A high value means that new information will have a greater impact on existing values, while a low value means it will have a smaller impact.

- `reward`: This is the immediate reward obtained after taking action 'action' in state 'state'. This reward can be positive, negative, or zero.
- `reward`: This is the immediate reward obtained after taking the action in the current state. This reward is equal to a positive float.

- `gamma`: This is the discount factor. It represents the importance of future rewards compared to immediate rewards. A gamma close to 1 gives great importance to future rewards, while a gamma close to 0 makes the agent focus almost entirely on immediate rewards.

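Putting the formula and these parameters together, here is a minimal, self-contained sketch of one update step (the array sizes, state/action indices, and hyperparameter values below are illustrative assumptions, not values taken from main.py):

```python
import numpy as np

n_states, n_actions = 16, 4            # assumed 4x4 Frozen Lake map: 16 states, 4 actions
qtable = np.zeros((n_states, n_actions))

alpha, gamma = 0.5, 0.9                # assumed learning rate and discount factor
state, action, next_state = 0, 2, 4    # one example transition
reward = 1.0                           # Frozen Lake returns 1.0 only when the goal is reached

# The update rule quoted above: nudge Q(state, action) toward
# reward + gamma * (best value in the next state), by a step of size alpha.
qtable[state, action] = qtable[state, action] + alpha * (
    reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action]
)
print(qtable[state, action])           # 0.5 with these example values
```
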
21 changes: 10 additions & 11 deletions main.py
@@ -1,7 +1,7 @@
import time
import warnings

import gym
from gym.envs.toy_text.frozen_lake import generate_random_map # To generate a random map if you want
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import clear_output
@@ -139,8 +139,8 @@
elif len(sequence) > len(longest_best_sequence):
longest_best_sequence = sequence

if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1
if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1

epsilon = max(epsilon - epsilon_decay, 0)
clear_output(wait=True)
@@ -200,7 +200,6 @@
print(" ")

if test == "n":
print(" ")
plt.figure(figsize=(3, 1.25))
plt.xlabel("Run number")
plt.ylabel("Outcome")
@@ -211,7 +210,6 @@

# Loop for the test of the updated Q-Table
if test == "y":
print(" ")
print("Test of the updated Q-Table")
print(" ")
#re-initialize the data
@@ -239,10 +237,9 @@
action = env.action_space.sample()
# If there's no best action (only zeros), take a random one
if np.max(qtable[state]) > 0:
if np.argmax(qtable[state]) == 0:
action = env.action_space.sample()
else:
action = np.argmax(qtable[state])
if np.argmax(qtable[state]) == 0:
action = env.action_space.sample()
sequence.append(action)

next_state, reward, done, info, _ = env.step(action)
@@ -263,6 +260,8 @@
if reward:
outcome[-1] = "Success"
reward_counter = reward_counter + 1
reward_episode.append(episode)
reward_sequence.append(sequence)
if not best_sequence:
best_sequence = sequence
elif len(sequence) < len(best_sequence):
@@ -272,8 +271,8 @@
longest_best_sequence = sequence
elif len(sequence) > len(longest_best_sequence):
longest_best_sequence = sequence
if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1
if best_sequence == sequence:
recurent_sequence = recurent_sequence + 1
if best_sequence == []:
recurent_sequence = 0

@@ -328,7 +327,7 @@
#Success rate of the update of the Q-table
if (nb_success / int(episodes)) * 100 == 100:
print(" ")
print("The Update of the Q-Table is PERFECT!")
print("The Update of the Q-Table is PERFECT to reach the goal!")
if 80 <= (nb_success / int(episodes)) * 100 <= 99:
print(" ")
print("The Update of the Q-Table is a great success!")
