doc(FrozenLake_tuto): update policy exploitation logic to handle variable sets of maximum Q-values (#1037)
edelauna authored Apr 29, 2024
1 parent 93403cb commit 5bf7269
Showing 1 changed file with 4 additions and 6 deletions.
10 changes: 4 additions & 6 deletions docs/tutorials/training_agents/FrozenLake_tuto.py
@@ -161,12 +161,10 @@ def choose_action(self, action_space, state, qtable):
         # Exploitation (taking the biggest Q-value for this state)
         else:
             # Break ties randomly
-            # If all actions are the same for this state we choose a random one
-            # (otherwise `np.argmax()` would always take the first one)
-            if np.all(qtable[state, :]) == qtable[state, 0]:
-                action = action_space.sample()
-            else:
-                action = np.argmax(qtable[state, :])
+            # Find the indices where the Q-value equals the maximum value
+            # Choose a random action from the indices where the Q-value is maximum
+            max_ids = np.where(qtable[state, :] == max(qtable[state, :]))[0]
+            action = rng.choice(max_ids)
         return action
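
Note on the change: the removed check compares the single boolean returned by `np.all(...)` with the first Q-value, so it does not actually test whether all Q-values in the row are equal. The sketch below is a standalone illustration, not code from the tutorial. The helper names `old_condition` and `greedy_with_random_ties`, the seed, and the example rows are all assumptions made for the demonstration. It shows two rows where the old check gives a misleading answer, and how selecting uniformly among the indices of the maximum handles both.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # assumed seed, for reproducibility only


def old_condition(q_row):
    """The removed check: compares one boolean to the first Q-value."""
    return np.all(q_row) == q_row[0]


def greedy_with_random_ties(q_row, rng):
    """Pick uniformly among all actions whose Q-value equals the row maximum."""
    max_ids = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(max_ids))


# Hypothetical Q-value rows for one state with four actions.
partial_tie = np.array([0.5, 0.5, 0.1, 0.1])   # actions 0 and 1 tie for the max
clear_winner = np.array([1.0, 0.2, 0.3, 5.0])  # action 3 is the unique max

# The old check misses the tie, so np.argmax would always return action 0.
print(old_condition(partial_tie))   # False
# The old check fires here and would sample a random action despite a clear max.
print(old_condition(clear_winner))  # True

picks = {greedy_with_random_ties(partial_tie, rng) for _ in range(200)}
print(picks)                                       # {0, 1}
print(greedy_with_random_ties(clear_winner, rng))  # 3
```

When the maximum is unique, `max_ids` has a single entry and the choice reduces to plain `np.argmax`; when every Q-value is equal, every action is a candidate, which covers the case the old comment described.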

