doc(FrozenLake_tuto): update policy exploitation logic to handle variable sets of maximum Q-values #1037

edelauna · 2024-04-26T02:51:06Z

Description

Small change to the exploitation logic in docs/tutorial/FrozenLake_tuto.py. The existing logic chooses an action at random only if all actions have the same Q-value, but it is also possible for just a subset of actions to share the maximum Q-value.

This PR updates the logic to retrieve the indices of the Q-values that are equal to the maximum value, and then randomly selects the action to take from among those indices.
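
For reference, a minimal sketch of the tie-breaking described above. This is not the merged diff: the function names, the `rng` argument, and the sample Q-values are assumptions made for illustration, and the "old" function is only a paraphrase of the behaviour described in this PR.

```python
import numpy as np


def greedy_all_equal_only(q_row, rng):
    """Old behaviour as described above: randomise only when every Q-value is equal."""
    if np.all(q_row == q_row[0]):
        return int(rng.integers(len(q_row)))
    # np.argmax returns the first index of the maximum, so later ties are never picked
    return int(np.argmax(q_row))


def greedy_random_tiebreak(q_row, rng):
    """Proposed behaviour: pick uniformly among all actions sharing the maximum Q-value."""
    max_ids = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(max_ids))


rng = np.random.default_rng(0)
q_row = np.array([0.0, 0.7, 0.7, 0.2])  # hypothetical Q-values: actions 1 and 2 tie

old_picks = {greedy_all_equal_only(q_row, rng) for _ in range(200)}
new_picks = {greedy_random_tiebreak(q_row, rng) for _ in range(200)}
```

With a partial tie like this one, the first function can only ever return action 1, while the second spreads its choices over actions 1 and 2. The comparison uses exact equality; if Q-values could differ by rounding error, something like `np.isclose` might be a reasonable substitute.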

Fixes # (issue)

Type of change

Please delete options that are not relevant.

Documentation only change (no code changed)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes


pseudo-rnd-thoughts

Personally, I understand the issue with just np.argmax, but this solution looks good.

feat(EpsilonGreedy): update policy exploitaion logic to handle variable sets of maximum Q-values

d8f2bce

edelauna changed the title ~~doc(EpsilonGreedy): update policy exploitaion logic to handle variable sets of maximum Q-values~~ doc(FrozenLake_tuto): update policy exploitaion logic to handle variable sets of maximum Q-values Apr 26, 2024

edelauna changed the title ~~doc(FrozenLake_tuto): update policy exploitaion logic to handle variable sets of maximum Q-values~~ doc(FrozenLake_tuto): update policy exploitation logic to handle variable sets of maximum Q-values Apr 26, 2024

edelauna marked this pull request as ready for review April 26, 2024 02:52

pseudo-rnd-thoughts approved these changes Apr 29, 2024


pseudo-rnd-thoughts merged commit 5bf7269 into Farama-Foundation:main Apr 29, 2024
11 checks passed
