Bug fix: RM transitions #51

Open: tdelgado00 wants to merge 8 commits into base: master

tdelgado00 commented Feb 13, 2024

Description

This pull request allows two states of a reward machine (RM) to be connected by multiple transitions with different rewards, as long as each of those transitions has a different formula. In particular, because all terminal states are mapped to -1, an RM state can now transition to a terminal state with two different rewards, depending on which formula was satisfied. This fixes a bug present in some of the environments.

This should close issue #50.
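
To make the bug concrete, here is a minimal sketch of why keying transitions by a pair of states loses information once every terminal state shares the id -1. The variable names, formulas ("goal", "trap"), and rewards are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
# Hypothetical illustration; names and values are not from the repository.
TERMINAL = -1  # all terminal states collapsed onto one id

# Two transitions out of state 0, both ending in a terminal state,
# with different formulas and different rewards.
transitions = [
    (0, TERMINAL, "goal", 1.0),
    (0, TERMINAL, "trap", -1.0),
]

# Keyed by (state, next_state): the second entry overwrites the first.
by_state_pair = {}
for u1, u2, formula, reward in transitions:
    by_state_pair[(u1, u2)] = (formula, reward)

# Keyed by (state, formula): both transitions are kept.
by_formula = {}
for u1, u2, formula, reward in transitions:
    by_formula[(u1, formula)] = (u2, reward)

print(len(by_state_pair))  # 1
print(len(by_formula))     # 2
```

With the state-pair key, only the "trap" transition survives, so the positive reward for "goal" is silently dropped.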

Changes

  • Changed the transition functions in the reward machine class so that they map (state, formula) -> new state/reward instead of (state, state) -> formula/reward, and updated all code that references them.
  • Added a set of terminal states to the reward machine class. With this, terminal states are no longer all mapped to terminal_u. This was necessary for HRM to find the right set of options.
  • Tested the office gridworld in both the multi-task and single-task settings. In the single-task setting, task 4 (get the coffee and email and go to the office) isn't a good example of the difference between HRM and CRM, so I added entry points for all 4 tasks in the single-task version. They can be run by selecting env=Office-single-Ti-v0 with i = 1, 2, 3, or 4. The difference between CRM and HRM can now be seen with T1, for example.
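
The first two changes can be sketched roughly as follows. This is a toy stand-in, not the class from this repository: the class name `SketchRewardMachine`, its methods, and the proposition symbols are hypothetical. It also assumes a formula is a single proposition symbol, and it stores the new state and the reward in one dictionary for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRewardMachine:
    """Toy reward machine; not the class from this repository."""
    u0: int = 0
    # (state, formula) -> (next_state, reward)
    delta: dict = field(default_factory=dict)
    # Terminal states kept as an explicit set, each with its own id.
    terminal_states: set = field(default_factory=set)

    def add_transition(self, u1, formula, u2, reward):
        self.delta[(u1, formula)] = (u2, reward)

    def step(self, u1, true_props):
        """Return (next_state, reward, done) for the propositions that hold."""
        for (state, formula), (u2, reward) in self.delta.items():
            # Assumption: a formula is a single proposition symbol.
            if state == u1 and formula in true_props:
                return u2, reward, u2 in self.terminal_states
        # No formula matched: stay in place with zero reward.
        return u1, 0.0, u1 in self.terminal_states


rm = SketchRewardMachine()
rm.add_transition(0, "c", 1, 0.0)   # hypothetical: picked up coffee
rm.add_transition(1, "o", 2, 1.0)   # hypothetical: reached the office
rm.add_transition(1, "x", 3, -1.0)  # hypothetical failure proposition
rm.terminal_states.update({2, 3})

print(rm.step(0, {"c"}))  # (1, 0.0, False)
print(rm.step(1, {"o"}))  # (2, 1.0, True)
print(rm.step(1, {"x"}))  # (3, -1.0, True)
```

Because states 2 and 3 keep distinct ids and are only marked terminal through the set, code that inspects the machine's structure can still tell the two outcomes apart.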
