Arcade-style rewards system #27

Open
bzier opened this issue Nov 12, 2017 · 1 comment

Comments

Owner

bzier commented Nov 12, 2017

Idea for an alternate rewards system:

An old-school, arcade-style checkpoint system:

  • Every step is still -1 reward

    Any positive per-step reward would likely encourage stalling rather than taking the shortest/fastest path, e.g. driving around aimlessly to accumulate reward.

  • Agent only has x steps to play (some reasonably small number)
  • Hitting a checkpoint extends play for another y steps
  • Checkpoints also grant a 'large' sum of reward points

    Note: the checkpoint reward must be large enough to make progress worthwhile. Extending an episode adds more -1 steps, which lowers the total reward. Unless the checkpoint reward more than offsets that cost, the agent may learn to maximize reward by avoiding the first checkpoint and never extending the episode.
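
The rules above could be sketched roughly as follows. This is only an illustration, not code from this project: the class name `ArcadeReward`, its parameters, and the default values are all hypothetical, and it assumes that hitting a checkpoint adds y steps to the remaining budget (rather than resetting it).

```python
class ArcadeReward:
    """Step budget plus checkpoint bonuses (hypothetical sketch)."""

    def __init__(self, initial_steps=100, extension_steps=50,
                 checkpoint_reward=100.0, step_reward=-1.0):
        self.initial_steps = initial_steps          # x: budget at episode start
        self.extension_steps = extension_steps      # y: steps added per checkpoint
        self.checkpoint_reward = checkpoint_reward  # should exceed y (see note)
        self.step_reward = step_reward              # every step still costs -1
        self.reset()

    def reset(self):
        """Start a new episode with the initial step budget."""
        self.steps_left = self.initial_steps

    def step(self, hit_checkpoint):
        """Return (reward, done) for a single environment step."""
        self.steps_left -= 1
        reward = self.step_reward
        if hit_checkpoint:
            # Checkpoint: grant the bonus and extend play by y steps.
            reward += self.checkpoint_reward
            self.steps_left += self.extension_steps
        # The episode ends as soon as the budget runs out.
        done = self.steps_left <= 0
        return reward, done
```

Under these assumptions, an agent that uses its whole budget without reaching a checkpoint ends with a return of -x, while one that reaches a single checkpoint and then uses the extended budget ends with (checkpoint reward) - (x + y). With the illustrative defaults that is -100 versus -50, so the checkpoint pays off only because the checkpoint reward (100) exceeds y (50).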

In theory, this system should/could:

  • reduce overall episode lengths (by terminating early if no progress is being made), which allows for shorter/faster iterations (i.e. fail fast)
  • reduce getting stuck in any one place for too long
  • prevent driving backwards too far
  • still encourage forward progress with checkpoints as before
bzier changed the title from "Alternate rewards system" to "Arcade-style rewards system" on Nov 16, 2017
Owner Author

bzier commented Feb 21, 2021

Depends on #26. I would like the reward functions to be injectable/pluggable, to facilitate experimentation with all sorts of variations. Before working on this implementation, there should be a way to easily swap out which reward function is used.
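
One possible shape for "pluggable" is sketched below, purely as an illustration of the idea and not as the design tracked in #26. Everything here is hypothetical: the `RewardFn` signature, the `REWARD_FUNCTIONS` registry, the `register` decorator, and the two example functions.

```python
from typing import Callable, Dict

# Hypothetical signature: (hit_checkpoint, steps_taken) -> reward
RewardFn = Callable[[bool, int], float]

# Name -> reward function, so one can be selected from config or a CLI flag.
REWARD_FUNCTIONS: Dict[str, RewardFn] = {}


def register(name: str):
    """Decorator that adds a reward function to the registry under `name`."""
    def decorator(fn: RewardFn) -> RewardFn:
        REWARD_FUNCTIONS[name] = fn
        return fn
    return decorator


@register("step_penalty")
def step_penalty(hit_checkpoint: bool, steps_taken: int) -> float:
    # Plain -1 per step, no checkpoint bonus.
    return -1.0


@register("arcade")
def arcade(hit_checkpoint: bool, steps_taken: int) -> float:
    # -1 per step plus an illustrative bonus of 100 at checkpoints.
    return -1.0 + (100.0 if hit_checkpoint else 0.0)


def get_reward_fn(name: str) -> RewardFn:
    """Look up a reward function by name, failing loudly on typos."""
    try:
        return REWARD_FUNCTIONS[name]
    except KeyError:
        raise ValueError(f"unknown reward function: {name!r}") from None
```

An environment could then accept a `RewardFn` in its constructor and call it on each step, so swapping reward schemes would mean changing a single name rather than editing the environment.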

bzier added this to the Milestone 4 - Extras milestone on Feb 28, 2021