Questions on the reward structure #647

Open
csreid opened this issue Jan 11, 2025 · 0 comments

csreid commented Jan 11, 2025

Hi!

I'm trying to get a good agent on the highway environment with high-dimensional observations, and I'm running into some trouble getting it to behave nicely. I'm training via PPO from stable-baselines3, on 1.9.1, and using this config at the moment (in YAML format):

offroad_terminal: true
observation:
  type: TupleObservation
  observation_configs:
    - type: LidarObservation
      cells: 128
      normalize: true
    - type: GrayscaleObservation
      observation_shape: [84,84]
      stack_size: 4
      weights: [0.2989, 0.5870, 0.1140]
action:
  type: ContinuousAction
  speed_range: [0, 60]
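
For context, here's a minimal sketch of how I'm constructing the environment before handing it to PPO (simplified from my actual script; the config file path is a placeholder, and I'm assuming gym.make accepts the config dict as a kwarg on this version, with env.unwrapped.configure(config) plus a reset as the fallback):

import gymnasium as gym
import highway_env  # noqa: F401 -- importing registers highway-v0
import yaml

# Placeholder path to the YAML config shown above.
with open("highway_config.yaml") as f:
    config = yaml.safe_load(f)

env = gym.make("highway-v0", config=config)
obs, info = env.reset()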

Many of these settings came from trying to fix behaviors I've been seeing:

  • Without offroad_terminal set, it just spins very quickly in a circle.
  • With it set, the policy seemed to converge to driving backwards at high speed.
  • Setting a speed floor of 0 made it converge to a policy of stopping as quickly as possible.

At that point, I realized I must be doing something fundamentally wrong.

I see here that the reward is scaled by forward speed; is that measured with respect to the current road segment?
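
For reference, my reading of the speed term (based on the HighwayEnv source; the reward_speed_range default and the cos(heading) projection are my interpretation, so please correct me if I've misread it) is roughly:

import numpy as np

def high_speed_reward(speed, heading, reward_speed_range=(20, 30)):
    # Take the forward component of the velocity, map it linearly from
    # reward_speed_range onto [0, 1], and clip.
    forward_speed = speed * np.cos(heading)
    low, high = reward_speed_range
    return float(np.clip((forward_speed - low) / (high - low), 0.0, 1.0))

If that reading is right, a negative forward speed just clips to 0 rather than being penalized, which would at least be consistent with the backwards-driving policy I saw.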

Is there something else I'm missing? I don't see other folks having this problem, so I'm assuming I've goofed something up somewhere.
