Questions on the reward structure #647

Open
csreid opened this issue Jan 11, 2025 · 0 comments

csreid commented Jan 11, 2025

Hi!

I'm trying to get a good agent on the highway environment with high-dimensional observations, and I'm running into some trouble getting it to behave nicely. I'm training via PPO from stable-baselines3, on 1.9.1, and using this config at the moment (in YAML format):

offroad_terminal: true
observation:
  type: TupleObservation
  observation_configs:
    - type: LidarObservation
      cells: 128
      normalize: true
    - type: GrayscaleObservation
      observation_shape: [84,84]
      stack_size: 4
      weights: [0.2989, 0.5870, 0.1140]
action:
  type: ContinuousAction
  speed_range: [0, 60]
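
For context, here's a minimal sketch of how I'm constructing the environment before handing it to PPO (simplified from my actual script; the config file path is a placeholder, and I'm assuming gym.make accepts the config dict as a kwarg on this version, with env.unwrapped.configure(config) plus a reset as the fallback):

import gymnasium as gym
import highway_env  # noqa: F401 -- importing registers highway-v0
import yaml

# Placeholder path to the YAML config shown above.
with open("highway_config.yaml") as f:
    config = yaml.safe_load(f)

env = gym.make("highway-v0", config=config)
obs, info = env.reset()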

Many of these settings came from trying to fix behaviors I've been seeing:

  • Without offroad_terminal set, it just spins very quickly in a circle.
  • With it set, the policy seemed to converge to driving backwards at high speed.
  • Setting a speed floor of 0 made it converge to a policy of stopping as quickly as possible.

At that point, I realized I must be doing something fundamentally wrong.

I see here that the reward is scaled by forward speed; is that measured with respect to the current road segment?
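
For reference, my reading of the speed term (based on the HighwayEnv source; the reward_speed_range default and the cos(heading) projection are my interpretation, so please correct me if I've misread it) is roughly:

import numpy as np

def high_speed_reward(speed, heading, reward_speed_range=(20, 30)):
    # Take the forward component of the velocity, map it linearly from
    # reward_speed_range onto [0, 1], and clip.
    forward_speed = speed * np.cos(heading)
    low, high = reward_speed_range
    return float(np.clip((forward_speed - low) / (high - low), 0.0, 1.0))

If that reading is right, a negative forward speed just clips to 0 rather than being penalized, which would at least be consistent with the backwards-driving policy I saw.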

Is there something else I'm missing? I don't see other folks having this problem, so I'm assuming I've goofed something up somewhere.
