Confused about PPO update #20

Githuber-zwb · 2024-06-06T03:28:06Z

I'm a bit confused about the PPO update process. At line 110:

The rewards in a single episode are normalized by subtracting the mean and dividing by the variance. Why should the rewards be scaled this way? I found that after normalization, even truly bad rewards are rescaled, so important information is lost.
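A minimal sketch of the concern, not the repository's code: the `normalize` helper and the reward values below are hypothetical, and the divisor is assumed to be the standard deviation plus a small epsilon (dividing by the variance instead gives the same effect, since both are unchanged when every reward is shifted by a constant).

```python
import statistics


def normalize(rewards, eps=1e-7):
    """Subtract the mean and divide by the (population) standard deviation.

    Assumption: eps only guards against division by zero; its value here
    is illustrative.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Two hypothetical episodes: the second is uniformly far worse than the first.
good_episode = [1.0, 2.0, 3.0]
bad_episode = [-101.0, -100.0, -99.0]

# After normalization the two episodes are indistinguishable,
# because the absolute reward level has been subtracted out.
print(normalize(good_episode))
print(normalize(bad_episode))
```

Both calls print the same list, which seems to be the information loss in question: the update only sees how each step compares with the rest of its own episode, not how bad the episode was overall.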
