Q Overestimation #16

smorad · 2023-04-02T23:26:15Z

I'm rerunning the velocity baselines in the POMDP directory and I'm seeing exploding Q values fairly often. Is this something you experienced during training? TD3 seems to avoid the overestimation bias, but its returns seem low. Any tips for getting more stable returns across trials without massive batch sizes?

twni2016 · 2023-04-05T00:00:23Z

Yes, I found overestimation and also gradient explosion when training LSTM TD3 in some hard environments like Walker-V. A simple remedy may be to add gradient clipping to avoid the explosion, although I don't expect this to fix the overestimation issue itself.
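For reference, a minimal sketch of what gradient clipping could look like in a PyTorch training step. `HypotheticalCritic`, `clipped_update`, the tensor shapes, and `max_norm=1.0` are illustrative assumptions, not code or settings from this repository; only `torch.nn.utils.clip_grad_norm_` is the actual PyTorch utility.

```python
import torch
import torch.nn as nn


class HypotheticalCritic(nn.Module):
    """Toy recurrent Q-network (an assumption, not the repo's critic)."""

    def __init__(self, obs_dim=4, hidden_dim=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq):
        hidden, _ = self.lstm(obs_seq)   # (batch, time, hidden_dim)
        return self.head(hidden)         # (batch, time, 1)


def clipped_update(critic, optimizer, loss, max_norm=1.0):
    """Take one optimizer step with global-norm gradient clipping."""
    optimizer.zero_grad()
    loss.backward()
    # Rescales all gradients in place so their combined L2 norm is at most
    # max_norm. The return value is the norm measured BEFORE clipping,
    # which is worth logging to see how often clipping is active.
    grad_norm = torch.nn.utils.clip_grad_norm_(critic.parameters(), max_norm)
    optimizer.step()
    return float(grad_norm)


torch.manual_seed(0)
critic = HypotheticalCritic()
optimizer = torch.optim.Adam(critic.parameters(), lr=3e-4)

obs_seq = torch.randn(16, 10, 4)             # fake batch of observation sequences
td_target = torch.full((16, 10, 1), 1000.0)  # deliberately huge targets
loss = nn.functional.mse_loss(critic(obs_seq), td_target)

pre_clip_norm = clipped_update(critic, optimizer, loss, max_norm=1.0)
```

Clipping only bounds the size of each update. It does not change the TD target, so it would not be expected to remove overestimation bias on its own.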
