Inconsistency in solved return for Humanoid environment #456
Replies: 2 comments
-
Hi @LabChameleon, looking at Figure 4, the environments are trained for 1e6 steps and reach rewards of 6-8k, comparing Brax v1 and MuJoCo. Brax has changed considerably since then, and I expect that a similar analysis and comparison would be needed to find an average return that "solves" each environment in the current Brax version (and for each physics backend). We don't currently have a particular number on hand. @cdfreeman-google may have better insights here.
-
Hi @btaba, thanks for your reply and for moving this thread to Discussions. For Humanoid, I get rewards of around 12100-12400, and the resulting policies solve the environment qualitatively (running with a basic arm-swing motion). That is why the 13k threshold in the tutorial confused me. For a paper, I was hoping to give specific numbers at which the environments are considered solved. Otherwise, I will limit myself to writing that the policies qualitatively solve the environments. I would still be very interested if anyone has more insights here, though. Thanks again!
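Since the two candidate thresholds disagree, it can help to report an evaluation result against both of them rather than a single "solved" flag. The snippet below is a minimal sketch, assuming you already have a list of per-episode evaluation returns from your own rollouts. The helper `solved_against` and the threshold labels are hypothetical and not part of the Brax API; the values 12000 (Brax paper, Figure 9, approximate) and 13000 (training tutorial) are the ones cited in this thread.

```python
from statistics import mean

# Hypothetical table of candidate "solved" thresholds for Humanoid, taken from
# the numbers quoted in this thread. These are NOT official Brax constants.
CANDIDATE_THRESHOLDS = {
    "Brax paper (Figure 9, approx.)": 12000.0,
    "training tutorial": 13000.0,
}


def solved_against(episode_returns, thresholds=CANDIDATE_THRESHOLDS):
    """Return {label: bool}: whether the mean episode return meets each threshold."""
    if not episode_returns:
        raise ValueError("need at least one evaluation episode")
    avg_return = mean(episode_returns)
    return {label: avg_return >= t for label, t in thresholds.items()}


if __name__ == "__main__":
    # Hypothetical evaluation returns in the 12100-12400 range reported above.
    returns = [12100.0, 12250.0, 12400.0]
    print(solved_against(returns))
```

With a mean return of 12250, the check passes the paper's approximate threshold but not the tutorial's 13000, which is exactly the mismatch discussed here.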
-
Hi!
Is there an agreed threshold at which the Humanoid environment is considered solved? The Brax paper refers to the environment as solved at an average return of about 12000, as can be seen in Figure 9. The Brax training tutorial, however, states a threshold of 13000, as given in https://github.com/google/brax/blob/a89322496dcb07ac5a7e002c2e1d287c8c64b7dd/notebooks/training.ipynb#L261
Thanks!