Potentially wrong reward #237
Comments
@Max-Fu I think there has not been much testing or tuning of that reward function. Please submit a PR if you can improve the current version.
Inside the gym environment there are two robot speed attributes: self.speed and self.robot_speed. self.robot_speed is set to a constant, while self.speed is the true speed. Yet the reward function reads self.robot_speed instead of self.speed (check this). I think this causes a reward mis-specification problem (i.e., DDPG learns a trivial policy). Can one of the repo creators check whether this is indeed an error? Thanks! (I just restarted my run and will check whether this solves the issue.)
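To illustrate the suspected problem, here is a minimal sketch. It is not the repository's code: the class `ToyEnv`, its methods, and the value 0.25 are all hypothetical, and the reward is reduced to a bare speed term. It only shows that a reward read from a constant attribute cannot vary with the action, whereas one read from the true speed can.

```python
class ToyEnv:
    """Hypothetical stand-in for the environment; not the repository's code."""

    def __init__(self, robot_speed=0.25):
        self.robot_speed = robot_speed  # constant, never updated after construction
        self.speed = 0.0                # true speed, updated on every step

    def step(self, commanded_speed):
        # Assumption: the action directly sets the true speed.
        self.speed = commanded_speed
        return self.reward_from_constant(), self.reward_from_true_speed()

    def reward_from_constant(self):
        # Suspected behaviour: the reward term reads the constant attribute.
        return self.robot_speed

    def reward_from_true_speed(self):
        # Proposed behaviour: the reward term reads the true speed.
        return self.speed


env = ToyEnv()
rewards = [env.step(v) for v in (0.0, 0.5, 1.0)]
constant_rewards = [r[0] for r in rewards]  # identical for every action
true_rewards = [r[1] for r in rewards]      # changes with the action
```

If the speed term behaves like `reward_from_constant`, it gives the agent no signal to distinguish actions, which would be consistent with DDPG converging to a trivial policy.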