Implement a multi-objective pistonball environment #10
Looks good, only one minor comment
)
self.reward_dim = 3  # [global, local, time]
self.reward_spaces = {
    f"piston_{i}": Box(low=-np.inf, high=np.inf, shape=(self.reward_dim,), dtype=np.float32)
Isn't it possible to have better bounds on this?
I've added more informative reward bounds, along with documentation of how I derived them. It would be good if someone could briefly check whether they make sense. I also ran random policies with 50 different seeds to verify that the rewards stayed within the specified bounds, and everything seemed okay.
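For anyone who wants to repeat that kind of check, here is a rough sketch of the idea. It is not the script used for this PR: `LOW` and `HIGH` are placeholder bounds rather than the ones added here, and `sample_rewards` is a hypothetical stand-in for a random-policy rollout of the environment.

```python
import numpy as np

# Placeholder per-objective bounds for [global, local, time].
# These are NOT the bounds from this PR, just illustrative values.
LOW = np.array([-1.0, -0.5, -0.1], dtype=np.float32)
HIGH = np.array([1.0, 0.5, 0.0], dtype=np.float32)


def within_bounds(rewards, low, high):
    """Return True if every reward vector lies inside [low, high] elementwise."""
    rewards = np.asarray(rewards, dtype=np.float32)
    return bool(np.all(rewards >= low) and np.all(rewards <= high))


def sample_rewards(seed, n_steps=100):
    """Hypothetical stand-in for a random-policy rollout.

    A real check would step the environment with sampled actions and
    collect the reward vectors; here we just draw values inside the
    placeholder bounds so the sketch runs on its own.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(low=LOW, high=HIGH, size=(n_steps, 3)).astype(np.float32)


# Collect the seeds (if any) whose rollouts leave the declared bounds.
violations = [
    seed for seed in range(50)
    if not within_bounds(sample_rewards(seed), LOW, HIGH)
]
print("seeds with out-of-bounds rewards:", violations)
```

Swapping `sample_rewards` for an actual rollout would turn this into an empirical check; passing it only shows that no violation was observed, not that the bounds are tight.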
I implemented a multi-objective version of Pistonball. This essentially boils down to separating the three components of the original reward function and exposing these as a vector reward instead.
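As a minimal sketch of what "exposing a vector reward" means, assuming the scalar reward is a weighted sum of the three components: the function names, component values, and weights below are hypothetical and are not taken from the environment's code.

```python
import numpy as np


def vector_reward(global_r, local_r, time_r):
    """Pack the three reward components into one vector: [global, local, time]."""
    return np.array([global_r, local_r, time_r], dtype=np.float32)


def scalarize(reward_vec, weights):
    """Collapse a reward vector into a scalar with a linear weighting."""
    return float(np.dot(reward_vec, weights))


# Hypothetical component values for a single step.
r = vector_reward(global_r=1.5, local_r=0.5, time_r=-0.1)

# Hypothetical weights; a multi-objective agent is free to choose its own.
weights = np.array([0.8, 0.2, 0.8], dtype=np.float32)
scalar = scalarize(r, weights)
```

The point of the vector form is that the trade-off between objectives is no longer fixed inside the environment: a single-objective setup can be recovered by scalarizing with a fixed weight vector, while multi-objective algorithms can work with the components directly.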