Hi everybody,

I am training a reinforcement learning agent with a one-dimensional continuous action space. I want to restrict the action space so that the agent can only choose valid actions (the constraint changes depending on the current state).

Therefore, I tried two approaches to restricting the action space:
1. torch.clamp (https://pytorch.org/docs/stable/generated/torch.clamp.html)
2. A cvxpylayer that solves the box-constrained problem.
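As a minimal sketch of the first approach, assume the state-dependent bounds are already available as tensors low and high (the helper name bound_action and the example numbers below are hypothetical). torch.clamp accepts tensor bounds in recent PyTorch releases; on older versions, torch.minimum(torch.maximum(x, low), high) gives the same result.

```python
import torch


def bound_action(raw_action: torch.Tensor, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
    """Clamp a 1-D action into a state-dependent interval [low, high].

    Hypothetical helper: `low` and `high` are assumed to be computed from the
    current state elsewhere and to broadcast against `raw_action`.
    """
    return torch.clamp(raw_action, min=low, max=high)


# Batch of 3 states, each with its own valid interval (made-up numbers).
raw = torch.tensor([[-2.0], [0.5], [3.0]])
low = torch.tensor([[-1.0], [0.0], [1.0]])
high = torch.tensor([[1.0], [1.0], [2.0]])

clamped = bound_action(raw, low, high)
print(clamped)  # clamped values: -1.0, 0.5, 2.0
```

Every clamped action lies inside its own state's interval, so the constraint can change from one state to the next without any extra bookkeeping.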
Both approaches produce the same forward pass and the same gradient, which I double-checked by plotting them.
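One way to reproduce this gradient check is with autograd. The sketch below assumes that the cvxpy layer computes the Euclidean projection onto [low, high] (the post does not show its code). The derivative of that projection is 1 strictly inside the box and 0 strictly outside, and it serves here as the reference instead of running cvxpy/cvxpylayers. The helper names are hypothetical. Points exactly on the bounds are left out because the derivative is undefined there, and each implementation may pick a different value.

```python
import torch


def clamp_grad(x: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Elementwise d/dx clamp(x, low, high), computed by autograd."""
    x = x.clone().requires_grad_(True)
    # Summing makes the backward pass return the elementwise derivative.
    torch.clamp(x, min=low, max=high).sum().backward()
    return x.grad


def box_projection_grad(x: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Analytic derivative of projecting x onto [low, high] (assumed reference):
    1 strictly inside the box, 0 strictly outside."""
    return ((x > low) & (x < high)).to(x.dtype)


# Sample points that avoid the bounds -1.0 and 1.0 exactly.
xs = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
g_auto = clamp_grad(xs, -1.0, 1.0)
g_ref = box_projection_grad(xs, -1.0, 1.0)
print(g_auto)
print(torch.equal(g_auto, g_ref))
```

Outside the box, both derivatives are exactly zero, so an action that lands outside [low, high] passes no gradient back through either function.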
To my surprise, however, they yield completely different results: the torch.clamp version does not converge to a good solution, while the cvxpylayer version does.
Is there any explanation for this?
All the best,
Philipp
[Figure: gradient of both functions]
[Figure: forward pass of both functions]