Is your feature request related to a problem? Please describe.
Currently, our policy gradient methods require the memories to be on-policy, which is not a constraint we want to enforce all the time. We should support importance sampling for policy gradient methods so that they can also learn from memories collected under a different policy.
Describe the solution you'd like
Since our transitions already have a field that stores the behavioral policy, we just need to populate that field and use it in the learning step of policy gradient to compute the importance weights.
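To make the proposal concrete, here is a rough sketch of what the importance-weighted learning step could look like. It is not tied to our codebase: it assumes a softmax policy over discrete actions with logits as parameters, and `behavior_probs` is a hypothetical stand-in for whatever the behavioral-policy field on the transition ends up holding (here, the probability the behavior policy assigned to the action that was taken). The optional `max_ratio` truncation is also an assumption, added only because raw ratios can have high variance.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def off_policy_pg_gradient(logits, actions, behavior_probs, returns, max_ratio=None):
    """Importance-weighted policy gradient estimate w.r.t. the logits.

    logits:         (A,) parameters of the current softmax policy (assumed form)
    actions:        (N,) integer actions stored in the transitions
    behavior_probs: (N,) probability of each stored action under the behavior
                    policy (hypothetical name for the transition field)
    returns:        (N,) return (or advantage) for each transition
    max_ratio:      optional cap on the importance ratios (assumption)
    """
    pi = softmax(np.asarray(logits, dtype=float))
    actions = np.asarray(actions)
    behavior_probs = np.asarray(behavior_probs, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Importance ratio: pi_current(a) / pi_behavior(a) for each sample.
    ratios = pi[actions] / behavior_probs
    if max_ratio is not None:
        ratios = np.minimum(ratios, max_ratio)

    # Score function of a softmax policy: d log pi(a) / d logits = onehot(a) - pi.
    score = np.eye(len(pi))[actions] - pi

    # Average of ratio * return * score over the batch.
    grad = np.mean((ratios * returns)[:, None] * score, axis=0)
    return grad, ratios


# Example: two actions, uniform current policy, data from a skewed behavior policy.
grad, ratios = off_policy_pg_gradient(
    logits=[0.0, 0.0],
    actions=[0, 1],
    behavior_probs=[0.25, 0.75],
    returns=[1.0, 0.0],
)
```

When the stored behavior probabilities match the current policy, every ratio is 1 and this reduces to the usual on-policy estimate, so the on-policy path would not need to be special-cased.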