Support importance sampling for off-policy policy gradient methods #26

ziyadedher · 2019-11-26T18:48:49Z

Is your feature request related to a problem? Please describe.
Currently, our policy gradient methods require the memories to be on-policy, which is not a constraint we want to keep enforcing. We should support importance sampling in our policy gradient methods so that they can also learn from off-policy memories.
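For reference, here is the identity that makes this work. An expectation over actions drawn from the target policy $\pi_\theta$ can be rewritten as an expectation over actions drawn from the behavior policy $\mu$ that generated the memories, reweighted by the ratio $\pi_\theta / \mu$. Applied to the policy gradient, this gives the estimator below. It uses the common approximation that ignores the mismatch between the two policies' state distributions.

```latex
\nabla_\theta J(\theta)
  \approx \mathbb{E}_{s,\, a \sim \mu}\!\left[
    \frac{\pi_\theta(a \mid s)}{\mu(a \mid s)}\,
    \nabla_\theta \log \pi_\theta(a \mid s)\,
    \hat{A}(s, a)
  \right]
```

Here $\hat{A}(s, a)$ is a return or advantage estimate. The reweighting is valid only if $\mu(a \mid s) > 0$ wherever $\pi_\theta(a \mid s) > 0$.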

Describe the solution you'd like
Since our transitions already have a field that stores the behavioral policy, we just need to populate that field and use it in the policy gradient learning step.
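The learning-step change can be sketched as follows. This is a minimal sketch, not the project's actual API. It assumes a single-state policy given by a softmax over discrete action logits, and it assumes the behavioral-policy field holds $\mu(a \mid s)$, the probability the behavior policy assigned to the logged action. `Transition`, `behavior_prob`, and `importance_weighted_pg` are hypothetical names. Each sample contributes $\rho \, \nabla \log \pi(a) \, \hat{A}$ with $\rho = \pi(a)/\mu(a)$. The optional `clip` truncates $\rho$, which trades some bias for lower variance.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Transition:
    """Hypothetical transition record. `behavior_prob` stands in for the
    existing behavioral-policy field and holds mu(a|s) at collection time."""
    action: int
    advantage: float      # return or advantage estimate for this step
    behavior_prob: float  # mu(a|s) recorded when the action was taken


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def importance_weighted_pg(logits: Sequence[float],
                           batch: Sequence[Transition],
                           clip: Optional[float] = None) -> List[float]:
    """Off-policy policy-gradient estimate w.r.t. the logits of a
    single-state softmax policy, averaged over the batch."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for t in batch:
        if t.behavior_prob <= 0.0:
            raise ValueError("behavior_prob must be positive")
        # Importance ratio pi(a) / mu(a), optionally truncated from above.
        rho = probs[t.action] / t.behavior_prob
        if clip is not None:
            rho = min(rho, clip)
        # Gradient of log softmax w.r.t. the logits: one_hot(action) - probs.
        for k in range(len(logits)):
            indicator = 1.0 if k == t.action else 0.0
            grad[k] += rho * (indicator - probs[k]) * t.advantage
    return [g / len(batch) for g in grad]
```

With an autograd framework, the same gradient can be obtained by maximizing the batch mean of $\rho \hat{A}$ with $\rho = \exp(\log \pi_\theta - \log \mu)$, because $\nabla_\theta \rho = \rho \, \nabla_\theta \log \pi_\theta$. Only $\log \mu$ needs to be stored in the transition, and it is treated as a constant.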

ziyadedher added the enhancement (New feature or request) label Nov 26, 2019

ziyadedher self-assigned this Nov 26, 2019

ziyadedher added this to the v0.1 milestone Nov 26, 2019
