REINFORCE

A naive implementation of Monte-Carlo Policy-Gradient Control (REINFORCE), using CartPole-v0 as the environment.

The algorithm is given below.
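
For reference, here is a sketch of the update in its common textbook form. It is an assumption that this implementation follows the same form. The symbols are the usual ones and do not come from this repository's code: policy parameters $\theta$, step size $\alpha$, discount factor $\gamma$, and an episode $S_0, A_0, R_1, \dots, S_{T-1}, A_{T-1}, R_T$ generated by following $\pi(\cdot \mid \cdot, \theta)$.

```latex
% For each step t = 0, 1, ..., T-1 of a completed episode:
\begin{aligned}
G_t &= \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k \\
\theta &\leftarrow \theta + \alpha \, \gamma^{t} \, G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta)
\end{aligned}
```

Because $G_t$ is only known once the episode has finished, the parameters are updated after each full episode rather than after each step.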

There is one trick, though: the return, G, is normalized, which helps keep the algorithm numerically stable.
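
A minimal sketch of what normalizing the return could look like. The README does not say which normalization is used, so this assumes the common choice of standardizing the per-episode returns to zero mean and unit variance. The function names, `gamma`, and `eps` are hypothetical and not taken from this repository.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def normalize(returns, eps=1e-8):
    """Standardize returns to zero mean and unit variance (assumed scheme)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((g - mean) ** 2 for g in returns) / n
    std = math.sqrt(var)
    # eps guards against division by zero when all returns are equal.
    return [(g - mean) / (std + eps) for g in returns]


if __name__ == "__main__":
    # CartPole gives a reward of 1 per step, so a short episode looks like this.
    episode_rewards = [1.0, 1.0, 1.0]
    G = discounted_returns(episode_rewards, gamma=0.5)
    print(G)
    print(normalize(G))
```

In CartPole every step yields a reward of 1, so raw returns grow with episode length. Standardizing keeps the scale of the gradient roughly constant as episodes get longer, and it makes some returns negative, so below-average steps are pushed down instead of every action being reinforced.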