REINFORCE

A naive implementation of Monte-Carlo policy-gradient control (REINFORCE), using CartPole-v0 as the environment.

The algorithm is shown in REINFORCE_algorithm.png.
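
For quick reference, the per-step update in the common textbook formulation of REINFORCE can be written as below. This is a sketch under the assumption that the repository follows that formulation; the symbols (step size $\alpha$, discount $\gamma$, policy parameters $\theta$) are the usual ones and may differ from the notation in the figure.

$$
G_t = \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k,
\qquad
\theta \leftarrow \theta + \alpha \, \gamma^{t} \, G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta)
$$

Here $G_t$ is the discounted return from step $t$ to the end of the episode, so the update can only be computed once the episode has finished.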

There is one trick, though: the return, G, is normalized, which improves the numerical stability of the algorithm.
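
The normalization could look roughly like the sketch below. It assumes the common approach of subtracting the mean and dividing by the standard deviation of the per-step returns within one episode; the function name `normalized_returns`, the default `gamma`, and the `eps` guard are illustrative assumptions, not taken from the code in `src`.

```python
import numpy as np


def normalized_returns(rewards, gamma=0.99, eps=1e-8):
    """Discounted returns for one episode, standardized to zero mean and unit variance.

    Hypothetical helper: rewards[t] is the reward received after step t.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # eps avoids division by zero when all returns are identical.
    return (returns - returns.mean()) / (returns.std() + eps)
```

With CartPole's reward of 1 per step, raw returns grow with episode length, so standardizing them keeps the gradient scale comparable across short and long episodes.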
