reinforcement_learning/policy_gradient at master · yrlu/reinforcement_learning

Policy Gradient Methods

REINFORCE

The policy function is approximated by a 4-layer fully connected network with L2 regularization. The algorithm solved CartPole-v0 after 632 episodes.

reinforce.py: REINFORCE with policy function approximation
cartpole_reinforce.py: working example on cartpole-v0
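
For readers new to the algorithm, the core of a REINFORCE update can be sketched in a few lines of plain Python. This is a minimal illustration and is not taken from reinforce.py: the function names `discounted_returns` and `reinforce_loss` and the discount factor `gamma` are assumptions made for this example. It computes the discounted return for every step of one episode and then the surrogate loss whose gradient, with respect to the policy parameters, gives the REINFORCE estimate.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * r_{t+1} + ... for each step of one episode.

    Hypothetical helper for illustration; not part of the repository.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """Surrogate loss: -sum_t G_t * log pi(a_t | s_t).

    log_probs[t] is the log-probability the policy assigned to the action
    actually taken at step t. Minimizing this loss raises the probability
    of actions that were followed by high returns.
    """
    return -sum(g * lp for g, lp in zip(returns, log_probs))
```

In a real training loop the log-probabilities would come from the policy network and the loss would be minimized with an automatic-differentiation framework; an L2 penalty on the weights, as described above, would be added to this loss.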

Run Code

$ python cartpole_reinforce.py

Cartpole-v0 Result

REINFORCE with Baseline

This code implements the REINFORCE algorithm with a baseline. The policy and value functions share the same network, which is regularized with L2. The hyperparameters have not been tuned much. Depending on the random initialization, the model sometimes converges quickly to a local optimum (a degenerate policy), but a few attempts (fewer than 5) should be sufficient to get a successful run.

reinforce_w_baseline.py: REINFORCE with baseline
cartpole_reinforce_baseline.py: working example on cartpole-v0
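
The effect of the baseline can be sketched as follows. This is an illustrative example under stated assumptions, not the code in reinforce_w_baseline.py: the names `advantages` and `baseline_losses` are hypothetical, and the baseline is assumed to be a state-value estimate V(s_t). The policy term weights each log-probability by the advantage G_t - V(s_t) instead of the raw return, and the value estimate is fit to the returns with a mean squared error.

```python
from typing import List, Tuple


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage A_t = G_t - V(s_t): how much better the outcome was than predicted."""
    return [g - v for g, v in zip(returns, values)]


def baseline_losses(
    log_probs: List[float], returns: List[float], values: List[float]
) -> Tuple[float, float]:
    """Return (policy_loss, value_loss) for one episode.

    Hypothetical helper for illustration. In a framework with automatic
    differentiation, the advantage would be treated as a constant in the
    policy term so that the policy loss does not update the value estimate.
    """
    adv = advantages(returns, values)
    # Policy term: -sum_t A_t * log pi(a_t | s_t)
    policy_loss = -sum(a * lp for a, lp in zip(adv, log_probs))
    # Value term: mean squared error between V(s_t) and the observed return G_t
    value_loss = sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)
    return policy_loss, value_loss
```

When the policy and value function share one network, as described above, the two losses are typically summed (often with a weighting coefficient, which would be another hyperparameter to tune) before the gradient step.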

Run Code

$ python cartpole_reinforce_baseline.py
