Comparing Trust Region Policy Optimization and Natural Policy Gradient in Reinforcement Learning

The core implementation is contained in the following files:

actor.py - implements the actor class

algorithms.py - implements TRPO and NPG, as well as target algorithms (MC returns and GAE)

experiment_class.py - implements an experiment class and a wrapper class for running experiments with different parameters more easily

utils.py - implements helper functions, such as sampling from memory and updating PyTorch model parameters
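The two target computations listed for algorithms.py, Monte Carlo returns and Generalized Advantage Estimation (GAE), can be illustrated with a minimal pure-Python sketch. The function names `mc_returns` and `gae` are hypothetical and not taken from the repository, which presumably operates on PyTorch tensors. The sketch assumes a single episode that terminates, so the value after the final step is taken as 0.

```python
from typing import List


def mc_returns(rewards: List[float], gamma: float) -> List[float]:
    """Discounted Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae(rewards: List[float], values: List[float], gamma: float, lam: float) -> List[float]:
    """GAE advantages for one terminated episode.

    `values` holds V(s_0), ..., V(s_{T-1}); the value after the terminal step is 0
    (an assumption of this sketch).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With λ = 1 and a zero value baseline, GAE reduces to the discounted Monte Carlo return; with λ = 0 it reduces to the one-step TD error r_t + γV(s_{t+1}) - V(s_t).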
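NPG and TRPO share the same search direction, the natural gradient F⁻¹g, which is commonly approximated by conjugate gradient using only Fisher-vector products. The following is an illustrative assumption, not the code in algorithms.py: all function names are hypothetical, vectors are plain Python lists instead of PyTorch tensors, and NPG is taken to be the variant that applies the KL-normalised step directly, while TRPO additionally backtracks until the KL constraint holds and the surrogate objective improves.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(fvp: Callable[[Vector], Vector], g: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, starting from x = 0
    p = list(r)  # search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        fp = fvp(p)
        alpha = rr / dot(p, fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fpi for ri, fpi in zip(r, fp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def natural_step(fvp: Callable[[Vector], Vector], g: Vector, max_kl: float) -> Vector:
    """Natural-gradient step scaled so the quadratic KL model 0.5 * s^T F s equals max_kl."""
    x = conjugate_gradient(fvp, g)  # x ≈ F^{-1} g
    scale = math.sqrt(2.0 * max_kl / dot(x, fvp(x)))
    return [scale * xi for xi in x]


def trpo_line_search(theta: Vector, step: Vector,
                     surrogate: Callable[[Vector], float],
                     kl: Callable[[Vector, Vector], float],
                     max_kl: float, backtrack: float = 0.5,
                     max_tries: int = 10) -> Vector:
    """Shrink the step until the KL bound holds and the surrogate improves."""
    base = surrogate(theta)
    for i in range(max_tries):
        frac = backtrack ** i
        candidate = [t + frac * s for t, s in zip(theta, step)]
        if kl(theta, candidate) <= max_kl and surrogate(candidate) > base:
            return candidate
    return list(theta)  # no acceptable step found: keep the old parameters
```

In this sketch an NPG update would simply be `theta + natural_step(...)`, while TRPO passes the same step to `trpo_line_search`. The acceptance test used here (KL bound plus any surrogate improvement) is a simplification; TRPO implementations differ in the exact criterion.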

Minimal working examples can be found in:

min_example_actor.py

demo.py

demo_experiment.py

For the results, please refer to the full report:

Report.pdf
