This comparison shows how several agents based on Hill Climbing algorithms differ, and how they perform in the "CartPole-v0" environment.
The best modification (out of the ones considered for this particular environment) was Hill Climbing with Adaptive Noise:
- It took the least amount of time;
- It achieved the best accuracy;
- It was the most stable.
* More details in report.ipynb and comparison.xlsx.
"""
Description:
A pole is attached by an un-actuated joint to a cart, which moves along
a frictionless track. The pendulum starts upright, and the goal is to
prevent it from falling over by increasing and reducing the cart's
velocity.
Source:
This environment corresponds to the version of the cart-pole problem
described by Barto, Sutton, and Anderson
Observation:
    Type: Box(4)
    Num   Observation             Min                    Max
    0     Cart Position           -4.8                   4.8
    1     Cart Velocity           -Inf                   Inf
    2     Pole Angle              -0.418 rad (-24 deg)   0.418 rad (24 deg)
    3     Pole Angular Velocity   -Inf                   Inf
Actions:
    Type: Discrete(2)
    Num   Action
    0     Push cart to the left
    1     Push cart to the right
Note: The amount by which the velocity is reduced or increased is not
fixed; it depends on the angle the pole is pointing. This is because
the center of gravity of the pole increases the amount of energy needed
to move the cart underneath it.
Reward:
Reward is 1 for every step taken, including the termination step
Starting State:
All observations are assigned a uniform random value in [-0.05..0.05]
Episode Termination:
Pole Angle is more than 12 degrees.
Cart Position is more than 2.4 (center of the cart reaches the edge of
the display).
Episode length is greater than 200.
Solved Requirements:
Considered solved when the average return is greater than or equal to
195.0 over 100 consecutive trials.
"""
Follow the instructions in the DRLND GitHub repository to set up your Python environment. These instructions can be found in README.md at the root of the repository. By following these instructions, you will install PyTorch, the ML-Agents toolkit, and a few more Python packages required to complete the project.
(For Windows users) The ML-Agents toolkit supports Windows 10. While it might be possible to run the ML-Agents toolkit using other versions of Windows, it has not been tested on other versions. Furthermore, the ML-Agents toolkit has not been tested on a Windows VM such as Bootcamp or Parallels.
Change the Agent's policy to stochastic or deterministic by commenting out the option 1 or option 2 lines in the "2. Define the Policy" cell.
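For reference, the two options usually look like the sketch below: a small linear-softmax policy where option 1 samples an action from the probabilities (stochastic) and option 2 always picks the most probable action (deterministic). The exact cell contents may differ slightly; treat this as an illustration rather than the notebook code.

```python
import numpy as np

class Policy:
    def __init__(self, s_size=4, a_size=2):
        # weight matrix mapping the 4-dim observation to 2 action scores
        self.w = 1e-4 * np.random.rand(s_size, a_size)

    def forward(self, state):
        x = np.dot(state, self.w)
        return np.exp(x) / sum(np.exp(x))   # softmax over the two actions

    def act(self, state):
        probs = self.forward(state)
        # Option 1: stochastic policy - sample an action from the probabilities
        # return np.random.choice(len(probs), p=probs)
        # Option 2: deterministic policy - always pick the most probable action
        return int(np.argmax(probs))
```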
Notebooks (approximate time to do 1000 runs with a deterministic policy):
- Hill_Climbing-Vanilla.ipynb (90 min)
- Hill_Climbing-Vanilla-with Annealing.ipynb (100 min)
- Hill_Climbing-Vanilla-with Adaptive Noise.ipynb (18 min)
- Hill_Climbing-Steepest_Hill.ipynb (240 min)
- Hill_Climbing-Steepest_Hill-with_Annealing.ipynb (200 min)
- Hill_Climbing-Steepest_Hill-with_Adaptive_noise.ipynb (180 min)
Suggested experiments (the sketch after this list shows where these parameters enter the training loop):
- Compare results by changing the `max_t` parameter (adjusting it as learning progresses might make the comparison more efficient):
  - the Steepest Hill with Adaptive Noise algorithm trains very close to the 1000-step mark (increasing `max_t` is suggested);
  - some algorithms never learn after 100 steps (especially with a stochastic policy).
- Compare results by changing the `noise_scale` parameter:
  - some algorithms are more sensitive to this parameter than others.
- Compare results by changing the `noise_decay` parameter:
  - algorithms that have this parameter may train faster;
  - they may also learn a more or less robust policy (especially with Adaptive Noise).
- Compare results by changing the `n_cand` parameter:
  - the Steepest Hill algorithms may perform better or worse with fewer or more candidates;
  - vary `n_cand` against `n_episodes` to see how this trade-off changes the results.
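To make the role of these parameters concrete, here is a minimal sketch of an adaptive-noise hill climbing loop. The parameter names (`n_episodes`, `max_t`, `noise_scale`, `noise_decay`, `n_cand`) mirror the notebooks and the `Policy` class from the earlier sketch is reused, but the acceptance rule and the exact noise updates in the notebooks may differ (for example, the annealing variants simply apply `noise_scale *= noise_decay` every episode):

```python
import numpy as np
from collections import deque

def hill_climbing(env, policy, n_episodes=1000, max_t=1000,
                  noise_scale=1e-2, noise_decay=0.5, n_cand=1):
    """Sketch: adaptive-noise hill climbing on a weight-matrix policy."""
    def run_episode(weights):
        policy.w = weights
        state, total = env.reset(), 0.0
        for _ in range(max_t):                       # max_t caps the episode length
            state, reward, done, _ = env.step(policy.act(state))
            total += reward
            if done:
                break
        return total

    best_w, best_return = policy.w, -np.inf
    scores_window = deque(maxlen=100)                # for the 100-episode solve check
    for episode in range(1, n_episodes + 1):
        # evaluate n_cand perturbed candidates around the current best weights
        candidates = [best_w + noise_scale * np.random.randn(*best_w.shape)
                      for _ in range(n_cand)]
        returns = [run_episode(w) for w in candidates]
        i_best = int(np.argmax(returns))
        scores_window.append(returns[i_best])

        if returns[i_best] >= best_return:
            # improvement: keep the candidate and shrink the search radius
            best_w, best_return = candidates[i_best], returns[i_best]
            noise_scale = max(1e-3, noise_scale * noise_decay)
        else:
            # no improvement: widen the search radius again
            noise_scale = min(2.0, noise_scale / noise_decay)

        if np.mean(scores_window) >= 195.0:          # CartPole-v0 solve criterion
            print(f'Solved in {episode} episodes')
            break

    policy.w = best_w
    return policy
```

Setting `n_cand=1` recovers plain hill climbing; larger values give the steepest-ascent behaviour, at the cost of `n_cand` episode rollouts per update.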