Skip to content

chrisguo/reinforcement-learning-an-introduction

 
 

Repository files navigation

Reinforcement Learning: An Introduction

Python code for Sutton & Barto's book Reinforcement Learning: An Introduction (2nd Edition)

Contents

Click to view the sample output

Chapter 1

  1. Tic-Tac-Toe

Chapter 2

  1. Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed
  2. Figure 2.3: Optimistic initial action-value estimates
  3. Figure 2.4: Average performance of UCB action selection on the 10-armed testbed
  4. Figure 2.5: Average performance of the gradient bandit algorithm
  5. Figure 2.6: A parameter study of the various bandit algorithms

Chapter 3

  1. Figure 3.5: Grid example with random policy
  2. Figure 3.8: Optimal solutions to the gridworld example

Chapter 4

  1. Figure 4.1: Convergence of iterative policy evaluation on a small gridworld
  2. Figure 4.2: Jack’s car rental problem
  3. Figure 4.3: The solution to the gambler’s problem

Chapter 5

  1. Figure 5.1: Approximate state-value functions for the blackjack policy
  2. Figure 5.3: The optimal policy and state-value function for blackjack found by Monte Carlo ES
  3. Figure 5.4: Weighted importance sampling
  4. Figure 5.5: Ordinary importance sampling with surprisingly unstable estimates

Chapter 6

  1. Figure 6.2: Random walk
  2. Figure 6.3: Batch updating
  3. Figure 6.4: Sarsa applied to windy grid world
  4. Figure 6.5: The cliff-walking task
  5. Figure 6.7: Interim and asymptotic performance of TD control methods
  6. Figure 6.8: Comparison of Q-learning and Double Q-learning

Chapter 7

  1. Figure 7.2: Performance of n-step TD methods on 19-state random walk

Chapter 8

  1. Figure 8.3: Average learning curves for Dyna-Q agents varying in their number of planning steps
  2. Figure 8.5: Average performance of Dyna agents on a blocking task
  3. Figure 8.6: Average performance of Dyna agents on a shortcut task
  4. Figure 8.7: Prioritized sweeping significantly shortens learning time on the Dyna maze task

Chapter 9

  1. Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
  2. Figure 9.2: Semi-gradient n-steps TD algorithm on the 1000-state random walk task
  3. Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task
  4. Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy
  5. Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task

Chapter 10

  1. Figure 10.1: The cost-to-go function for Mountain Car task in one run
  2. Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task
  3. Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task
  4. Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa
  5. Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task

Chapter 11

  1. Figure 11.2: Baird's Counterexample

Chapter 12

  1. Figure 12.3: Off-line λ-return algorithm on 19-state random walk
  2. Figure 12.6: TD(λ) algorithm on 19-state random walk
  3. Figure 12.7: True online TD(λ) algorithm on 19-state random walk

Extra

  1. Return Specific Importance Sampling
  2. N-Step Tree Backup

Environment

  • Python 2.7
  • Numpy
  • Matplotlib

Contribution

This project contains almost all the programmable figures in the book. However, when I completed this project, the book is still in draft and some chapters are still incomplete. Furthermore, due to the limited computational capacity of my machine, I can only use limited runs and episodes for some experiments, so the sample output is much less smooth than that in the book.

If you want to contribute some exercises of the book or some missing examples, fix some bugs in existing code, provide sample outputs with higher quality, add some new interesting experiments related to RL, feel free to open an issue or make a pull request. I will appreciate it very much. Also, feel free to comment on the sample outputs, some curves are really interesting.

Following are known missing figures/examples:

  • Example 3.4: Pole-Balancing
  • Example 3.6: Draw Poker
  • Example 5.2: Soap Bubble
  • Example 8.5: Rod Maneuvering
  • Chapter 13: One example that hasn't shown up in the book about policy gradient
  • Chapter 14 & 15 are about psychology and neuroscience
  • Chapter 16: Backgammon, The Acrobot, Go

About

Python code for Reinforcement Learning: An Introduction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%