CS294-112 HW 3: Q-Learning

Usage

To run all experiments and plot figures for the report, run

```bash
bash run_11.sh
bash run_12.sh
bash run_13.sh
bash run_14.sh
python plot_part1.py
bash run_21.sh
bash run_22.sh
```

Results

Part 1

Question 1

Question 2

Question 3

I experimented with the effect of the discount factor on performance.

As the plot shows, convergence takes longer with a smaller discount factor.
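
For reference, the discount factor enters the one-step Q-learning target computed in dqn.py roughly as in the minimal NumPy sketch below. The variable and function names here are illustrative, not the assignment's exact code.

```python
# Sketch of the 1-step Q-learning (Bellman) target, where `gamma` is the
# discount factor studied above. Names are illustrative placeholders.
import numpy as np

def q_learning_target(rewards, dones, next_q_values, gamma):
    """rewards, dones: shape (batch,); next_q_values: shape (batch, num_actions)."""
    # A smaller gamma down-weights future value, so the target leans more
    # heavily on the immediate reward and bootstraps less from Q(s', a').
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
```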

Part 2

Question 1

Setting both num_grad_steps_per_target_update and num_target_updates to 10 works best.
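The sketch below shows schematically how these two settings nest in the critic update: the outer loop refreshes the bootstrapped targets, and the inner loop takes several gradient steps toward those fixed targets. This is a minimal NumPy illustration with a linear value function, not the actual train_ac_f18.py code; all names here are placeholders.

```python
# Schematic of the critic update loop controlled by num_target_updates and
# num_grad_steps_per_target_update (illustrative, not the assignment's code).
import numpy as np

def update_critic(w, obs, next_obs, rewards, dones, gamma,
                  num_target_updates, num_grad_steps_per_target_update, lr=1e-2):
    """w: linear value-function weights; obs, next_obs: shape (batch, dim)."""
    for _ in range(num_target_updates):
        # Recompute bootstrapped targets r + gamma * V(s') with the current
        # critic, then hold them fixed during the inner gradient steps.
        targets = rewards + gamma * (1.0 - dones) * (next_obs @ w)
        for _ in range(num_grad_steps_per_target_update):
            preds = obs @ w
            grad = obs.T @ (preds - targets) / len(obs)  # gradient of 0.5 * MSE
            w = w - lr * grad
    return w
```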

Question 2

Original README

Dependencies:

  • Python 3.5
  • Numpy version 1.14.5
  • TensorFlow version 1.10.5
  • MuJoCo version 1.50 and mujoco-py 1.50.1.56
  • OpenAI Gym version 0.10.5
  • seaborn
  • Box2D==2.3.2
  • OpenCV
  • ffmpeg

Before doing anything, first replace gym/envs/box2d/lunar_lander.py with the provided lunar_lander.py file.
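
If you are unsure where gym is installed, a small Python snippet like the one below can locate the installed module and copy the provided file over it. This is just a convenience sketch; it assumes the provided lunar_lander.py sits in your current working directory.

```python
# Overwrite gym's bundled lunar_lander.py with the provided one.
# Assumes the provided lunar_lander.py is in the current working directory.
import os
import shutil

import gym.envs.box2d as box2d

dest = os.path.join(os.path.dirname(box2d.__file__), "lunar_lander.py")
shutil.copyfile("lunar_lander.py", dest)
print("Replaced", dest)
```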

The only files that you need to look at are dqn.py and train_ac_f18.py, which you will implement.

See the HW3 PDF for further instructions.

The starter code was based on an implementation of Q-learning for Atari generously provided by Szymon Sidor from OpenAI.