For this Udacity project, we are tasked with training an autonomous agent that controls double-jointed robotic arms, with the goal of moving the agent's hand to a target location and keeping it there. The environment, called Reacher, is a custom Unity game developed jointly by Unity and Udacity for the Deep Reinforcement Learning Nanodegree.
The environment was solved in 180 episodes using a DDPG algorithm.
Training results
A conda environment yaml file is provided, but is not guaranteed to work. Installation instructions are provided below and are recommended if the conda environment fails.
This project was completed on macOS Big Sur (v11.5) with python==3.6.12, unityagents==0.4.0, and torch==0.4.0.
There are two versions of the environment: Version 1 contains a single agent, and Version 2 contains twenty agents. Download the Unity environment for your operating system:
Version 1: One Agent
Version 2: Twenty Agents
The Reacher environment, captured by the Udacity team.
The Reacher environment is a custom Unity game developed by the Unity and Udacity teams. It is a standalone application that can be run on all major operating systems.
The rewards in this environment are +0.1 for every step that the robotic arm is within the target area. At the end of the robotic arm is a small blue node. The target is a large, blue, semi-translucent sphere that rotates around the arm.
There are 20 agents in this environment. Each agent observes a 33-dimensional state corresponding to the position, rotation, velocity, and angular velocities of the arm.
The action space has 4 dimensions, corresponding to the torques applied to the two arm joints. Each torque value must lie in [-1, 1].
The environment is considered solved when the agents achieve an average score of +30 over 100 consecutive episodes.
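As a concrete illustration, the solved check amounts to a 100-episode moving average of the episode score (itself averaged over the 20 agents). Below is a minimal sketch of that bookkeeping; the variable and function names are illustrative and not taken from the project code.

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # episode scores from the most recent 100 episodes

def record_episode(per_agent_returns):
    """per_agent_returns: array of shape (20,), each agent's undiscounted return for the episode."""
    scores_window.append(np.mean(per_agent_returns))   # episode score = mean over the 20 agents
    # Solved once the moving average of the last 100 episode scores reaches +30
    return len(scores_window) == 100 and np.mean(scores_window) >= 30.0
```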
The README has instructions for installing dependencies or downloading needed files.
First, please download the appropriate Reacher environment from the links above.
Second, build your environment. I would first try to install the conda environment using the included environment.yml file, although I expect it will fail. If it does, follow the instructions below.
- Create an environment using Python 3.6:
- Linux or Mac:
conda create --name drlnd python=3.6
source activate drlnd
- (Optional but recommended) Install OpenAI Gym
pip install gym
- Install unityagents
pip install unityagents==0.4.0
- Install PyTorch 0.4.0
pip install torch==0.4.0
- Install Jupyter
conda install -c conda-forge notebook
- Create an IPython kernel for use with your Jupyter notebook
python -m ipykernel install --user --name drlnd --display-name "drlnd"
- This step will allow you to use your conda environment as your kernel in Jupyter
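Once the dependencies are installed, a quick way to sanity-check the setup is to load the Reacher app with unityagents and run one episode with random actions. The snippet below is a sketch based on the standard unityagents 0.4.0 API; adjust file_name to match your OS-specific download.

```python
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name='Reacher.app')      # rename for your OS-specific build
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)                    # 20 in Version 2
action_size = brain.vector_action_space_size         # 4
state_size = env_info.vector_observations.shape[1]   # 33

scores = np.zeros(num_agents)
while True:
    # Random torques, clipped to the valid [-1, 1] range
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

print('Random-action score (mean over agents): {:.2f}'.format(np.mean(scores)))
env.close()
```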
There are 4 main files within this effort: `Reacher.app`, `Continuous_Control.ipynb`, `ddpg_agent.py`, and `model.py`.
- `Reacher.app` is the Unity environment file; it will be named differently for Linux or Windows environments.
- `Continuous_Control.ipynb` is the notebook from which your code will be executed.
- `ddpg_agent.py` contains the DDPG algorithm, which calls the model defined in `model.py`. It also contains the hyperparameters that can be set (see the sketch after this list), including:
- BUFFER_SIZE: replay buffer size
- BATCH_SIZE: minibatch size
- GAMMA: discount factor
- TAU: soft update of target parameters
- LR_ACTOR: learning rate of the actor network
- LR_CRITIC: learning rate of the critic network
- WEIGHT_DECAY: L2 regularization
- UPDATE_EVERY: how often the network updates
- `model.py` defines the model architecture
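For reference, the hyperparameters in `ddpg_agent.py` are module-level constants. The block below is a sketch of how they might look, filled in with the values from the final run reported further down; GAMMA, TAU, and WEIGHT_DECAY are not listed in this README, so those values are assumptions. The `soft_update` helper shows how TAU is typically used in DDPG.

```python
BUFFER_SIZE = int(1e5)    # replay buffer size
BATCH_SIZE = 512          # minibatch size
GAMMA = 0.99              # discount factor (assumed; not listed in this README)
TAU = 1e-3                # soft-update rate for target networks (assumed; not listed in this README)
LR_ACTOR = 1e-3           # learning rate of the actor network
LR_CRITIC = 1e-3          # learning rate of the critic network
WEIGHT_DECAY = 0          # L2 regularization (assumed; not listed in this README)
UPDATE_EVERY = 1          # how often (in steps) the networks are updated

def soft_update(local_model, target_model, tau=TAU):
    """Blend target weights toward local weights: theta_target <- tau*theta_local + (1-tau)*theta_target."""
    for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)
```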
- Define your model architecture in `model.py`, ensuring your inputs are equal to the state size and your outputs are equal to the action size (see the sketch after this list).
- Update the hyperparameters in `ddpg_agent.py`. The provided hyperparameters are a good starting point and solve the environment relatively quickly.
- Navigate to `Continuous_Control.ipynb`.
- In Section 4, you can update some training parameters, including `n_episodes` (the number of training episodes) and `max_t` (the maximum number of steps per episode).
- Lines 29 and 30, `torch.save(..., 'checkpoint.pth')`, save the actor and critic model weights every episode.
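The model referenced above is a pair of fully connected networks with two 128-unit hidden layers (the "128x128" in the runs below). The following is a hedged sketch of what `model.py` might contain, not the author's exact code; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Maps a 33-dim state to a 4-dim torque vector in [-1, 1]."""
    def __init__(self, state_size=33, action_size=4, fc1_units=128, fc2_units=128):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))      # keeps torques within [-1, 1]

class Critic(nn.Module):
    """Estimates Q(state, action); the action is injected after the first hidden layer."""
    def __init__(self, state_size=33, action_size=4, fc1_units=128, fc2_units=128):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units + action_size, fc2_units)
        self.fc3 = nn.Linear(fc2_units, 1)

    def forward(self, state, action):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(torch.cat([x, action], dim=1)))
        return self.fc3(x)
```

Weights saved by the notebook can later be restored with `actor.load_state_dict(torch.load('checkpoint.pth'))`, assuming the checkpoint file holds the actor's state_dict.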
- 128x128x4 network
- normal(0,128^-0.5) initialization
- batch size: 1024
- buffer size: 1e5
- actor learning rate: 1e-4
- critic learning rate: 1e-3
- network updates every: 1 step
- number of training episodes: 130
- max steps per episode: 1000
This took about 2 hours to train. Technically the environment was not solved, but the agent was definitely close at the 130-episode mark (I should have trained longer).
- 128x128x4 network
- normal(0,128^-0.5) initialization
- batch size: 256
- buffer size: 1e5
- actor learning rate: 1e-4
- critic learning rate: 1e-3
- network updates every: 10 steps
- number of training episodes: 150
- max steps per episode: 1000
This run only took about 20 minutes, thanks to the smaller batch size and less frequent network updates, but its performance was terrible.
- 128x128x4 network
- normal(0,128^-0.5) initialization
- batch size: 512
- buffer size: 1e6
- actor learning rate: 1e-4
- critic learning rate: 1e-3
- network updates every: 1 step
- number of training episodes: 200
- max steps per episode: 1000
- 128x128x4 network
- xavier_normal initialization
- batch size: 512
- buffer size: 1e5
- actor learning rate: 1e-3
- critic learning rate: 1e-3
- network updates every: 1 step
- number of training episodes: 200
- max steps per episode: 1000
A few notes: this agent trained quite well. I had previously experimented with a few different setups and found that these hyperparameters trained decently, but progress was relatively slow and the scores were still increasing after 200 episodes. So I increased the actor learning rate by a factor of 10, and that did the trick. Python reports that this took 19 hours to train, but there is no way that is true; my math must be off :).
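For clarity on the initialization listed in each run: "normal(0, 128^-0.5)" refers to drawing weights from a normal distribution with zero mean and standard deviation 128^-0.5 (roughly 0.088, one over the square root of the 128-unit layer width). A minimal sketch of such an initializer follows; the helper name and the bias handling are illustrative assumptions, not the author's exact code.

```python
import torch.nn as nn

def init_weights_normal(layer, std=128 ** -0.5):
    """Initialize Linear layers with weights ~ N(0, 128^-0.5); zero biases are an assumption."""
    if isinstance(layer, nn.Linear):
        layer.weight.data.normal_(0.0, std)
        layer.bias.data.fill_(0.0)

# usage: actor.apply(init_weights_normal); critic.apply(init_weights_normal)
```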
The Reacher environment, the GIF used above, and the template for the DDPG code were provided by the Udacity team.