Continuous Control Project -- Reacher

Background

For this Udacity project, we are tasked with training an autonomous agent that controls a double-jointed robotic arm, with the goal of moving the arm's hand to a target location and keeping it there. The environment is Reacher, a custom Unity game developed jointly by Unity and Udacity for the Deep Reinforcement Learning Nanodegree.

The environment was solved in 180 episodes using a DDPG algorithm.

Training results plot (environment solved)

Software Requirements

A conda environment YAML file (environment.yml) is provided, but it is not guaranteed to work. Manual installation instructions are provided below and are recommended if the conda environment fails to build.

This project was completed on macOS Big Sur (11.5) with python==3.6.12, unityagents==0.4.0, and torch==0.4.0.

There are two versions of the environment: Version 1 contains a single agent, and Version 2 contains twenty agents. Download the Unity environment for your operating system:

Version 1: One Agent

Version 2: Twenty Agents

The Environment

The Reacher environment (GIF captured by the Udacity team).

The Reacher environment is a custom Unity game developed by the Unity and Udacity teams. It is a standalone application that can be run on all major operating systems.

The agent receives a reward of +0.1 for every step that the arm's hand is within the target area. The hand is the small blue node at the end of the robotic arm; the target is a large, blue, semi-translucent sphere that rotates around the arm.

There are 20 agents in this environment (the Version 2 build). Each agent observes a 33-dimensional state corresponding to the position, rotation, velocity, and angular velocity of the arm.

Each action is a vector of 4 numbers corresponding to the torques applied to the two arm joints. Every entry must lie in the range [-1, 1].

The environment is considered solved when the agents obtain an average score of +30 over 100 consecutive episodes, where each episode's score is averaged over all 20 agents.
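For concreteness, here is a minimal sketch of how the environment is typically loaded and stepped with the unityagents API, assuming the Version 2 (twenty-agent) build is saved as Reacher.app next to the notebook. The random-action loop is only for illustration, not the trained agent.

```python
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name='Reacher.app')  # adjust the filename/path for your OS
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)                # 20 in Version 2
states = env_info.vector_observations            # shape (20, 33)
action_size = brain.vector_action_space_size     # 4 torques per agent

scores = np.zeros(num_agents)
while True:
    # random torques, clipped to the required [-1, 1] range
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards                   # +0.1 per step a hand stays in its target
    states = env_info.vector_observations
    if np.any(env_info.local_done):
        break

print('Mean score across agents:', np.mean(scores))
env.close()
```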

Installation

This section contains instructions for installing dependencies and downloading the needed files.

First, please download the appropriate Reacher environment from the links above.

Second, build your Python environment. I would first try installing the conda environment from the included environment.yml file (conda env create -f environment.yml), although I expect it will fail. If it does, follow the instructions below.

Dependencies

  1. Create an environment using Python 3.6:
  • Linux or Mac:
conda create --name drlnd python=3.6
source activate drlnd
  2. (Optional but recommended) Install OpenAI Gym:
  • pip install gym
  3. Install unityagents:
  • pip install unityagents==0.4.0
  4. Install PyTorch 0.4.0:
  • pip install torch==0.4.0
  5. Install Jupyter:
  • conda install -c conda-forge notebook
  6. Create an IPython kernel for use with your Jupyter notebook:
  • python -m ipykernel install --user --name drlnd --display-name "drlnd"
  • This step allows you to use the drlnd conda environment as the kernel in Jupyter.

How to Use

There are four main files in this project: Reacher.app, Continuous_Control.ipynb, ddpg_agent.py, and model.py.

  • Reacher.app is the Unity environment file; it will be named differently on Linux and Windows
  • Continuous_Control.ipynb is the notebook from which the training code is run
  • ddpg_agent.py contains the DDPG agent implementation. It calls the model defined in model.py and contains the hyperparameters that can be set (see the sketch after this list), including:
    • BUFFER_SIZE: replay buffer size
    • BATCH_SIZE: minibatch size
    • GAMMA: discount factor
    • TAU: soft-update factor for the target networks
    • LR_ACTOR: learning rate of the actor network
    • LR_CRITIC: learning rate of the critic network
    • WEIGHT_DECAY: L2 regularization
    • UPDATE_EVERY: how often (in environment steps) the networks are updated
  • model.py defines the model architecture
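For orientation, the hyperparameter block near the top of ddpg_agent.py looks roughly like the sketch below. The specific values shown are illustrative placeholders (the GAMMA, TAU, and WEIGHT_DECAY values in particular are assumptions), not necessarily the ones in the repository; see the experiment write-ups further down for values actually tried.

```python
# Hyperparameters defined near the top of ddpg_agent.py (values are illustrative).
BUFFER_SIZE = int(1e6)   # replay buffer size
BATCH_SIZE = 512         # minibatch size
GAMMA = 0.99             # discount factor
TAU = 1e-3               # soft-update factor for the target networks
LR_ACTOR = 1e-3          # learning rate of the actor network
LR_CRITIC = 1e-3         # learning rate of the critic network
WEIGHT_DECAY = 0         # L2 regularization (weight decay) for the critic optimizer
UPDATE_EVERY = 1         # how often, in environment steps, the networks are updated
```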

Training the model

  1. Define your model architecture in model.py, ensuring your inputs are equal to the state size (33) and your outputs are equal to the action size (4).

  2. Update the hyperparameters in ddpg_agent.py. The provided hyperparameters are a good starting point and solve the environment relatively quickly.

  3. Open Continuous_Control.ipynb.

  4. In Section 4, you can update some training parameters, including n_episodes (the number of training episodes) and max_t (the maximum number of steps per episode).

  5. Lines 29 and 30, torch.save(..., 'checkpoint.pth'), save the actor and critic model weights every episode.
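The training cell in the notebook follows the usual DDPG loop; below is an approximate sketch of that loop, not the exact notebook code. The agent interface (act, step, reset, actor_local, critic_local) follows the standard Udacity DDPG template, and the checkpoint filenames are illustrative (the notebook itself may simply use 'checkpoint.pth').

```python
from collections import deque
import numpy as np
import torch

def ddpg(agent, env, brain_name, n_episodes=200, max_t=1000):
    """Approximate shape of the training loop in Continuous_Control.ipynb."""
    scores_window = deque(maxlen=100)               # last 100 episode scores
    all_scores = []
    for i_episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations
        scores = np.zeros(len(env_info.agents))
        agent.reset()                               # reset the exploration noise process
        for t in range(max_t):
            actions = agent.act(states)
            env_info = env.step(actions)[brain_name]
            agent.step(states, actions, env_info.rewards,
                       env_info.vector_observations, env_info.local_done)
            states = env_info.vector_observations
            scores += env_info.rewards
            if np.any(env_info.local_done):
                break
        episode_score = np.mean(scores)             # average over the 20 agents
        scores_window.append(episode_score)
        all_scores.append(episode_score)
        # save actor and critic weights every episode (cf. lines 29-30 in the notebook)
        torch.save(agent.actor_local.state_dict(), 'checkpoint_actor.pth')
        torch.save(agent.critic_local.state_dict(), 'checkpoint_critic.pth')
        if i_episode >= 100 and np.mean(scores_window) >= 30.0:
            print('Environment solved in {:d} episodes'.format(i_episode))
            break
    return all_scores
```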

A few experiments

First successfully trained agent

  • 128x128x4 network
  • normal(0,128^-0.5) initialization
  • batch size: 1024
  • buffer size: 1e5
  • actor learning rate: 1e-4
  • critic learning rate: 1e-3
  • network updates every: 1 step
  • number of training episodes: 130
  • max steps per episode: 1000

This took about 2 hours to train. Technically the environment was not solved, but the agent was definitely close at the 130-episode mark (it should have been trained longer).

Second attempt

  • 128x128x4 network
  • normal(0,128^-0.5) initialization
  • batch size: 256
  • buffer size: 1e5
  • actor learning rate: 1e-4
  • critic learning rate: 1e-3
  • network updates every: 10 steps
  • number of training episodes: 150
  • max steps per episode: 1000

This attempt only took about 20 minutes, thanks to the smaller batch size and less frequent network updates, but the performance was terrible.

Third attempt

  • 128x128x4 network
  • normal(0,128^-0.5) initialization
  • batch size: 512
  • buffer size: 1e6
  • actor learning rate: 1e-4
  • critic learning rate: 1e-3
  • network updates every: 1 step
  • number of training episodes: 200
  • max steps per episode: 1000

Fourth attempt

  • 128x128x4 network
  • xavier_normal initialization
  • batch size: 512
  • buffer size: 1e5
  • actor learning rate: 1e-3
  • critic learning rate: 1e-3
  • network updates every: 1 step
  • number of training episodes: 200
  • max steps per episode: 1000

A few notes: this agent trained quite well. I had previously played around with a few different setups and found that these hyperparameters trained decently, but progress was relatively slow and the scores were still increasing after 200 episodes. So I increased the actor learning rate by a factor of 10, and that did the trick. Python reports that this run took 19 hours to train, but there is no way that is true; my bookkeeping must be off. The network layout and the two initialization schemes used across these attempts are sketched below.
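To make the "128x128x4 network" and the two initialization entries above concrete, here is a rough sketch of what the actor in model.py could look like under each scheme. The layer names, ReLU activations, and the application of the same initializer to every layer are assumptions based on the standard DDPG template; the tanh output keeps the four torques in [-1, 1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Approximate 128x128x4 actor: 33-dim state in, 4 torques in [-1, 1] out."""

    def __init__(self, state_size=33, action_size=4, init_scheme='normal'):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(state_size, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, action_size)
        for layer in (self.fc1, self.fc2, self.fc3):
            if init_scheme == 'normal':
                # "normal(0, 128^-0.5)": zero-mean Gaussian with std = 1/sqrt(128)
                nn.init.normal_(layer.weight, mean=0.0, std=128 ** -0.5)
            else:
                # "xavier_normal" initialization (used in the fourth attempt)
                nn.init.xavier_normal_(layer.weight)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))  # bound each torque to [-1, 1]
```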

Credits

The Reacher environment, the GIF used above, and the template for the DDPG code were provided by the Udacity team.
