For this Udacity project, we are tasked with training an autonomous agent that controls double-jointed robotic arms, with the goal of moving the agent's hand to a target location and keeping it there. The environment, called Reacher, is a custom Unity game developed jointly by Unity and Udacity for the Deep Reinforcement Learning Nanodegree.
The environment was solved in 180 episodes using a DDPG algorithm.
Training results
A conda environment yaml file is provided, but is not guaranteed to work. Installation instructions are provided below and are recommended if the conda environment fails.
This project was completed on macOS Big Sur (v11.5) with python==3.6.12, unityagents==0.4.0, and torch==0.4.0.
There are two versions of the environment: Version 1 contains a single agent, and Version 2 contains twenty agents. Download the Unity environment for your operating system:
Version 1: One Agent
Version 2: Twenty Agents
The Reacher environment, captured by the Udacity team.
The Reacher environment is a custom Unity game developed by the Unity and Udacity teams. It is a standalone application that can be run on all major operating systems.
The rewards in this environment are +0.1 for every step that the robotic arm is within the target area. At the end of the robotic arm is a small blue node. The target is a large, blue, semi-translucent sphere that rotates around the arm.
There are 20 agents in this environment. Each agent observes a 33-dimensional state corresponding to the position, rotation, velocity, and angular velocities of the arm.
The action space has 4 dimensions, corresponding to the torques applied to the two arm joints. Each torque value must lie in [-1, 1].
The environment is considered solved when the agents achieve an average score of +30 over 100 consecutive episodes.
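As a concrete illustration, the solved check amounts to a 100-episode moving average of the episode score (itself averaged over the 20 agents). Below is a minimal sketch of that bookkeeping; the variable and function names are illustrative and not taken from the project code.

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # episode scores from the most recent 100 episodes

def record_episode(per_agent_returns):
    """per_agent_returns: array of shape (20,), each agent's undiscounted return for the episode."""
    scores_window.append(np.mean(per_agent_returns))   # episode score = mean over the 20 agents
    # Solved once the moving average of the last 100 episode scores reaches +30
    return len(scores_window) == 100 and np.mean(scores_window) >= 30.0
```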
The README has instructions for installing dependencies or downloading needed files.
First, please download the appropriate Reacher environment from the links above.
Second, build your environment. I would first try to install the conda environment using the included environment.yml file, although I expect it will fail. If it does, follow the instructions below.
- Create an environment using Python 3.6:
- Linux or Mac:
conda create --name drlnd python=3.6
source activate drlnd
- (Optional but recommended) Install OpenAI Gym
pip install gym
- Install unityagents
pip install unityagents==0.4.0
- Install PyTorch 0.4.0
pip install torch==0.4.0
- Install Jupyter
conda install -c conda-forge notebook
- Create an IPython kernel for use with your Jupyter notebook
python -m ipykernel install --user --name drlnd --display-name "drlnd"
- This step will allow you to use your conda environment as your kernel in Jupyter
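Once the dependencies are installed, a quick way to sanity-check the setup is to load the Reacher app with unityagents and run one episode with random actions. The snippet below is a sketch based on the standard unityagents 0.4.0 API; adjust file_name to match your OS-specific download.

```python
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name='Reacher.app')      # rename for your OS-specific build
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)                    # 20 in Version 2
action_size = brain.vector_action_space_size         # 4
state_size = env_info.vector_observations.shape[1]   # 33

scores = np.zeros(num_agents)
while True:
    # Random torques, clipped to the valid [-1, 1] range
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

print('Random-action score (mean over agents): {:.2f}'.format(np.mean(scores)))
env.close()
```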
There are 4 main files within this effort: `Reacher.app`, `Continuous_Control.ipynb`, `ddpg_agent.py`, and `model.py`.
- `Reacher.app` is the Unity environment file; it will be named differently for Linux or Windows environments.
- `Continuous_Control.ipynb` is the notebook from which your code will be executed.
- `ddpg_agent.py` contains the DDPG algorithm, which calls the model defined in `model.py`. It also contains the hyperparameters that can be set (see the sketch after this list), including:
- BUFFER_SIZE: replay buffer size
- BATCH_SIZE: minibatch size
- GAMMA: discount factor
- TAU: soft update of target parameters
- LR_ACTOR: learning rate of the actor network
- LR_CRITIC: learning rate of the critic network
- WEIGHT_DECAY: L2 regularization
- UPDATE_EVERY: how often the network updates
- `model.py` defines the model architecture
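For reference, the hyperparameters in `ddpg_agent.py` are module-level constants. The block below is a sketch of how they might look, filled in with the values from the final run reported further down; GAMMA, TAU, and WEIGHT_DECAY are not listed in this README, so those values are assumptions. The `soft_update` helper shows how TAU is typically used in DDPG.

```python
BUFFER_SIZE = int(1e5)    # replay buffer size
BATCH_SIZE = 512          # minibatch size
GAMMA = 0.99              # discount factor (assumed; not listed in this README)
TAU = 1e-3                # soft-update rate for target networks (assumed; not listed in this README)
LR_ACTOR = 1e-3           # learning rate of the actor network
LR_CRITIC = 1e-3          # learning rate of the critic network
WEIGHT_DECAY = 0          # L2 regularization (assumed; not listed in this README)
UPDATE_EVERY = 1          # how often (in steps) the networks are updated

def soft_update(local_model, target_model, tau=TAU):
    """Blend target weights toward local weights: theta_target <- tau*theta_local + (1-tau)*theta_target."""
    for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)
```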
- Define your model architecture in `model.py`, ensuring your inputs are equal to the state size and your outputs are equal to the action size (see the sketch after this list).
- Update the hyperparameters in `ddpg_agent.py`. The provided hyperparameters are a good starting point and solve the environment relatively quickly.
- Navigate to `Continuous_Control.ipynb`.
- In Section 4, you can update some training parameters, including `n_episodes` (the number of training episodes) and `max_t` (the maximum number of steps per episode).
- Lines 29 and 30, `torch.save(..., 'checkpoint.pth')`, save the actor and critic model weights every episode.
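The model referenced above is a pair of fully connected networks with two 128-unit hidden layers (the "128x128" in the runs below). The following is a hedged sketch of what `model.py` might contain, not the author's exact code; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Maps a 33-dim state to a 4-dim torque vector in [-1, 1]."""
    def __init__(self, state_size=33, action_size=4, fc1_units=128, fc2_units=128):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))      # keeps torques within [-1, 1]

class Critic(nn.Module):
    """Estimates Q(state, action); the action is injected after the first hidden layer."""
    def __init__(self, state_size=33, action_size=4, fc1_units=128, fc2_units=128):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units + action_size, fc2_units)
        self.fc3 = nn.Linear(fc2_units, 1)

    def forward(self, state, action):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(torch.cat([x, action], dim=1)))
        return self.fc3(x)
```

Weights saved by the notebook can later be restored with `actor.load_state_dict(torch.load('checkpoint.pth'))`, assuming the checkpoint file holds the actor's state_dict.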
- 128x128x4 network
- normal(0,128^-0.5) initialization
- batch size: 1024
- buffer size: 1e5
- actor learning rate: 1e-4
- critic learning rate: 1e-3
- network updates every: 1 step
- number of training episodes: 130
- max steps per episode: 1000
This took about 2 hours to train. Technically the environment was not solved, but the agent was definitely close at the 130-episode mark (I should have trained longer).
- 128x128x4 network
- normal(0,128^-0.5) initialization
- batch size: 256
- buffer size: 1e5
- actor learning rate: 1e-4
- critic learning rate: 1e-3
- network updates every: 10 steps
- number of training episodes: 150
- max steps per episode: 1000
This run only took about 20 minutes, thanks to the smaller batch size and less frequent network updates, but its performance was terrible.
- 128x128x4 network
- normal(0,128^-0.5) initialization
- batch size: 512
- buffer size: 1e6
- actor learning rate: 1e-4
- critic learning rate: 1e-3
- network updates every: 1 step
- number of training episodes: 200
- max steps per episode: 1000
- 128x128x4 network
- xavier_normal initialization
- batch size: 512
- buffer size: 1e5
- actor learning rate: 1e-3
- critic learning rate: 1e-3
- network updates every: 1 step
- number of training episodes: 200
- max steps per episode: 1000
A few notes: this agent trained quite well. I had previously experimented with a few different setups and found that these hyperparameters trained decently, but progress was relatively slow and the scores were still increasing after 200 episodes. So I increased the actor learning rate by a factor of 10, and that did the trick. Python reports that this took 19 hours to train, but there is no way that is true; my math must be off :).
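For clarity on the initialization listed in each run: "normal(0, 128^-0.5)" refers to drawing weights from a normal distribution with zero mean and standard deviation 128^-0.5 (roughly 0.088, one over the square root of the 128-unit layer width). A minimal sketch of such an initializer follows; the helper name and the bias handling are illustrative assumptions, not the author's exact code.

```python
import torch.nn as nn

def init_weights_normal(layer, std=128 ** -0.5):
    """Initialize Linear layers with weights ~ N(0, 128^-0.5); zero biases are an assumption."""
    if isinstance(layer, nn.Linear):
        layer.weight.data.normal_(0.0, std)
        layer.bias.data.fill_(0.0)

# usage: actor.apply(init_weights_normal); critic.apply(init_weights_normal)
```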
The Reacher environment, the GIF used above, and the template for the DDPG code were provided by the Udacity team.