PyTorch implementation of popular deep reinforcement learning algorithms, aiming for state-of-the-art performance.
Implemented algorithms:
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
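As background for the two implemented algorithms, here is a minimal, framework-free sketch of one core piece of each. The first is PPO's clipped surrogate objective (from "Proximal Policy Optimization Algorithms"). The second is DDPG's soft target-network update (from "Continuous control with deep reinforcement learning"). This is an illustration in plain Python, not the code in `ppo/` or `ddpg/`. The function names `ppo_clip_loss` and `soft_update` are hypothetical. The defaults `clip_eps=0.2` and `tau=0.001` are the values reported in the respective papers, and may differ from this repository's settings.

```python
import math
from typing import List, Sequence


def ppo_clip_loss(
    log_probs_new: Sequence[float],
    log_probs_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over samples.

    L^CLIP = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r = pi_new(a|s) / pi_old(a|s) = exp(log_new - log_old).
    The result is negated so that minimising it maximises L^CLIP.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic (lower) bound on the objective.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def soft_update(target: Sequence[float], source: Sequence[float], tau: float = 0.001) -> List[float]:
    """DDPG-style Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

In a PyTorch implementation, the same arithmetic would typically run on tensors (for example with `torch.clamp` and `torch.min`), so that gradients flow through the probability ratio.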
Algorithms to be implemented:
- Trust Region Policy Optimization (TRPO)
- Generative Adversarial Imitation Learning (GAIL)
- (Double/Dueling) Deep Q-Network (DQN)
Requirements:
- Python 3.6
- NumPy 1.15
- SciPy 1.1.0
- mujoco-py 0.5.7
- Gym 0.9.0
- sklearn 0.0
- PyTorch 0.4.0
Training with PPO:
cd ppo
python ppo_train.py --e Reacher-v1 -n 60000 -b 50
python ppo_train.py --e InvertedPendulum-v1
python ppo_train.py --e InvertedDoublePendulum-v1 -n 12000
python ppo_train.py --e Swimmer-v1 -n 2500 -b 5
python ppo_train.py --e Hopper-v1 -n 30000
python ppo_train.py --e HalfCheetah-v1 -n 3000 -b 5
python ppo_train.py --e Walker2d-v1 -n 25000
python ppo_train.py --e Ant-v1 -n 100000
python ppo_train.py --e Humanoid-v1 -n 200000
python ppo_train.py --e HumanoidStandup-v1 -n 200000 -b 5
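The PPO commands above can also be replayed from a script. The following is a hypothetical helper: `PPO_RUNS`, `build_command` and `run_all` are names made up for this sketch. It copies exactly the flags listed and runs them from inside the `ppo` directory. It does not interpret `-n` or `-b`, whose meaning is defined by `ppo_train.py` itself.

```python
import subprocess
import sys
from typing import Dict, List

# Extra flags per environment, copied verbatim from the PPO commands listed above.
PPO_RUNS: Dict[str, List[str]] = {
    "Reacher-v1": ["-n", "60000", "-b", "50"],
    "InvertedPendulum-v1": [],
    "InvertedDoublePendulum-v1": ["-n", "12000"],
    "Swimmer-v1": ["-n", "2500", "-b", "5"],
    "Hopper-v1": ["-n", "30000"],
    "HalfCheetah-v1": ["-n", "3000", "-b", "5"],
    "Walker2d-v1": ["-n", "25000"],
    "Ant-v1": ["-n", "100000"],
    "Humanoid-v1": ["-n", "200000"],
    "HumanoidStandup-v1": ["-n", "200000", "-b", "5"],
}


def build_command(env_name: str, python: str = "python") -> List[str]:
    """Return the argv list for one PPO training run."""
    return [python, "ppo_train.py", "--e", env_name, *PPO_RUNS[env_name]]


def run_all(workdir: str = "ppo", dry_run: bool = True) -> None:
    """Run every benchmark command in turn, or only print them when dry_run=True."""
    for env_name in PPO_RUNS:
        cmd = build_command(env_name, python=sys.executable)
        if dry_run:
            print(" ".join(cmd))
        else:
            # cwd=workdir mirrors the `cd ppo` step; check=True stops at the first failed run.
            subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_all()` from the repository root prints the commands. `run_all(dry_run=False)` actually launches them one after another, which can take a long time for the larger tasks.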
Training with DDPG:
cd ddpg
python ddpg_train.py --e Reacher-v1 --start_timesteps 1000
python ddpg_train.py --e InvertedPendulum-v1 --start_timesteps 1000
python ddpg_train.py --e InvertedDoublePendulum-v1 --start_timesteps 1000
python ddpg_train.py --e Swimmer-v1 --start_timesteps 1000
python ddpg_train.py --e Hopper-v1 --start_timesteps 1000
python ddpg_train.py --e HalfCheetah-v1 --start_timesteps 10000
python ddpg_train.py --e Walker2d-v1 --start_timesteps 1000
python ddpg_train.py --e Ant-v1 --start_timesteps 10000
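The DDPG commands differ only in `--start_timesteps`: 1000 for most tasks, and 10000 for HalfCheetah-v1 and Ant-v1. In some public DDPG/TD3 code, such as the TD3 reference implementation by Fujimoto et al., a flag with this name sets how many initial environment steps use uniformly random actions before the actor takes over. The sketch below assumes, without confirmation, that `ddpg_train.py` follows the same convention, and shows roughly what the action-selection rule would look like. The names `select_action` and `noise_std`, and the use of Gaussian exploration noise, are hypothetical choices for this sketch. The DDPG paper itself used Ornstein-Uhlenbeck noise.

```python
import random
from typing import Callable, List, Sequence

# A deterministic policy: maps a state to an action vector.
Actor = Callable[[Sequence[float]], List[float]]


def select_action(
    t: int,
    state: Sequence[float],
    actor: Actor,
    start_timesteps: int,
    action_dim: int,
    low: float,
    high: float,
    rng: random.Random,
    noise_std: float = 0.1,
) -> List[float]:
    """Pick the action for global timestep t (assumed start_timesteps semantics)."""
    if t < start_timesteps:
        # Warm-up phase: uniform random actions, independent of the actor,
        # used to fill the replay buffer with diverse transitions.
        return [rng.uniform(low, high) for _ in range(action_dim)]
    # After warm-up: actor output plus Gaussian exploration noise,
    # clipped back into the valid action range [low, high].
    return [min(max(a + rng.gauss(0.0, noise_std), low), high) for a in actor(state)]
```

A longer warm-up, as used for HalfCheetah-v1 and Ant-v1, would give the replay buffer more random experience before the actor's own (initially poor) actions start being used.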
References:
- Human-level control through deep reinforcement learning
- Playing Atari with Deep Reinforcement Learning
- Deterministic Policy Gradient Algorithms
- Continuous control with deep reinforcement learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Trust Region Policy Optimization
- Generative Adversarial Imitation Learning
- Proximal Policy Optimization Algorithms
- Emergence of Locomotion Behaviours in Rich Environments
- GitHub repositories with many helpful implementations: Pat-coady, OpenAI Baselines, and Ilya Kostrikov