This repository contains the code for our project on the LunarLanderContinuous-v2 environment from OpenAI Gym.
The purpose of this project is to achieve a stable, high-performing moon landing using A3C (Asynchronous Advantage Actor-Critic), one of the most well-known Deep Reinforcement Learning algorithms, together with several of its improvements, and to test them on the Lunar Lander environment from OpenAI Gym. The results of the experiments show that, while the extensions to the base algorithm greatly improved its performance, the simplicity of the environment prevents us from identifying a clear winner.
Read the paper-like report for the project here.
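At the core of A3C, each worker collects a rollout, computes discounted returns, and uses the gap between those returns and the critic's value estimates as the advantage signal for the actor. The snippet below is a minimal NumPy sketch of that computation; the function name and the example numbers are illustrative assumptions rather than the repository's actual code (the discount factor matches the `gamma` parameter shown further down).

```python
import numpy as np

def discounted_returns(rewards, gamma, bootstrap=0.0):
    """Discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    g = bootstrap                       # value estimate for the state after the rollout
    returns = np.empty(len(rewards))
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# Toy rollout: the advantage tells the actor how much better than
# expected each action turned out to be.
rewards = [1.0, 0.0, -1.0, 2.0]
values = np.array([0.5, 0.4, 0.3, 0.2])                 # critic predictions V(s_t)
advantages = discounted_returns(rewards, gamma=0.99) - values
```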
This file contains the main code, which creates the agents and launches them.
Parameters for the Actor and Critic networks can be changed by editing the following dictionaries:
```python
policy_net_args = {"num_Hlayers": 2,
                   "activations_Hlayers": ["relu", "relu"],
                   "Hlayer_sizes": [100, 100],
                   "n_output_units": 2,
                   "output_layer_activation": tf.nn.tanh,
                   "state_space_size": 8,
                   "action_space_size": 2,
                   "Entropy": 0.01,
                   "action_space_upper_bound": action_space_upper_bound,
                   "action_space_lower_bound": action_space_lower_bound,
                   "optimizer": tf.train.RMSPropOptimizer(0.0001),
                   "total_number_episodes": 5000,
                   "number_of_episodes_before_update": 1,
                   "frequency_of_printing_statistics": 100,
                   "frequency_of_rendering_episode": 1000,
                   "number_child_agents": 8,
                   "episodes_back": 20,
                   "gamma": 0.99,
                   "regularization_constant": 0.01,
                   "max_steps_per_episode": 2000}
```
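As a rough illustration of how arguments like these can drive the Actor's graph construction, here is a hedged TensorFlow 1.x sketch of a Gaussian policy head for continuous actions. The layer layout mirrors the dictionary above, but the exact architecture in the repository may differ.

```python
import tensorflow as tf  # TensorFlow 1.x, as implied by tf.train.RMSPropOptimizer

states = tf.placeholder(tf.float32, [None, 8])            # "state_space_size"
h = states
for size in [100, 100]:                                   # "Hlayer_sizes"
    h = tf.layers.dense(h, size, activation=tf.nn.relu)   # "activations_Hlayers"
mu = tf.layers.dense(h, 2, activation=tf.nn.tanh)         # mean in [-1, 1], cf. tf.nn.tanh above
sigma = tf.layers.dense(h, 2, activation=tf.nn.softplus)  # positive standard deviation
dist = tf.distributions.Normal(loc=mu, scale=sigma)
action = tf.clip_by_value(dist.sample(), -1.0, 1.0)       # LunarLanderContinuous-v2 action bounds
entropy_bonus = 0.01 * tf.reduce_mean(dist.entropy())     # "Entropy" coefficient
```

The Critic's arguments follow the same pattern: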
```python
valuefunction_net_args = {"num_Hlayers": 2,
                          "activations_Hlayers": ["relu", "relu"],
                          "Hlayer_sizes": [100, 64],
                          "n_output_units": 1,
                          "output_layer_activation": "linear",
                          "state_space_size": 8,
                          "action_space_size": 2,
                          "optimizer": tf.train.RMSPropOptimizer(0.01),
                          "regularization_constant": 0.01}
```