Reinforcement learning algorithms have made a significant impact on control and decision-making problems where no existing methodology has succeeded. In this work we apply a hybrid algorithm to a virtual self-driving car, combining the Actor-Critic and Proximal Policy Optimization (PPO) methods to handle continuous control tasks for car locomotion. Successful locomotion of a self-driving car is achieved through angular movements of the steering in response to changes in the environment, where actions such as turning smoothly or throttling map to a continuous action space. The policy, which maps sensor inputs to the car's actions, is updated to maximize reward; in this respect, the Actor-Critic method improves on general policy-based methods. The primary purpose of this research is to study the performance of the modified policy optimization technique, which enhances the agent's interaction with the environment and yields improved rewards compared with other policy-based methods. The testbeds used for the implementation of the modified algorithm are CartPole and MountainCarContinuous. The modified actor-critic algorithm yields consistent policy updates, reducing the risk of suddenly learning an irreversibly bad policy.
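The "consistent policy update" property mentioned above comes from PPO's clipped surrogate objective, which bounds how far each update can move the policy ratio away from 1. As a minimal sketch (not the repository's actual implementation), the clipped objective can be computed from log-probabilities and advantage estimates like this:

```python
import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO.

    The probability ratio r_t = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps]; taking the minimum of the clipped and unclipped
    terms removes the incentive to push the policy far from the old one,
    which is what reduces the risk of a sudden, irreversible bad update.
    """
    ratios = np.exp(new_log_probs - old_log_probs)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Objective to be maximized (gradient ascent on the policy parameters).
    return np.mean(np.minimum(unclipped, clipped))
```

For example, if the new policy doubles an action's probability (ratio 2.0) on a transition with positive advantage, the clipped term caps its contribution at `(1 + clip_eps) * advantage`, so the update gains nothing from moving the ratio beyond 1.2.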
simonsimanta/Reinforcement-learning
About
Optimization of Actor Critic Policy in Continuous Action Space