Introduction

Artificial Intelligence (AI) has long captivated our imaginations through the lens of science fiction. Recent years, however, have seen substantial strides in the field, bringing that once-fictional concept closer to tangible reality. Notably, AI now performs well on tasks previously considered the exclusive domain of human intelligence. One pivotal capability is the ability to learn from an environment and make decisions, a concept encapsulated by Reinforcement Learning (RL).

Reinforcement Learning is a category of machine learning in which an agent learns to navigate an environment by executing actions and assessing the resulting rewards or consequences. The agent's ultimate objective is to derive an optimal policy: a function that dictates the best action to take given the current state of the environment. In this project, our focus is on building an intelligent agent that solves the CartPole problem using Reinforcement Learning within the OpenAI Gym environment. OpenAI Gym provides an accessible, user-friendly interface for designing and comparing reinforcement learning algorithms.

Problem Statement

The CartPole problem is a classic benchmark in Reinforcement Learning. A cart moves along a frictionless track, carrying a pole attached to its top. The pole starts upright, and the objective is to keep it balanced by pushing the cart left or right. A reward of +1 is granted for each timestep that the pole remains upright. The episode ends under two conditions: when the pole deviates more than 12 degrees from vertical, or when the cart strays more than 2.4 units from the center of the track.
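As a concrete starting point, the sketch below creates the CartPole-v1 environment and runs one episode with random actions, accumulating the +1-per-timestep reward until the episode ends. It assumes the classic OpenAI Gym API (before version 0.26); newer Gym and Gymnasium releases return an extra info value from reset() and split done into terminated and truncated.

```python
import gym

# Run one CartPole episode with random actions (classic Gym API assumed).
env = gym.make("CartPole-v1")
observation = env.reset()

total_reward = 0
done = False
while not done:
    action = env.action_space.sample()                  # 0 = push left, 1 = push right
    observation, reward, done, info = env.step(action)  # reward is +1 per surviving timestep
    total_reward += reward

print(f"Episode finished with a return of {total_reward}")
env.close()
```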
Problem Characteristics

Percepts

The agent receives four percepts from this environment:

Cart Position: the x-coordinate of the cart on the track.
Cart Velocity: the velocity of the cart along the track.
Pole Angle: the angle of the pole with respect to the vertical.
Pole Velocity At Tip: the rate of change of the pole's angle.
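In code, these percepts arrive together as a four-element observation. The short sketch below (classic Gym API assumed; the variable names are only illustrative) unpacks and prints them:

```python
import gym

# Unpack the four percepts from a single CartPole observation.
# Assumes the classic Gym API, where reset() returns only the observation.
env = gym.make("CartPole-v1")
observation = env.reset()

cart_position, cart_velocity, pole_angle, pole_velocity_at_tip = observation
print(f"Cart position:          {cart_position:.3f}")
print(f"Cart velocity:          {cart_velocity:.3f}")
print(f"Pole angle (radians):   {pole_angle:.3f}")
print(f"Pole angular velocity:  {pole_velocity_at_tip:.3f}")
env.close()
```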
Moves

The agent has two possible moves:

Push cart to the left.
Push cart to the right.
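To make this two-move interface concrete, a minimal hand-coded policy (illustrative only, not the learned policy this project targets) might simply push the cart toward the side the pole is leaning:

```python
# Illustrative rule-based policy over the two moves:
# push the cart toward the side the pole is currently leaning.
def naive_policy(observation):
    cart_position, cart_velocity, pole_angle, pole_velocity = observation
    return 0 if pole_angle < 0 else 1   # 0 = push left, 1 = push right
```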
Actions

The agent's decisions take effect on the environment through one of the following choices of force applied to the cart (a mapping onto the environment's two concrete actions is sketched after this list):

Do nothing: the agent refrains from applying any force to the cart.
Apply force to the left: the agent exerts a force of -1 on the cart.
Apply force to the right: the agent exerts a force of +1 on the cart.
Apply random force: the agent applies a randomly chosen force to the cart.
Apply optimal force: the agent applies the force prescribed by the policy learned through Reinforcement Learning.
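The sketch below is a hedged illustration of how these choices could map onto the two concrete Gym actions. The function choose_action and the strategy labels are hypothetical names introduced here; in the standard two-action CartPole environment there is no zero-force action, so the "do nothing" choice is noted but omitted.

```python
import random

# Hypothetical mapping from the decision choices above onto the two
# concrete Gym actions (0 = push left, 1 = push right). The standard
# CartPole environment has no zero-force action, so "do nothing" would
# require a modified environment and is omitted here.
def choose_action(strategy, observation, learned_policy=None):
    if strategy == "left":
        return 0
    if strategy == "right":
        return 1
    if strategy == "random":
        return random.choice([0, 1])
    if strategy == "optimal" and learned_policy is not None:
        return learned_policy(observation)   # e.g. a greedy lookup into a learned Q-table
    raise ValueError(f"Unsupported strategy: {strategy}")
```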
Agent's Goal

The agent's overarching aim is to learn the optimal policy that maximizes the cumulative reward, measured as the number of timesteps the pole stays upright. The implementation follows an object-oriented approach and combines a table-driven method with Reinforcement Learning; a condensed Q-learning sketch is given at the end of this section. The project outcome will explicitly showcase the agent's performance results.

Project Significance

This project seeks to demonstrate the power and potential of Reinforcement Learning in addressing intricate control problems. Despite its deceptively simple appearance, the CartPole problem demands a careful balance of exploration and exploitation, making it an ideal setting for illustrating what Reinforcement Learning can do.
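To make the combination of a table-driven method and Reinforcement Learning concrete, the following sketch trains a tabular Q-learning agent on a discretized version of the CartPole observation. The bucket counts, clipping bounds, learning rate, discount factor, and exploration rate are illustrative assumptions rather than the project's actual settings, and the classic pre-0.26 Gym API is assumed.

```python
import math
import random
from collections import defaultdict

import gym

N_BUCKETS = (1, 1, 6, 3)                                 # coarse bins per percept (assumed)
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0),
          (-math.radians(12), math.radians(12)),
          (-math.radians(50), math.radians(50))]         # clipping ranges per percept (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1                   # learning rate, discount, exploration

def discretize(observation):
    """Map the continuous observation to a tuple of bucket indices (a table key)."""
    state = []
    for value, (low, high), buckets in zip(observation, BOUNDS, N_BUCKETS):
        ratio = (min(max(value, low), high) - low) / (high - low)
        state.append(min(buckets - 1, int(ratio * buckets)))
    return tuple(state)

env = gym.make("CartPole-v1")
q_table = defaultdict(float)                             # (state, action) -> estimated value

for episode in range(500):
    state = discretize(env.reset())                      # classic API: reset() returns the observation
    done = False
    while not done:
        # Epsilon-greedy selection: explore occasionally, otherwise exploit the table.
        if random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            action = max((0, 1), key=lambda a: q_table[(state, a)])

        observation, reward, done, _ = env.step(action)
        next_state = discretize(observation)

        # Q-learning update toward the bootstrapped one-step target.
        best_next = max(q_table[(next_state, a)] for a in (0, 1))
        target = reward if done else reward + GAMMA * best_next   # no bootstrap at terminal states
        q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
        state = next_state

env.close()
```

After training, the learned policy is simply a greedy lookup into the table, i.e. choosing the action with the larger Q-value for the current discretized state; that lookup is what the "apply optimal force" choice above refers to.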