Frozen Lake Environment

Aim:

  • To reach the goal efficiently, the agent must find the best route using dynamic programming.

Description:

  • The environment is a grid of size 4x4 or 8x8. Each tile is either frozen surface or a hole, and the objective is to reach the final tile, which contains a gift.
  • It is a model-based environment: the transition probabilities are available to the agent.
  • It is also a sparse-reward environment.

The environment comes in two variants: a deterministic one, where the agent always moves in the chosen direction, and a stochastic ("slippery") one, where the agent may slide to a perpendicular direction instead.

State Space

  • For a 4x4 grid, each cell (state) is represented by an integer from 0 to 15. For an 8x8 grid, the range is 0 to 63.
  • If the agent takes an action that would move it off the grid, it remains in the same state.

Action Space

In any given state, the agent can take one of four actions:

  Left  - 0
  Down  - 1
  Right - 2
  Up    - 3

Reward

  • If the agent falls into a hole or steps onto a frozen tile, the reward is 0.
  • If it reaches the goal state, it receives a reward of 1.
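
For reference, here is a minimal sketch of interacting with the environment through Gym. It assumes the `FrozenLake-v1` registration and the classic 4-tuple `step` API; newer Gym/Gymnasium versions return `(obs, info)` from `reset` and a 5-tuple from `step`.

```python
import gym

# Deterministic 4x4 variant; set is_slippery=True for the stochastic one.
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # 0=Left, 1=Down, 2=Right, 3=Up
    state, reward, done, info = env.step(action)  # reward is 1 only at the goal
env.close()
```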

Algorithms

Dynamic programming is used to converge to an optimal policy. There are two alternative methods to accomplish this:

Policy Iteration

  • Evaluate the current policy by computing its value function for all states.
  • Improve the policy by acting greedily with respect to the action-value function.
  • Iterate until the policy converges (a sketch follows this list).
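
A condensed sketch of policy iteration, assuming the tabular transition model that Gym's FrozenLake exposes, where `P[s][a]` is a list of `(prob, next_state, reward, done)` tuples (accessible as `env.unwrapped.P`):

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: compute V for the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2] * (not done))
                        for p, s2, r, done in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily w.r.t. the action-value function.
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2] * (not done))
                     for p, s2, r, done in P[s][a]) for a in range(n_actions)]
            best = int(np.argmax(q))
            stable = stable and (best == policy[s])
            policy[s] = best
        if stable:
            return policy, V
```

Usage: `policy, V = policy_iteration(env.unwrapped.P, env.observation_space.n, env.action_space.n)`.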

Value Iteration

  • Determine the optimal value function by iteratively applying the Bellman optimality update.
  • Select the best action in each state using the action-value function.
  • Continue iterating until the value function, and hence the policy, converges (a sketch follows this list).
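
Value iteration in the same setting, backing up the best action value until the value function stops changing:

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best expected return over actions.
            q = [sum(p * (r + gamma * V[s2] * (not done))
                     for p, s2, r, done in P[s][a]) for a in range(n_actions)]
            delta = max(delta, abs(max(q) - V[s]))
            V[s] = max(q)
        if delta < theta:
            break
    # Extract the greedy policy from the converged value function.
    policy = np.array([
        np.argmax([sum(p * (r + gamma * V[s2] * (not done))
                       for p, s2, r, done in P[s][a])
                   for a in range(n_actions)])
        for s in range(n_states)])
    return policy, V
```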

Minigrid Environment

Aim:

  • The agent's objective is to reach the goal state as efficiently as possible.

Description:

  • The Minigrid Environment is an empty room containing one agent and one goal state, with no obstacles.
  • There are two environments available: MiniGrid-Empty-6x6-v0 and MiniGrid-Empty-8x8-v0.
  • It is a model-free setting: the agent learns without access to the transition model.

State Space

  • In MiniGrid-Empty-6x6-v0, each state is represented by (x, y) coordinates, where x and y range from 1 to 4, giving 16 positions.
  • In MiniGrid-Empty-8x8-v0, there are 36 positions, with x and y ranging from 1 to 6.
  • The state space includes the direction of the agent, which is indicated as follows:
    • 0 - Right
    • 1 - Down
    • 2 - Left
    • 3 - Up
  • The observation also includes an image array that can be used to locate the agent within the environment.
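
One way to build a tabular state key is to read the agent's position and heading directly from the environment. This is a sketch assuming the gym-minigrid package, whose environments expose `agent_pos` and `agent_dir`; attribute names can differ across minigrid versions.

```python
import gym
import gym_minigrid  # registers the MiniGrid-* environments

env = gym.make("MiniGrid-Empty-6x6-v0")
obs = env.reset()

x, y = env.agent_pos        # (x, y) cell coordinates of the agent
direction = env.agent_dir   # 0=Right, 1=Down, 2=Left, 3=Up
state = (x, y, direction)   # hashable key for a tabular Q-table
```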

Action Space

The agent can take one of three actions to alter its state:

  - 0 - Turn Left
  - 1 - Turn Right
  - 2 - Move Forward

Rewards

  • Success earns a reward of `1 - 0.9 * (step_count / max_steps)`, while failure earns `0`.
  • `max_steps` is the maximum number of steps the agent can take in an episode.
  • `step_count` is the number of steps taken by the agent during the episode; it cannot exceed the `max_steps` limit.
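
For example, with `max_steps = 100`, reaching the goal after 20 steps yields `1 - 0.9 * (20 / 100) = 0.82`, so faster episodes earn higher rewards.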

Algorithms

The following model-free algorithms are implemented (a representative Q-Learning sketch is shown after the list):

  • Monte-Carlo
  • SARSA
  • SARSA Lambda
  • Q-Learning
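
A minimal tabular Q-Learning sketch with an ε-greedy policy. The `encode` parameter is a hypothetical helper that maps the environment to a hashable state key such as the (x, y, direction) tuple above, and the hyperparameter values are illustrative, not the ones used in this repository:

```python
import random
from collections import defaultdict

def q_learning(env, encode, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        env.reset()
        state, done = encode(env), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n), key=lambda a: Q[state][a])
            _, reward, done, info = env.step(action)
            next_state = encode(env)
            # Q-Learning update: bootstrap from the best next action.
            target = reward + gamma * max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

SARSA differs only in the target: it bootstraps from the action actually taken next rather than the greedy one.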

Results

MiniGrid-Empty-6x6-v0

Graph 1

Graph 2

Flappy Bird Environment

Aim:

  • The agent (a bird) learns to score by passing between pipes, trained with the Q-Learning algorithm.

Description:

  • The game has a bird as the agent and randomly generated pipes. The bird can only move vertically while the pipes move horizontally. There is also a base and background.
  • The Flappy Bird Environment is a model-free environment.

Flappy bird

Requirements

  • Matplotlib
  • NumPy
  • flappy_bird_gym (cloned from the Flappy-bird-gym repository)

```
pip install matplotlib
pip install numpy
```

Note: The algorithm's Python file was created inside the cloned repository folder so that flappy_bird_gym can be imported directly.

State Space

  • The state is the location of the bird's (agent's) center.
  • The location is given as the bird's horizontal distance to the next pipe and its vertical distance from the center of the lower pipe's gap.
  • The state resets whenever the bird hits a pipe or crashes into the base.
  • The agent moves upward only when it flaps, since PLAYER_VEL_ROT and player_rot are set to 0 degrees (player_rot was originally 45 degrees).

Action Space

Here are the possible moves that the agent can make at any given state:

  Flap       - 1
  Do nothing - 0
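
A minimal interaction loop with the cloned package, assuming the `FlappyBird-v0` id registered by the Flappy-bird-gym repository:

```python
import flappy_bird_gym

env = flappy_bird_gym.make("FlappyBird-v0")
obs = env.reset()   # (horizontal distance to next pipe, vertical distance to its gap)
done = False
while not done:
    action = env.action_space.sample()  # 0 = do nothing, 1 = flap
    obs, reward, done, info = env.step(action)
env.close()
```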

Reward

  • When the bird crosses the pipes, it earns a reward of +5.
  • If the bird collides with the pipe or hits the ground, it will receive a penalty of -10.
  • If the bird survives, it will be rewarded with +1 for each time step.

Algorithm

  • In this environment, the agent is trained using the Q-Learning Algorithm.
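
Because the observed distances are continuous, tabular Q-Learning needs a discretized state. Below is a minimal sketch of one possible binning; the `discretize` helper and `bin_size` value are illustrative assumptions, not necessarily what this repository uses:

```python
def discretize(obs, bin_size=10):
    # Round each continuous distance into coarse integer bins so that
    # nearby positions share a single Q-table entry.
    horizontal, vertical = obs
    return (int(horizontal // bin_size), int(vertical // bin_size))
```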

Results

Training results


Testing results
