Frozen Lake Environment

Aim:

  • To reach the goal efficiently, the agent must find the best route using dynamic programming.

Description:

  • The environment is a grid of size 4x4 or 8x8. Each tile is either frozen surface or a hole, and the objective is to reach the final tile, which contains a gift.
  • It is a model-based environment: the transition probabilities are available to the agent.
  • It is also a sparse-reward environment.

The environment comes in two variants: a deterministic one, where the agent always moves in the chosen direction, and a stochastic ("slippery") one, where the agent may slide to a perpendicular direction instead.

State Space

  • For a 4x4 grid, each cell (state) is represented by an integer from 0 to 15. For an 8x8 grid, the range is 0 to 63.
  • If the agent takes an action that would move it off the grid, it remains in the same state.

Action Space

In any given state, the agent can take one of four actions:

  Left  - 0
  Down  - 1
  Right - 2
  Up    - 3

Reward

  • If the agent falls into a hole or steps onto a frozen tile, the reward is 0.
  • If it reaches the goal state, it receives a reward of 1.
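
For reference, here is a minimal sketch of interacting with the environment through Gym. It assumes the `FrozenLake-v1` registration and the classic 4-tuple `step` API; newer Gym/Gymnasium versions return `(obs, info)` from `reset` and a 5-tuple from `step`.

```python
import gym

# Deterministic 4x4 variant; set is_slippery=True for the stochastic one.
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # 0=Left, 1=Down, 2=Right, 3=Up
    state, reward, done, info = env.step(action)  # reward is 1 only at the goal
env.close()
```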

Algorithms

Dynamic programming is used to converge to an optimal policy. There are two alternative methods to accomplish this:

Policy Iteration

  • Evaluate the current policy by computing its value function for all states.
  • Improve the policy by acting greedily with respect to the action-value function.
  • Iterate until the policy converges (a sketch follows this list).
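
A condensed sketch of policy iteration, assuming the tabular transition model that Gym's FrozenLake exposes, where `P[s][a]` is a list of `(prob, next_state, reward, done)` tuples (accessible as `env.unwrapped.P`):

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: compute V for the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2] * (not done))
                        for p, s2, r, done in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily w.r.t. the action-value function.
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2] * (not done))
                     for p, s2, r, done in P[s][a]) for a in range(n_actions)]
            best = int(np.argmax(q))
            stable = stable and (best == policy[s])
            policy[s] = best
        if stable:
            return policy, V
```

Usage: `policy, V = policy_iteration(env.unwrapped.P, env.observation_space.n, env.action_space.n)`.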

Value Iteration

  • Determine the optimal value function by iteratively applying the Bellman optimality update.
  • Select the best action in each state using the action-value function.
  • Continue iterating until the value function, and hence the policy, converges (a sketch follows this list).
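
Value iteration in the same setting, backing up the best action value until the value function stops changing:

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best expected return over actions.
            q = [sum(p * (r + gamma * V[s2] * (not done))
                     for p, s2, r, done in P[s][a]) for a in range(n_actions)]
            delta = max(delta, abs(max(q) - V[s]))
            V[s] = max(q)
        if delta < theta:
            break
    # Extract the greedy policy from the converged value function.
    policy = np.array([
        np.argmax([sum(p * (r + gamma * V[s2] * (not done))
                       for p, s2, r, done in P[s][a])
                   for a in range(n_actions)])
        for s in range(n_states)])
    return policy, V
```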

Minigrid Environment

Aim:

  • The agent's objective is to reach the goal state as efficiently as possible.

Description:

  • The Minigrid Environment is an empty room containing one agent and one goal state, with no obstacles.
  • There are two environments available: MiniGrid-Empty-6x6-v0 and MiniGrid-Empty-8x8-v0.
  • It is a model-free setting: the agent learns without access to the transition model.

State Space

  • In MiniGrid-Empty-6x6-v0, each state is represented by (x, y) coordinates, where x and y range from 1 to 4, giving 16 positions.
  • In MiniGrid-Empty-8x8-v0, there are 36 positions, with x and y ranging from 1 to 6.
  • The state space includes the direction of the agent, which is indicated as follows:
    • 0 - Right
    • 1 - Down
    • 2 - Left
    • 3 - Up
  • The observation also includes an image array that can be used to locate the agent within the environment.
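
One way to build a tabular state key is to read the agent's position and heading directly from the environment. This is a sketch assuming the gym-minigrid package, whose environments expose `agent_pos` and `agent_dir`; attribute names can differ across minigrid versions.

```python
import gym
import gym_minigrid  # registers the MiniGrid-* environments

env = gym.make("MiniGrid-Empty-6x6-v0")
obs = env.reset()

x, y = env.agent_pos        # (x, y) cell coordinates of the agent
direction = env.agent_dir   # 0=Right, 1=Down, 2=Left, 3=Up
state = (x, y, direction)   # hashable key for a tabular Q-table
```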

Action Space

The agent can take one of three actions to alter its state:

  - 0 - Turn Left
  - 1 - Turn Right
  - 2 - Move Forward

Rewards

  • Success earns a reward of `1 - 0.9 * (step_count / max_steps)`, while failure earns `0`.
  • `max_steps` is the maximum number of steps the agent can take in an episode.
  • `step_count` is the number of steps taken by the agent during the episode; it cannot exceed the `max_steps` limit.
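
For example, with `max_steps = 100`, reaching the goal after 20 steps yields `1 - 0.9 * (20 / 100) = 0.82`, so faster episodes earn higher rewards.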

Algorithms

The following model-free algorithms are implemented (a representative Q-Learning sketch is shown after the list):

  • Monte-Carlo
  • SARSA
  • SARSA Lambda
  • Q-Learning
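
A minimal tabular Q-Learning sketch with an ε-greedy policy. The `encode` parameter is a hypothetical helper that maps the environment to a hashable state key such as the (x, y, direction) tuple above, and the hyperparameter values are illustrative, not the ones used in this repository:

```python
import random
from collections import defaultdict

def q_learning(env, encode, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        env.reset()
        state, done = encode(env), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n), key=lambda a: Q[state][a])
            _, reward, done, info = env.step(action)
            next_state = encode(env)
            # Q-Learning update: bootstrap from the best next action.
            target = reward + gamma * max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

SARSA differs only in the target: it bootstraps from the action actually taken next rather than the greedy one.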

Results

MiniGrid-Empty-6x6-v0

Graph 1

Graph 2

Flappy Bird Environment

Aim:

  • The agent (a bird) learns to score by passing between pipes, trained with the Q-Learning algorithm.

Description:

  • The game has a bird as the agent and randomly generated pipes. The bird can only move vertically while the pipes move horizontally. There is also a base and background.
  • The Flappy Bird Environment is a model-free environment.

Flappy bird

Requirements

  • Matplotlib
  • NumPy
  • flappy_bird_gym (cloned from the Flappy-bird-gym repository)

```
pip install matplotlib
pip install numpy
```

Note: The algorithm's Python file was created inside the cloned repository folder so that flappy_bird_gym can be imported directly.

State Space

  • The state is the location of the bird's (agent's) center.
  • The location is given as the bird's horizontal distance to the next pipe and its vertical distance from the center of the lower pipe's gap.
  • The state resets whenever the bird hits a pipe or crashes into the base.
  • The agent moves upward only when it flaps, since PLAYER_VEL_ROT and player_rot are set to 0 degrees (player_rot was originally 45 degrees).

Action Space

Here are the possible moves that the agent can make at any given state:

  Flap       - 1
  Do nothing - 0
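
A minimal interaction loop with the cloned package, assuming the `FlappyBird-v0` id registered by the Flappy-bird-gym repository:

```python
import flappy_bird_gym

env = flappy_bird_gym.make("FlappyBird-v0")
obs = env.reset()   # (horizontal distance to next pipe, vertical distance to its gap)
done = False
while not done:
    action = env.action_space.sample()  # 0 = do nothing, 1 = flap
    obs, reward, done, info = env.step(action)
env.close()
```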

Reward

  • When the bird crosses the pipes, it earns a reward of +5.
  • If the bird collides with the pipe or hits the ground, it will receive a penalty of -10.
  • If the bird survives, it will be rewarded with +1 for each time step.

Algorithm

  • In this environment, the agent is trained using the Q-Learning Algorithm.
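
Because the observed distances are continuous, tabular Q-Learning needs a discretized state. Below is a minimal sketch of one possible binning; the `discretize` helper and `bin_size` value are illustrative assumptions, not necessarily what this repository uses:

```python
def discretize(obs, bin_size=10):
    # Round each continuous distance into coarse integer bins so that
    # nearby positions share a single Q-table entry.
    horizontal, vertical = obs
    return (int(horizontal // bin_size), int(vertical // bin_size))
```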

Results

Training results


Testing results
