Independent Q-learning for BPD and Congestion Game #50

Closed
wants to merge 48 commits

Commits
6353018  initial version of the centralised agent wrapper for parallel environ… (rradules, Dec 6, 2023)
080d045  CA wrapper (rradules, Dec 8, 2023)
b93f718  check for central observation flag and method (rradules, Dec 11, 2023)
945c2b0  check for central observation flag and method, in step (rradules, Dec 11, 2023)
ec672b3  sum reward over agents (rradules, Dec 12, 2023)
35b0934  Merge branch 'main' into agent-wrappers (rradules, Dec 12, 2023)
ff24a78  BaseParallelWrapper extension for CA wrapper (rradules, Dec 14, 2023)
da3789d  fix reward in item gathering and define state function for global state (rradules, Dec 18, 2023)
963fe76  update centralised agent wrapper to use env.state() (rradules, Dec 18, 2023)
c877d9a  example script for the centralised agent wrapper (rradules, Dec 18, 2023)
98082d7  fix docs (rradules, Dec 18, 2023)
1d43c99  draft for adding MO baseline script example (rradules, Dec 18, 2023)
06cc402  Merge branch 'main' into agent-wrappers (rradules, Dec 27, 2023)
057ba5e  random single agent example for wrappers (rradules, Dec 27, 2023)
bd352cb  draft train GPI PD on item gathering (rradules, Dec 27, 2023)
d603c2c  agent actions mapping to Discrete space for MORL baselines for item g… (rradules, Jan 4, 2024)
716d30c  Merge branch 'main' into agent-wrappers (rradules, Jan 4, 2024)
79be6b9  connect4/breakthrough/samegame docstrings #36 (umutucak, Jan 24, 2024)
ef05f5b  Merge branch 'main' into doc/missing-docstrings (umutucak, Jan 24, 2024)
820ce79  class dicstring for beach problem domain (rradules, Jan 25, 2024)
a54dd63  docstring for itemgathering (rradules, Jan 28, 2024)
298efb9  Merge branch 'main' into agent-wrappers (rradules, Jan 29, 2024)
9b7f20d  Merge branch 'doc/missing-docstrings' into agent-wrappers (rradules, Jan 29, 2024)
bf95ed8  remove wandb log folders (rradules, Feb 14, 2024)
f09d40b  GPI LS on IG script (rradules, Feb 14, 2024)
3c1c9d1  remove functools (rradules, Feb 14, 2024)
77d60ff  fix central observation (rradules, Feb 15, 2024)
4b9e8e4  overwrite unwrappers (rradules, Feb 15, 2024)
28a0cb1  update gpi params (rradules, Feb 19, 2024)
7dc0b92  add seed to train script (rradules, Feb 20, 2024)
9beeeb1  increase collision resolution max limit (rradules, Feb 20, 2024)
9c7ff68  add no obj param for experiments (rradules, Feb 23, 2024)
1d3b2c4  enable logging (rradules, Feb 23, 2024)
21753ca  ref point adaptable (rradules, Feb 23, 2024)
b2f186b  GPI ls in morl folder (rradules, Mar 7, 2024)
e2cd561  Add uniform weight gen (ffelten, Mar 12, 2024)
7a00167  Modify exp name with weight gen (ffelten, Mar 12, 2024)
5c04ac0  Typo (ffelten, Mar 12, 2024)
e66d775  Add unifrom weights into discrete momappo (ffelten, Mar 12, 2024)
8d46452  Same import (ffelten, Mar 12, 2024)
7c3c1fa  Merge pull request #48 from Farama-Foundation/feature/add-uniform-wei… (ffelten, Mar 12, 2024)
805cd07  Add momultiwalker PF gif (ffelten, Mar 14, 2024)
8e2f1fb  Add website links to README. (ffelten, Mar 14, 2024)
9876f5c  Merge branch 'main' into agent-wrappers (rradules, Mar 15, 2024)
0e0a64f  utils alignment with main (rradules, Mar 15, 2024)
d0bbb3a  Merge branch 'main' into agent-wrappers (rradules, Mar 15, 2024)
3d1b76c  utils alignment with main (rradules, Mar 15, 2024)
59484d0  Merge pull request #49 from Farama-Foundation/agent-wrappers (rradules, Mar 15, 2024)
4 changes: 4 additions & 0 deletions .gitignore
@@ -10,6 +10,10 @@ __pycache__/

# Pycharm
/.idea
# Cluster scripts
/hpc
momaland/learning/wandb/
momaland/learning/weights/

# Distribution / packaging
.Python
4 changes: 2 additions & 2 deletions README.md
@@ -6,12 +6,12 @@
<!-- start elevator-pitch -->
MOMAland is an open source Python library for developing and comparing multi-objective multi-agent reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Essentially, the environments follow the standard [PettingZoo APIs](https://github.com/Farama-Foundation/PettingZoo), but return vectorized rewards as numpy arrays instead of scalar values.

The documentation website is at TODO, and we have a public discord server (which we also use to coordinate development work) that you can join [here](https://discord.gg/bnJ6kubTg6).
The documentation website is at https://momaland.farama.org/, and we have a public discord server (which we also use to coordinate development work) that you can join [here](https://discord.gg/bnJ6kubTg6).
<!-- end elevator-pitch -->

## Environments
MOMAland includes environments taken from the MOMARL literature, as well as multi-objective versions of classical environments, such as SISL or Butterfly.
The full list of environments is available at TODO.
The full list of environments is available at https://momaland.farama.org/environments/all-envs/.

## Installation
<!-- start install -->
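The README passage above highlights MOMAland's main API difference from PettingZoo: each agent receives a reward *vector* (a numpy array) rather than a scalar. The following is a minimal, self-contained sketch of that convention and of a simple linear scalarization; it uses plain numpy only, and the agent names and weights are illustrative rather than taken from the library.

```python
import numpy as np

# Per-agent reward vectors as a MOMAland parallel env would return them each
# step; here a 2-objective task with two (hypothetical) agents.
rewards = {
    "agent_0": np.array([1.0, -0.2]),
    "agent_1": np.array([0.5, 0.1]),
}

# One common way to reuse single-objective algorithms: linear scalarization
# with a user-chosen weight vector over the objectives.
weights = np.array([0.7, 0.3])
scalar_rewards = {agent: float(weights @ r) for agent, r in rewards.items()}
print(scalar_rewards)  # {'agent_0': 0.64, 'agent_1': 0.38}
```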
Binary file added docs/_static/walkers_pf.gif
8 changes: 2 additions & 6 deletions docs/index.md
@@ -53,13 +53,9 @@ Contribute to the Docs <https://github.com/rradules/momaland/tree/master/docs/>
An API standard for multi-objective multi-agent reinforcement learning (MOMARL)
```

<!-- ```{figure} _static/img/environments-demo.gif TODO
:width: 480px
:name: MOMAland Environments
``` -->
```{figure} _static/gifs/surround.gif
```{figure} _static/walkers_pf.gif
:width: 480
:alt: Multiple agents collaborating to surround a target
:alt: Multiple agents collaborating for various trade-offs
```


55 changes: 47 additions & 8 deletions momaland/envs/beach/beach.py
@@ -53,18 +53,57 @@ def raw_env(**kwargs):


class MOBeachDomain(MOParallelEnv):
"""Environment for MO-BeachDomain problem.

The init method takes in environment arguments and should define the following attributes:
- possible_agents
- action_spaces
- observation_spaces
These attributes should not be changed after initialization.
"""A `Parallel` 2-objective environment of the Beach problem domain.

## Observation Space
The observation space is a continuous `Box` of length `5` containing:
- agent type
- section id (where the agent is)
- section capacity
- section consumption
- percentage of agents of the agent's type in the section

Example:
`[a_type, section_id, section_capacity, section_consumption, %_of_a_of_current_type]`

## Action Space
The action space is a Discrete space, where:
- moving left is -1
- moving right is +1
- staying is 0

## Reward Space
The reward is a 2-dimensional vector containing, under the chosen scheme ('local' or 'global'), rewards for:
- the occupation level
- the mixture level
If the scheme is 'local', the reward is given for the currently occupied section.
If the scheme is 'global', the reward is summed over all sections.

## Starting State
Agents' initial positions are drawn uniformly at random over the sections by default; this can be changed via the
'position_distribution' argument. Agent types are assigned according to the 'type_distribution' argument, which
defaults to a uniform distribution over all types.

## Episode Termination
The episode is terminated if num_timesteps is reached. The default value is 100.
Agents only receive the reward after the last timestep.

## Episode Truncation
The problem is not truncated. It has a maximum number of timesteps.

## Arguments
- 'num_timesteps (int)': number of timesteps in the domain. Default: 100
- 'num_agents (int)': number of agents in the domain. Default: 100
- 'reward_scheme (str)': the reward scheme to use ('local', or 'global'). Default: local
- 'sections (int)': number of beach sections in the domain. Default: 6
- 'capacity (int)': capacity of each beach section. Default: 10
- 'type_distribution (tuple)': the distribution of agent types in the domain. Default: 2 types equally distributed (0.5, 0.5).
- 'position_distribution (tuple)': the initial distribution of agents in the domain. Default: uniform over all sections (None).
- 'render_mode (str)': render mode. Default: None
"""

metadata = {"render_modes": ["human"], "name": "mobeach_v0"}

# TODO does this environment require max_cycle?
def __init__(
self,
num_timesteps=10,
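As a companion to the docstring added above, here is a minimal usage sketch. It is not code from this PR: the import path is inferred from the file being edited (`momaland/envs/beach/beach.py`), the keyword arguments come from the Arguments section of the docstring, and the loop assumes the standard PettingZoo Parallel API that `MOParallelEnv` extends.

```python
from momaland.envs.beach.beach import MOBeachDomain  # path assumed from this diff

# Keyword arguments as documented in the Arguments section above.
env = MOBeachDomain(num_timesteps=10, num_agents=20, sections=6, capacity=10, reward_scheme="local")

observations, infos = env.reset(seed=42)
while env.agents:
    # Random policy: sample one action (move left / stay / move right) per agent.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    # Each entry of `rewards` is a length-2 numpy vector:
    # (occupation-level objective, mixture-level objective).
```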
99 changes: 46 additions & 53 deletions momaland/envs/breakthrough/breakthrough.py
@@ -10,58 +10,6 @@
| Observation Shape | (board_height=8, board_width=8, 2) |
| Observation Values | [0,1] |
| Reward Shape | (num_objectives=4,) |

MO-Breakthrough is a multi-objective variant of the two-player, single-objective turn-based board game Breakthrough.
In Breakthrough, players start with two rows of identical pieces in front of them, on an 8x8 board, and try to reach
the opponent's home row with any piece. The first player to move a piece on their opponent's home row wins. Players
move alternatingly, and each piece can move one square straight forward or diagonally forward. Opponent pieces can also
be captured, but only by moving diagonally forward, not straight.
MO-Breakthrough extends this game with up to three additional objectives: a second objective that incentivizes faster
wins, a third one for capturing opponent pieces, and a fourth one for avoiding the capture of the agent's own pieces.
Additionally, the board width can be modified from 3 to 20 squares, and the board height from 5 to 20 squares.


### Observation Space

The observation is a dictionary which contains an `'observation'` element which is the usual RL observation described
below, and an `'action_mask'` which holds the legal moves, described in the Legal Actions Mask section below.

The main observation space is 2 planes of a board_height * board_width grid (a board_height * board_width * 2 tensor).
Each plane represents a specific agent's pieces, and each location in the grid represents the placement of the
corresponding agent's piece. 1 indicates that the agent has a piece placed in the given location, and 0 indicates they
do not have a piece in that location (meaning that either the cell is empty, or the other agent has a piece in that
location).


#### Legal Actions Mask

The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation.
The `action_mask` is a binary vector where each index of the vector represents whether the represented action is legal
or not; the action encoding is described in the Action Space section below.
The `action_mask` will be all zeros for any agent except the one whose turn it is. Taking an illegal action ends the
game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents. #TODO this isn't happening anymore because of missing TerminateIllegalWrapper


### Action Space

The action space is the set of integers from 0 to board_width*board_height*3 (exclusive). If a piece at coordinates
(x,y) is moved, this is encoded as the integer x * 3 * board_height + y * 3 + z where z == 0 for left diagonal, 1 for
straight, and 2 for right diagonal move.


### Rewards

Dimension 0: If an agent moves one of their pieces to the opponent's home row, they will be rewarded 1 point. At the
same time, the opponent agent will be awarded -1 point. There are no draws in Breakthrough.
Dimension 1: If an agent wins, they get a reward of 1-(move_count/max_moves) to incentivize faster wins. The losing
opponent gets the negated reward. In case of a draw, both agents get 0.
Dimension 2: (optional) The number of opponent pieces (divided by the max number of pieces) an agent has captured.
Dimension 3: (optional) The negative number of pieces (divided by the max number of pieces)
an agent has lost to the opponent.


### Version History

"""
from __future__ import annotations

@@ -107,7 +55,52 @@ def raw_env(**kwargs):


class MOBreakthrough(MOAECEnv):
"""Multi-objective Breakthrough."""
"""Multi-objective Breakthrough.

MO-Breakthrough is a multi-objective variant of the two-player, single-objective turn-based board game Breakthrough.
In Breakthrough, players start with two rows of identical pieces in front of them, on an 8x8 board, and try to reach
the opponent's home row with any piece. The first player to move a piece on their opponent's home row wins. Players
move alternatingly, and each piece can move one square straight forward or diagonally forward. Opponent pieces can also
be captured, but only by moving diagonally forward, not straight.
MO-Breakthrough extends this game with up to three additional objectives: a second objective that incentivizes faster
wins, a third one for capturing opponent pieces, and a fourth one for avoiding the capture of the agent's own pieces.
Additionally, the board width can be modified from 3 to 20 squares, and the board height from 5 to 20 squares.

## Observation Space
The observation is a dictionary which contains an `'observation'` element which is the usual RL observation described
below, and an `'action_mask'` which holds the legal moves, described in the Legal Actions Mask section below.

The main observation space is 2 planes of a board_height * board_width grid (a board_height * board_width * 2 tensor).
Each plane represents a specific agent's pieces, and each location in the grid represents the placement of the
corresponding agent's piece. 1 indicates that the agent has a piece placed in the given location, and 0 indicates they
do not have a piece in that location (meaning that either the cell is empty, or the other agent has a piece in that
location).


### Legal Actions Mask
The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation.
The `action_mask` is a binary vector where each index of the vector represents whether the represented action is legal
or not; the action encoding is described in the Action Space section below.
The `action_mask` will be all zeros for any agent except the one whose turn it is. Taking an illegal action ends the
game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents. #TODO this isn't happening anymore because of missing TerminateIllegalWrapper

## Action Space
The action space is the set of integers from 0 to board_width*board_height*3 (exclusive). If a piece at coordinates
(x,y) is moved, this is encoded as the integer x * 3 * board_height + y * 3 + z where z == 0 for left diagonal, 1 for
straight, and 2 for right diagonal move.


## Rewards
Dimension 0: If an agent moves one of their pieces to the opponent's home row, they will be rewarded 1 point. At the
same time, the opponent agent will be awarded -1 point. There are no draws in Breakthrough.
Dimension 1: If an agent wins, they get a reward of 1-(move_count/max_moves) to incentivize faster wins. The losing
opponent gets the negated reward. In case of a draw, both agents get 0.
Dimension 2: (optional) The number of opponent pieces (divided by the max number of pieces) an agent has captured.
Dimension 3: (optional) The negative number of pieces (divided by the max number of pieces)
an agent has lost to the opponent.

## Version History
"""

metadata = {
"render_modes": ["ansi"],
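The action encoding described in the Action Space section above (`x * 3 * board_height + y * 3 + z`) is easy to get wrong, so here is a small sketch of the encode/decode arithmetic and of drawing a legal move from the `action_mask`. The helper names and the 8-row default are illustrative; only the formula itself comes from the docstring.

```python
import numpy as np

BOARD_HEIGHT = 8  # default board height per the docstring above

def encode_move(x: int, y: int, z: int) -> int:
    """Move of the piece at (x, y); z: 0 = left diagonal, 1 = straight, 2 = right diagonal."""
    return x * 3 * BOARD_HEIGHT + y * 3 + z

def decode_move(action: int) -> tuple[int, int, int]:
    """Invert the encoding from the Action Space section."""
    x, rest = divmod(action, 3 * BOARD_HEIGHT)
    y, z = divmod(rest, 3)
    return x, y, z

assert decode_move(encode_move(4, 6, 2)) == (4, 6, 2)

def random_legal_action(obs: dict, rng: np.random.Generator) -> int:
    """Pick a random legal move from the dictionary observation's action mask."""
    legal = np.flatnonzero(obs["action_mask"])
    return int(rng.choice(legal))
```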
96 changes: 44 additions & 52 deletions momaland/envs/connect4/connect4.py
@@ -10,57 +10,6 @@
| Observation Shape | (board_height=6, board_width=7, 2) |
| Observation Values | [0,1] |
| Reward Shape | (2,) or (2+board_width,) |

MO-Connect4 is a multi-objective variant of the two-player, single-objective turn-based board game Connect 4.
In Connect 4, players can win by connecting four of their tokens vertically, horizontally or diagonally. The players
drop their respective token in a column of a standing board (of width 7 and height 6 by default), where each token will
fall until it reaches the bottom of the column or lands on top of an existing token.
Players cannot place a token in a full column, and the game ends when either a player has made a sequence of 4 tokens,
or when all columns have been filled (draw).
MO-Connect4 extends this game with a second objective that incentivizes faster wins, and optionally the additional
(conflicting) objectives of having more tokens than the opponent in every column. Additionally, width and height of the
board can be set to values from 4 to 20.


### Observation Space

The observation is a dictionary which contains an `'observation'` element which is the usual RL observation described
below, and an `'action_mask'` which holds the legal moves, described in the Legal Actions Mask section below.

The main observation space is 2 planes of a board_height * board_width grid (a board_height * board_width * 2 tensor).
Each plane represents a specific agent's tokens, and each location in the grid represents the placement of the
corresponding agent's token. 1 indicates that the agent has a token placed in the given location, and 0 indicates they
do not have a token in that location (meaning that either the cell is empty, or the other agent has a token in that
location).


#### Legal Actions Mask

The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation.
The `action_mask` is a binary vector where each index of the vector represents whether the represented action is legal
or not; the action encoding is described in the Action Space section below.
The `action_mask` will be all zeros for any agent except the one whose turn it is. Taking an illegal action ends the
game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents. #TODO this isn't happening anymore because of missing TerminateIllegalWrapper


### Action Space

The action space is the set of integers from 0 to board_width (exclusive), where the number represents which column
a token should be dropped in.


### Rewards

Dimension 0: If an agent successfully connects four of their tokens, they will be rewarded 1 point. At the same time,
the opponent agent will be awarded -1 point. If the game ends in a draw, both players are rewarded 0.
Dimension 1: If an agent wins, they get a reward of 1-(move_count/board_size) to incentivize faster wins. The losing opponent gets the negated reward. In case of a draw, both agents get 0.
Dimension 2 to board_width+1 (default 8): (optional) If at game end, an agent has more tokens than their opponent in
column X, they will be rewarded 1 point in reward dimension 2+X. The opponent agent will be rewarded -1 point. If the
column has an equal number of tokens from both players, both players are rewarded 0.


### Version History

"""
from __future__ import annotations

@@ -120,7 +69,50 @@ def raw_env(**kwargs):


class MOConnect4(MOAECEnv, EzPickle):
"""Multi-objective Connect Four."""
"""Multi-objective Connect Four.

MO-Connect4 is a multi-objective variant of the two-player, single-objective turn-based board game Connect 4.
In Connect 4, players can win by connecting four of their tokens vertically, horizontally or diagonally. The players
drop their respective token in a column of a standing board (of width 7 and height 6 by default), where each token will
fall until it reaches the bottom of the column or lands on top of an existing token.
Players cannot place a token in a full column, and the game ends when either a player has made a sequence of 4 tokens,
or when all columns have been filled (draw).
MO-Connect4 extends this game with a second objective that incentivizes faster wins, and optionally the additional
(conflicting) objectives of having more tokens than the opponent in every column. Additionally, width and height of the
board can be set to values from 4 to 20.

## Observation Space
The observation is a dictionary which contains an `'observation'` element which is the usual RL observation described
below, and an `'action_mask'` which holds the legal moves, described in the Legal Actions Mask section below.

The main observation space is 2 planes of a board_height * board_width grid (a board_height * board_width * 2 tensor).
Each plane represents a specific agent's tokens, and each location in the grid represents the placement of the
corresponding agent's token. 1 indicates that the agent has a token placed in the given location, and 0 indicates they
do not have a token in that location (meaning that either the cell is empty, or the other agent has a token in that
location).

## Legal Actions Mask
The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation.
The `action_mask` is a binary vector where each index of the vector represents whether the represented action is legal
or not; the action encoding is described in the Action Space section below.
The `action_mask` will be all zeros for any agent except the one whose turn it is. Taking an illegal action ends the
game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents. #TODO this isn't happening anymore because of missing TerminateIllegalWrapper


## Action Space
The action space is the set of integers from 0 to board_width (exclusive), where the number represents which column
a token should be dropped in.

## Rewards
Dimension 0: If an agent successfully connects four of their tokens, they will be rewarded 1 point. At the same time,
the opponent agent will be awarded -1 point. If the game ends in a draw, both players are rewarded 0.
Dimension 1: If an agent wins, they get a reward of 1-(move_count/board_size) to incentivize faster wins. The losing opponent gets the negated reward. In case of a draw, both agents get 0.
Dimension 2 to board_width+1 (default 8): (optional) If at game end, an agent has more tokens than their opponent in
column X, they will be rewarded 1 point in reward dimension 2+X. The opponent agent will be rewarded -1 point. If the
column has an equal number of tokens from both players, both players are rewarded 0.

## Version History
"""

metadata = {
"render_modes": ["human", "rgb_array"],
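To make the reward layout in the Rewards section above concrete, here is an illustrative numpy sketch of a full-length reward vector (with the optional per-column objectives enabled) for the winner of a 21-move game on the default 7x6 board. The numbers are made up; only the dimension layout follows the docstring.

```python
import numpy as np

BOARD_WIDTH, BOARD_HEIGHT = 7, 6  # default board, so board_size = 42

reward = np.zeros(2 + BOARD_WIDTH, dtype=np.float32)
reward[0] = 1.0                                      # dim 0: win (+1) / loss (-1) / draw (0)
reward[1] = 1.0 - 21 / (BOARD_WIDTH * BOARD_HEIGHT)  # dim 1: speed bonus, 1 - move_count/board_size = 0.5
reward[2 + 3] = 1.0                                  # dim 2+X: more tokens than the opponent in column X (here X = 3)

win_loss, speed = reward[0], reward[1]
column_control = reward[2:]  # one entry per board column
```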