MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library


โ— News
March 2023 We are excited to announce that a major update has just been released. For detailed version information, please refer to the version info.
May 2023 Exciting news! MARLlib now supports three more popular tasks: MATE, GoBigger, Overcooked-AI. Give them a try!

Multi-agent Reinforcement Learning Library (MARLlib) is a MARL library built on Ray and its toolkit RLlib. It offers a comprehensive platform for developing, training, and testing MARL algorithms across various tasks and environments.

Here's an example of how MARLlib can be used:

from marllib import marl

# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread")

# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source='mpe')

# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})

# start training
mappo.fit(env, model, stop={'timesteps_total': 1000000}, share_policy='group')

# ready to control
mappo.render(env, model, share_policy='group', restore_path='path_to_checkpoint')

Why MARLlib?

Here we provide a table for the comparison of MARLlib and existing work.

| Library | Supported Env | Algorithm | Parameter Sharing | Model |
|---|---|---|---|---|
| PyMARL | 1 cooperative | 5 | share | GRU |
| PyMARL2 | 2 cooperative | 11 | share | MLP + GRU |
| MAPPO Benchmark | 4 cooperative | 1 | share + separate | MLP + GRU |
| MAlib | 4 self-play | 10 | share + group + separate | MLP + LSTM |
| EPyMARL | 4 cooperative | 9 | share + separate | GRU |
| MARLlib | 13 (no task mode restriction) | 18 | share + group + separate + customizable | MLP + CNN + GRU + LSTM |
(The original comparison also tracks GitHub stars, open issues, commit activity, and last-update badges for each library; of the libraries above, only MAlib and MARLlib provide hosted documentation.)

Key features

🔰 MARLlib offers several key features that make it stand out:

  • MARLlib unifies diverse algorithm pipelines with agent-level distributed dataflow, allowing researchers to develop, test, and evaluate MARL algorithms across different tasks and environments.
  • MARLlib supports all task modes, including cooperative, collaborative, competitive, and mixed. This makes it easier for researchers to train and evaluate MARL algorithms across a wide range of tasks.
  • MARLlib provides a new interface that follows the structure of Gym, making it easier for researchers to work with multi-agent environments (see the sketch after this list).
  • MARLlib provides flexible and customizable parameter-sharing strategies, allowing researchers to optimize their algorithms for different tasks and environments.
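
For illustration, here is a minimal sketch of the Gym-style multi-agent convention this interface follows: observations, rewards, and done flags are exchanged as per-agent dictionaries. The class and method layout below is a generic assumption for illustration, not MARLlib's exact API.

import random

# Toy two-agent environment with a Gym-like reset/step interface (illustrative only).
class ToyMultiAgentEnv:
    agents = ["agent_0", "agent_1"]

    def reset(self):
        # One observation per agent, keyed by agent id.
        return {agent: [0.0, 0.0] for agent in self.agents}

    def step(self, actions):
        # Per-agent observations, rewards, and done flags; "__all__" follows the RLlib convention.
        obs = {agent: [random.random(), random.random()] for agent in self.agents}
        rewards = {agent: 1.0 for agent in actions}
        dones = {agent: False for agent in self.agents}
        dones["__all__"] = False
        return obs, rewards, dones, {}

env = ToyMultiAgentEnv()
obs = env.reset()
obs, rewards, dones, info = env.step({agent: 0 for agent in env.agents})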

🚀 Using MARLlib, you can take advantage of various benefits, such as:

  • Zero knowledge of MARL: MARLlib provides 18 pre-built algorithms with an intuitive API, allowing researchers to start experimenting with MARL without prior knowledge of the field.
  • Support for all task modes: MARLlib supports almost all multi-agent environments, making it easier for researchers to experiment with different task modes.
  • Customizable model architecture: Researchers can choose their preferred model architecture from the model zoo, or build their own.
  • Customizable policy sharing: MARLlib provides grouping options for policy sharing, or researchers can create their own.
  • Access to over a thousand released experiments: Researchers can access over a thousand released experiments to see how other researchers have used MARLlib.

Installation

Note: At this time, MARLlib is only compatible with Linux operating systems.

Step-by-step (recommended)

  • install dependencies
  • install environments
  • install patches

1. install dependencies (basic)

First, install the MARLlib dependencies to guarantee basic usage, then install the environments following this guide, and finally apply the patches for RLlib.

$ conda create -n marllib python=3.8 # or 3.9
$ conda activate marllib
$ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib
$ pip install -r requirements.txt

2. install environments (optional)

Please follow this guide.

3. install patches (basic)

Fix known RLlib bugs by applying the patches with the following commands:

$ cd /Path/To/MARLlib/marl/patch
$ python add_patch.py -y

PyPI

$ pip install --upgrade pip
$ pip install marllib
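
After installing via either route, a quick import check confirms that the package loads before moving on to training. This is just a sanity-check sketch; run it inside the activated environment (and from the repository root if you installed from source):

# Sanity check: confirm that the entry points used in this README are importable.
from marllib import marl

print(hasattr(marl, "make_env"), hasattr(marl, "algos"), hasattr(marl, "build_model"))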

Getting started

Prepare the configuration

Four groups of configuration govern the whole training process:

  • scenario: specify the environment/task settings
  • algorithm: choose the hyperparameters of the algorithm
  • model: customize the model architecture
  • ray/rllib: change the basic training settings

Before training, ensure all the parameters are set as intended, including those you plan to leave at their default values.

Note: You can also modify all the pre-set parameters via the MARLlib API.
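
The four configuration groups map directly onto the four API calls used throughout this README; the sketch below shows where each group enters (the parameter values are illustrative, not prescribed):

from marllib import marl

# scenario: environment/task settings are chosen in make_env
env = marl.make_env(environment_name="mpe", map_name="simple_spread")

# algorithm: hyperparameters are selected via hyperparam_source
mappo = marl.algos.mappo(hyperparam_source="mpe")

# model: the architecture is customized when building the model
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})

# ray/rllib: basic training settings are passed to fit (see "Kick off the training" below)
mappo.fit(env, model, stop={"timesteps_total": 1000000}, num_gpus=1, share_policy="group")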

Register the environment

Ensure all the dependencies are installed for the environment you plan to run; otherwise, please refer to the MARLlib documentation.

| task mode | api example |
|---|---|
| cooperative | marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True) |
| collaborative | marl.make_env(environment_name="mpe", map_name="simple_spread") |
| competitive | marl.make_env(environment_name="mpe", map_name="simple_adversary") |
| mixed | marl.make_env(environment_name="mpe", map_name="simple_crypto") |

Most of the popular environments in MARL research are supported by MARLlib:

| Env Name | Learning Mode | Observability | Action Space | Observations |
|---|---|---|---|---|
| LBF | cooperative + collaborative | Both | Discrete | 1D |
| RWARE | cooperative | Partial | Discrete | 1D |
| MPE | cooperative + collaborative + mixed | Both | Both | 1D |
| SMAC | cooperative | Partial | Discrete | 1D |
| MetaDrive | collaborative | Partial | Continuous | 1D |
| MAgent | collaborative + mixed | Partial | Discrete | 2D |
| Pommerman | collaborative + competitive + mixed | Both | Discrete | 2D |
| MAMuJoCo | cooperative | Full | Continuous | 1D |
| GRF | collaborative + mixed | Full | Discrete | 2D |
| Hanabi | cooperative | Partial | Discrete | 1D |
| MATE | cooperative + mixed | Partial | Both | 1D |
| GoBigger | cooperative + mixed | Both | Continuous | 1D |
| Overcooked-AI | cooperative | Full | Discrete | 1D |

Each environment has a README file that serves as its instructions, covering environment settings, installation, and important notes.
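
Environments other than MPE are created through the same marl.make_env call. The sketch below is hedged: the environment_name keys and map names are assumptions for illustration, so check each environment's README for the exact identifiers supported by your installation.

from marllib import marl

# Assumed identifiers (verify against the per-environment README):
smac_env = marl.make_env(environment_name="smac", map_name="3m")
mamujoco_env = marl.make_env(environment_name="mamujoco", map_name="2AgentAnt")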

Initialize the algorithm

| running target | api example |
|---|---|
| train & finetune | marl.algos.mappo(hyperparam_source=$ENV) |
| develop & debug | marl.algos.mappo(hyperparam_source="test") |
| 3rd party env | marl.algos.mappo(hyperparam_source="common") |
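
In practice you might switch between hyperparameter sources as you move from debugging to a full run; a small sketch following the table above:

from marllib import marl

env = marl.make_env(environment_name="mpe", map_name="simple_spread")

# Small-scale settings while developing and debugging
mappo_debug = marl.algos.mappo(hyperparam_source="test")

# Environment-tuned settings for the actual training run
mappo_train = marl.algos.mappo(hyperparam_source="mpe")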

Here is a chart describing the characteristics of each algorithm:

| algorithm | support task mode | discrete action | continuous action | policy type |
|---|---|---|---|---|
| IQL* | all four | ✔️ | | off-policy |
| PG | all four | ✔️ | ✔️ | on-policy |
| A2C | all four | ✔️ | ✔️ | on-policy |
| DDPG | all four | | ✔️ | off-policy |
| TRPO | all four | ✔️ | ✔️ | on-policy |
| PPO | all four | ✔️ | ✔️ | on-policy |
| COMA | all four | ✔️ | | on-policy |
| MADDPG | all four | | ✔️ | off-policy |
| MAA2C* | all four | ✔️ | ✔️ | on-policy |
| MATRPO* | all four | ✔️ | ✔️ | on-policy |
| MAPPO | all four | ✔️ | ✔️ | on-policy |
| HATRPO | cooperative | ✔️ | ✔️ | on-policy |
| HAPPO | cooperative | ✔️ | ✔️ | on-policy |
| VDN | cooperative | ✔️ | | off-policy |
| QMIX | cooperative | ✔️ | | off-policy |
| FACMAC | cooperative | | ✔️ | off-policy |
| VDAC | cooperative | ✔️ | ✔️ | on-policy |
| VDPPO* | cooperative | ✔️ | ✔️ | on-policy |

*all four: cooperative, collaborative, competitive, mixed

IQL is the multi-agent version of Q-learning. MAA2C and MATRPO are the centralized versions of A2C and TRPO. VDPPO is the value-decomposition version of PPO.
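
As a hedged example of picking algorithms from the table: the sketch below assumes each algorithm is exposed under marl.algos by its lowercase name (only mappo is shown explicitly elsewhere in this README), so adjust the names if your version differs.

from marllib import marl

# Cooperative task (force_coop as in the task-mode table above)
coop_env = marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)

# Off-policy, discrete-action, cooperative-only algorithm from the table
qmix = marl.algos.qmix(hyperparam_source="mpe")

# On-policy algorithm supporting all four task modes
mappo = marl.algos.mappo(hyperparam_source="mpe")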

Build the agent model

An agent model consists of two parts: an encoder and a core architecture. The encoder is constructed by MARLlib according to the observation space; choose mlp, gru, or lstm as the core architecture to complete the model.

| model arch | api example |
|---|---|
| MLP | marl.build_model(env, algo, {"core_arch": "mlp"}) |
| GRU | marl.build_model(env, algo, {"core_arch": "gru"}) |
| LSTM | marl.build_model(env, algo, {"core_arch": "lstm"}) |
| Encoder Arch | marl.build_model(env, algo, {"core_arch": "gru", "encode_layer": "128-256"}) |
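
The core_arch and encode_layer options can be combined freely; a short sketch following the pattern in the table above (the layer sizes here are illustrative, not prescribed):

from marllib import marl

env = marl.make_env(environment_name="mpe", map_name="simple_spread")
mappo = marl.algos.mappo(hyperparam_source="mpe")

# Recurrent core with a two-layer encoder
lstm_model = marl.build_model(env, mappo, {"core_arch": "lstm", "encode_layer": "64-64"})

# Feed-forward core; the encoder is still built from the observation space
mlp_model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-128"})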
Kick off the training

| setting | api example |
|---|---|
| train | algo.fit(env, model) |
| debug | algo.fit(env, model, local_mode=True) |
| stop condition | algo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 10000000}) |
| policy sharing | algo.fit(env, model, share_policy='all') # or 'group' / 'individual' |
| save model | algo.fit(env, model, checkpoint_freq=100, checkpoint_end=True) |
| GPU accelerate | algo.fit(env, model, local_mode=False, num_gpus=1) |
| CPU accelerate | algo.fit(env, model, local_mode=False, num_workers=5) |
Training & rendering API
from marllib import marl

# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source="mpe")
# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})
# start training
mappo.fit(
  env, model, 
  stop={"timesteps_total": 1000000}, 
  checkpoint_freq=100, 
  share_policy="group"
)
# rendering
mappo.render(
  env, model, 
  local_mode=True, 
  restore_path={'params_path': "checkpoint_000010/params.json",
                'model_path': "checkpoint_000010/checkpoint-10"}
)

Results

All results are listed here.

Quick examples

MARLlib provides some practical examples for you to refer to.

Tutorials

Try MPE + MAPPO examples on Google Colaboratory! Open In Colab

More tutorial documentation is available here.

Awesome List

A collection of research and review papers on multi-agent reinforcement learning (MARL) is available. The papers are organized by publication date and by the environments they are evaluated on.

Algorithms: Awesome | Environments: Awesome

Community

| Channel | Link |
|---|---|
| Issues | GitHub Issues |

Roadmap

The roadmap to the future release is available in ROADMAP.md.

Contributing

We are a small team working on multi-agent reinforcement learning, and we will take all the help we can get! If you would like to get involved, see the contribution guidelines and the instructions for testing the code locally.

You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.

Paper

If you use MARLlib in your research, please cite the MARLlib paper.

@article{hu2022marllib,
  title={MARLlib: Extending RLlib for Multi-agent Reinforcement Learning},
  author={Hu, Siyi and Zhong, Yifan and Gao, Minquan and Wang, Weixun and Dong, Hao and Li, Zhihui and Liang, Xiaodan and Chang, Xiaojun and Yang, Yaodong},
  journal={arXiv preprint arXiv:2210.13708},
  year={2022}
}
