(blank): thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see the GitHub issues.

Algorithms | Category | Reference | Status |
---|---|---|---|
Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
Minimax (and Alpha-Beta) Search | Search | Wikipedia1, Wikipedia2, Knuth and Moore '75 | |
Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | |
Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | |
ADIDAS | Opt. | Gemp et al. '22 | ~ |
Sequence-form linear programming | Opt. | Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09 | |
Stackelberg equilibrium solver | Opt. | Conitzer & Sandholm '06 | ~ |
MIP-Nash | Opt. | Sandholm et al. '05 | ~ |
Magnetic Mirror Descent (MMD) with dilated entropy | Opt. | Sokota et al. '22 | ~ |
Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | |
CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | |
Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | |
External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
Fixed Strategy Iteration CFR (FSICFR) | Tabular | Neller & Hnath '11 | ~ |
Mean-field Fictitious Play for MFG | Tabular | Perrin et al. '20 | ~ |
Online Mirror Descent for MFG | Tabular | Perolat et al. '21 | ~ |
Munchausen Online Mirror Descent for MFG | Tabular | Lauriere et al. '22 | ~ |
Fixed Point for MFG | Tabular | Huang et al. '06 | ~ |
Boltzmann Policy Iteration for MFG | Tabular | Lauriere et al. '22 | ~ |
Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
Policy Iteration | Tabular | Sutton & Barto '18 | |
Q-learning | Tabular | Sutton & Barto '18 | |
Regret Matching | Tabular | Hart & Mas-Colell '00 | |
Restricted Nash Response (RNR) | Tabular | Johanson et al. '08 | ~ |
SARSA | Tabular | Sutton & Barto '18 | |
Value Iteration | Tabular | Sutton & Barto '18 | |
Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | |
Deep Q-networks (DQN) | RL | Mnih et al. '15 | |
Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
Proximal Policy Optimization (PPO) | RL | Schulman et al. '17 | ~ |
AlphaZero (C++/LibTorch) | MARL | Silver et al. '18 | |
AlphaZero (Python/TF) | MARL | Silver et al. '18 | |
Correlated Q-Learning | MARL | Greenwald & Hall '03 | ~ |
Asymmetric Q-Learning | MARL | Kononen '04 | ~ |
Deep CFR | MARL | Brown et al. '18 | |
DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE) | MARL | Foerster, Farquhar, Al-Shedivat, et al. '18 | ~ |
Exploitability Descent (ED) | MARL | Lockhart et al. '19 | |
(Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | |
Learning with Opponent-Learning Awareness (LOLA) | MARL | Foerster, Chen, Al-Shedivat, et al. '18 | ~ |
Nash Q-Learning | MARL | Hu & Wellman '03 | ~ |
Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | |
Neural Replicator Dynamics (NeuRD) | MARL | Omidshafiei, Hennes, Morrill, et al. '19 | X |
Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | |
Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
Regularized Nash Dynamics (R-NaD) | MARL | Perolat, De Vylder, et al. '22 | |
Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | |
Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC) | MARL | Bowling & Veloso '02 | ~ |
α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | |
Nash Averaging | Eval. / Viz. | Balduzzi et al. '18 | ~ |
Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | |
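
The table lists Regret Matching (Hart & Mas-Colell '00) without describing how it works. As an illustration only, here is a minimal, self-contained sketch of regret-matching self-play on a small two-player zero-sum matrix game. It is not the implementation the table refers to. The payoff matrix and the names `PAYOFF`, `regret_matching`, and `solve` are hypothetical, chosen for this example. The game has a unique Nash equilibrium in which both players put probability 0.4 on their first action. Each player plays actions in proportion to positive cumulative regret, and the *average* strategies (not the per-iteration ones) approach that equilibrium.

```python
"""Minimal regret-matching self-play sketch on a 2x2 zero-sum matrix game."""

from typing import List, Tuple

# Hypothetical example game: row player's payoffs; the column player gets the
# negation (zero-sum). Its unique Nash equilibrium is (0.4, 0.6) for both players.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(cum_regret)] * len(cum_regret)


def solve(iterations: int = 20000) -> Tuple[List[float], List[float]]:
    n_rows, n_cols = len(PAYOFF), len(PAYOFF[0])
    row_regret = [0.0] * n_rows
    col_regret = [0.0] * n_cols
    row_sum = [0.0] * n_rows  # running sums of strategies, used for the average
    col_sum = [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)

        # Expected payoff of each pure row action against the column mix.
        row_action_values = [sum(PAYOFF[i][j] * col[j] for j in range(n_cols))
                             for i in range(n_rows)]
        # Column player's payoff is the negated row payoff.
        col_action_values = [-sum(PAYOFF[i][j] * row[i] for i in range(n_rows))
                             for j in range(n_cols)]
        row_value = sum(r * v for r, v in zip(row, row_action_values))
        col_value = sum(c * v for c, v in zip(col, col_action_values))

        # Accumulate instantaneous regrets for both players simultaneously.
        for i in range(n_rows):
            row_regret[i] += row_action_values[i] - row_value
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_action_values[j] - col_value
            col_sum[j] += col[j]

    # The average strategies approach a Nash equilibrium in two-player zero-sum games.
    row_avg = [s / iterations for s in row_sum]
    col_avg = [s / iterations for s in col_sum]
    return row_avg, col_avg


if __name__ == "__main__":
    row_avg, col_avg = solve()
    print("row average strategy:", [round(p, 3) for p in row_avg])
    print("column average strategy:", [round(p, 3) for p in col_avg])
```

Averaging matters here. For two-player zero-sum games, the sum of both players' average regrets bounds the exploitability of the average strategy profile. Regret matching keeps each player's average regret shrinking as O(1/sqrt(T)), so the averages approach (0.4, 0.6), while the per-iteration strategies may keep oscillating.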