Advantage Alignment

Code for the fastest existing algorithm for scalable opponent shaping: https://arxiv.org/abs/2406.14662

Update log: 04/20/2024 by Tianyu

Added tournament.yaml and tournament.py. Usage: python tournament.py. The network definitions for the aa and ppo agents are here. All network-based agents must share the same architecture. Tournaments between agents with different architectures are not supported yet.
Added agent load, save, eval, and train methods. Check here. Usage:

agent.to(cfg.device)
agent.save('agent.pth') # save all the nn.Module state_dict in the agent
agent.load('agent.pth') # load all the nn.Module state_dict in the agent
agent.eval()
agent.train()
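
The calls above could be backed by something like the following minimal sketch. It is not the repository's implementation: `SketchAgent`, its `actor` and `critic` attributes, and the helper `_modules` are hypothetical names chosen for illustration. The only behaviour taken from the usage above is that `save` and `load` handle the `state_dict` of every `nn.Module` the agent holds.

```python
import torch
import torch.nn as nn


class SketchAgent:
    """Hypothetical agent that keeps its networks as plain attributes."""

    def __init__(self):
        # Assumed layout: two small networks; the real agents may differ.
        self.actor = nn.Linear(4, 2)
        self.critic = nn.Linear(4, 1)

    def _modules(self):
        # Collect every attribute that is an nn.Module.
        return {k: v for k, v in vars(self).items() if isinstance(v, nn.Module)}

    def to(self, device):
        for module in self._modules().values():
            module.to(device)
        return self

    def save(self, path):
        # One file holding a state_dict per module, keyed by attribute name.
        torch.save({k: m.state_dict() for k, m in self._modules().items()}, path)

    def load(self, path):
        state = torch.load(path, map_location="cpu")
        for name, module in self._modules().items():
            module.load_state_dict(state[name])

    def eval(self):
        for module in self._modules().values():
            module.eval()

    def train(self):
        for module in self._modules().values():
            module.train()
```

Because the checkpoint is keyed by attribute name, loading only works between agents with identical module names and shapes, which matches the shared-architecture requirement for tournaments.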

Implemented detach_and_move_to_cpu for the trajectory class so that trajectories can be saved to JSON.
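
As a rough illustration of why the detach step is needed before JSON export, here is a sketch under stated assumptions. `SketchTrajectory`, its fields `obs`, `actions`, and `rewards`, and the `to_json` helper are hypothetical; only the method name `detach_and_move_to_cpu` comes from the update log, and the repository's trajectory class may be structured differently.

```python
import json
from dataclasses import dataclass, fields

import torch


@dataclass
class SketchTrajectory:
    """Hypothetical trajectory container; field names are assumptions."""

    obs: torch.Tensor
    actions: torch.Tensor
    rewards: torch.Tensor

    def detach_and_move_to_cpu(self):
        # Drop autograd history and copy each tensor to host memory.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, torch.Tensor):
                setattr(self, f.name, value.detach().cpu())
        return self

    def to_json(self):
        # Tensors are not JSON serializable, so convert them to nested lists.
        self.detach_and_move_to_cpu()
        return json.dumps({f.name: getattr(self, f.name).tolist() for f in fields(self)})
```

Usage would then be `text = trajectory.to_json()`, followed by writing `text` to a file.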
