[FEATURE] HAPPO, and HetIPPO, HetMAPPO, ... Implementations #1150

Closed
HenningBeyer opened this issue Dec 25, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@HenningBeyer

Feature

  • HAPPO seems to be missing from the current implementation, either because it is WIP or because it was discarded, even though HASAC is already implemented in Mava. Implementing HAPPO + HAPPO_rec would therefore be a relevant feature.

  • Heterogeneous-agent versions of any I-type and MA-type agent would be interesting: heterogeneous counterparts of agents like IPPO and MAPPO could tackle tasks that standard IPPO/MAPPO cannot solve properly because the tasks require heterogeneity.

    • For context, agents like HetIPPO were introduced in the VMAS paper, while a "HetMAPPO" version was used in the MAPPO paper to allow the use of heterogeneous agents. I spotted that TorchRL has already adopted these heterogeneous agent versions (see this line), while it still lacks agents like HAPPO or HASAC.
    • Heterogeneous agent versions would also enable better comparability between agents like (Het-)MAPPO and the more advanced HAPPO agent, which might be relevant for future research on MARL agents and for applications.
    • HetMAPPO/HetIPPO simply means not sharing parameters in MAPPO/IPPO. So all that is needed to implement the Het- agent versions is to disable parameter sharing. HAPPO differs from HetMAPPO in its additional sequential updating scheme, while HetMAPPO simply updates all agents at once like MAPPO.

Proposal

  • It seems that implementing HAPPO requires few changes to MAPPO apart from adding the sequential HARL updating scheme, which (to my knowledge) only needs to calculate the update weights M as in the depiction below. Otherwise, the implementation should be the same as MAPPO.
    image

(Image Source)

I also looked through the Mava code and can't spot any barrier to implementing this yet. One should essentially be able to implement this in the actor_loss_fn, similarly to HASAC.
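As a rough illustration of the sequential scheme described above, here is a minimal JAX sketch of how the weights M could be accumulated over a random agent order and then used in place of the advantage in a PPO-clip loss. The function names (`happo_actor_loss`, `sequential_m_weights`) and array shapes are hypothetical and not taken from Mava. To keep the sketch short, the post-update log-probabilities are assumed to be given; in a real training loop each agent's values would come from re-evaluating its policy after its own gradient step.

```python
import jax
import jax.numpy as jnp


def happo_actor_loss(logp_new, logp_old, m_adv, clip_eps=0.2):
    """PPO-clip actor loss for one agent, with M replacing the advantage."""
    ratio = jnp.exp(logp_new - logp_old)
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -jnp.mean(jnp.minimum(ratio * m_adv, clipped * m_adv))


def sequential_m_weights(key, logp_old, logp_updated, advantage):
    """Accumulate M along a random agent update order.

    logp_old, logp_updated: (num_agents, batch) log-probs of the taken actions
    before and after each agent's update (assumed given in this sketch).
    advantage: (batch,) joint advantage estimate.
    Returns the update order and the M used by each agent, (num_agents, batch).
    """
    order = jax.random.permutation(key, logp_old.shape[0])
    m = advantage  # the first agent in the order sees the plain advantage
    m_per_agent = jnp.zeros_like(logp_old)
    for agent in order.tolist():
        m_per_agent = m_per_agent.at[agent].set(m)
        # Later agents see M scaled by the ratio of the agent just updated.
        m = m * jnp.exp(logp_updated[agent] - logp_old[agent])
    return order, m_per_agent


# Toy data: 2 agents, batch of 3 samples.
logp_old = jnp.zeros((2, 3))
logp_updated = jnp.log(jnp.array([[2.0, 2.0, 2.0], [0.5, 0.5, 0.5]]))
advantage = jnp.ones(3)

order, m_per_agent = sequential_m_weights(
    jax.random.PRNGKey(0), logp_old, logp_updated, advantage
)
first_agent = int(order[0])
loss_first = happo_actor_loss(
    logp_updated[first_agent], logp_old[first_agent], m_per_agent[first_agent]
)
```

With all ratios equal to 1 (no policy change), M stays equal to the advantage for every agent and the loss reduces to the usual MAPPO-style clipped objective.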

This documentation source might be helpful as additional context for HAPPO.

To implement the Het- agent versions, simply provide an option/implementation without parameter sharing for I- and MA-type MARL agents. The recurrent versions of the Het-type agents could be kept as well.

Definition of done

Implemented HAPPO, HAPPO_rec, HetIPPO, HetIPPO_rec, HetMAPPO_rec, HetISAC, HetMASAC.

One can do the same with IQL and QMIX, giving HetQMix_rec and HetIQL_rec. I wonder why there are no non-recurrent IQL/QMIX variants; I guess they do not perform as well as the recurrent variants, but it could still be interesting to have a classic IQL/QMIX version for testing. [OPTIONAL]

In general, these are a lot of ideas/tasks. Feel free to just implement what seems relevant to you.

@HenningBeyer HenningBeyer added the enhancement New feature or request label Dec 25, 2024
@ch33nchan

@HenningBeyer check PR #1151, I tried out a basic implementation. Would love your feedback!

@sash-a
Contributor

sash-a commented Dec 27, 2024

Hi @HenningBeyer, thanks for the issue. We have tried HAPPO in the past and found that it doesn't perform well, which is why we don't currently have it in Mava. If you'd like to contribute it, you're welcome to; just know that we'd like to fully benchmark it and see that it performs comparably to MAPPO/its paper results before accepting it.

As for heterogeneous algorithms, we don't currently support them, but I think it would be something good to add to the Mava roadmap (cc @RuanJohn). If you're looking to implement this, simply vmapping the network.init function is a good place to start, as this will give you a set of parameters per agent.
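To make the vmap suggestion concrete, here is a minimal pure-JAX sketch. `init_policy` and `apply_policy` are hypothetical stand-ins for a network's init/apply functions, not Mava's actual API, and the dimensions are arbitrary. The idea is just that mapping the init function over one PRNG key per agent yields a parameter pytree with a leading agent axis, i.e. no parameter sharing.

```python
import jax
import jax.numpy as jnp


def init_policy(key, obs_dim, hidden_dim, act_dim):
    """Initialise a tiny two-layer MLP (hypothetical stand-in for network.init)."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (obs_dim, hidden_dim)) * 0.1,
        "b1": jnp.zeros(hidden_dim),
        "w2": jax.random.normal(k2, (hidden_dim, act_dim)) * 0.1,
        "b2": jnp.zeros(act_dim),
    }


def apply_policy(params, obs):
    """Forward pass for a single agent: observation -> action logits."""
    hidden = jnp.tanh(obs @ params["w1"] + params["b1"])
    return hidden @ params["w2"] + params["b2"]


num_agents, obs_dim, hidden_dim, act_dim = 3, 4, 8, 2

# One PRNG key per agent gives one independent parameter set per agent.
agent_keys = jax.random.split(jax.random.PRNGKey(0), num_agents)
het_params = jax.vmap(
    lambda key: init_policy(key, obs_dim, hidden_dim, act_dim)
)(agent_keys)

# Every leaf now carries a leading agent axis, e.g. w1: (num_agents, obs_dim, hidden_dim).
# Mapping apply over that axis runs each agent with its own parameters.
obs = jnp.ones((num_agents, obs_dim))
logits = jax.vmap(apply_policy)(het_params, obs)  # (num_agents, act_dim)
```

Because the per-agent parameters live in a single stacked pytree, the same `jax.vmap` pattern can be applied to the loss and gradient computation, which keeps the heterogeneous variant close to the shared-parameter code path.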

@HenningBeyer
Author

Hi @sash-a, thank you for your insights.

The performance of MAPPO and HAPPO mostly matched in the HARL paper and follow-up papers, where HAPPO seems to perform equally to or slightly better than HetMAPPO/MAPPO in ~75% of the cases. So I think it's some small detail that's missing.

Otherwise, HAPPO outperformed MAPPO on highly heterogeneous and complex tasks such as humanoid-v2 17x1, ShadowHandCatchOver2Underarm, or ShadowHandPen, while HAPPO and HetMAPPO achieve very similar performance on simple homogeneous tasks like MPE in the HARL paper.

So HAPPO might mainly be relevant for very complex and highly heterogeneous CTDE MARL tasks, where HASAC also does well. In my case, I was looking for a less memory-intensive alternative to HASAC for large-scale MARL simulations, one that conveniently fits on a single GPU and transfers more easily to a real-world online-learning setup without HASAC's big replay buffer. MAPPO/HetMAPPO plus the recurrent variants should suffice here, although testing HAPPO in simulation would be interesting too.

PQN (CTCE) or PQN-VDN (CTDE) look interesting for this use case too. PQN-VDN is currently implemented in JaxMARL and seems competitive with MAPPO in performance (based on the paper results).

@sash-a
Contributor

sash-a commented Jan 6, 2025

To me it seems mostly worse than MAPPO from the HAPPO paper and the MAT paper. In both it is definitely worse on SMAC, but in the HAPPO paper it is better in the continuous control tasks.

Would you be interested in implementing HAPPO?

For your problem of large-scale MARL, I would highly recommend the team's newest work: Sable. It was specifically developed for this and is state-of-the-art on a wide array of benchmarks. There's also an implementation in Mava.

@HenningBeyer
Author

Hi sash-a, we will probably just go with the simpler PPO/IPPO/MAPPO for now in the lab setting, as it should give us the performance we need. HAPPO might only become somewhat more relevant in practice for highly heterogeneous tasks, but we do not depend on it that much. It was more out of interest to test it, see how well it performs, and include it in the experiments for completeness.

In case we still implement it, we would contribute it. However, it does not seem very likely.

Sable seems indeed interesting, thanks. We'll look into it.

@sash-a
Contributor

sash-a commented Jan 7, 2025

Great, if you need any other help on this feel free to re-open the issue 😄

@sash-a sash-a closed this as completed Jan 7, 2025

3 participants