Multihead committees #800

Draft
wants to merge 28 commits into
base: develop

beckobert
Contributor

Hello everyone,

This pull request makes it possible to use the multihead mechanism to build computationally efficient committees of MACE models: the committee members share the large atomic descriptor part of the MLP and differ only in their output blocks. Such a committee can be used as an uncertainty measure for the MLP.
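
As a minimal sketch of how committee outputs are commonly turned into an uncertainty estimate: take the mean over the heads as the prediction and the standard deviation over the heads as the uncertainty. The function name `committee_energy`, the example values, and the choice of the sample standard deviation are assumptions for illustration only, not taken from this PR.

```python
from statistics import mean, stdev


def committee_energy(head_energies):
    """Return (prediction, uncertainty) for one structure.

    head_energies: one energy per committee member (output head).
    The prediction is the committee mean; the uncertainty is the
    sample standard deviation across heads (an assumed convention).
    """
    return mean(head_energies), stdev(head_energies)


# Hypothetical energies predicted by four heads for the same structure.
energies = [-10.02, -9.98, -10.05, -9.95]
e_mean, e_std = committee_energy(energies)
```

A large `e_std` relative to the typical training error would flag a structure on which the heads disagree.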

This PR aims to reuse and keep as much of the original infrastructure as possible. The code currently works well and the results are promising, but at the moment there are two main problems, for which I would be glad to get any help and recommendations, and a few items still on my to-do list.

Problems

  • In theory, this multihead committee should predict energies and forces at negligible additional computational cost, but in practice this is not yet true. Although the autodifferentiation graph is retained when computing the forces, MACE still has to repeat the actual calculation of the values for every committee member separately, even though most of it is redundant.
  • I had to change the MLP output layer from the original masking set-up to more structured connections between the nodes. This works very well with e3nn, but I haven't figured out how to do it with cuEquivariance.
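
The first problem can be written down a little more precisely. The notation here is introduced only for illustration and is not from the PR: $h(\mathbf{r})$ is the shared descriptor, $E^{(k)}$ is the energy of committee member $k$, and $K$ is the number of members. The chain rule gives the forces of each member as

```latex
\mathbf{F}^{(k)}
  = -\frac{\partial E^{(k)}}{\partial \mathbf{r}}
  = -\left(\frac{\partial h}{\partial \mathbf{r}}\right)^{\!\top}
     \frac{\partial E^{(k)}}{\partial h},
\qquad k = 1, \dots, K .
```

The Jacobian $\partial h / \partial \mathbf{r}$ is the same for all $K$ members, but reverse-mode autodifferentiation evaluates one vector-Jacobian product per scalar output. A straightforward implementation therefore runs $K$ backward passes through the shared part, which is presumably the redundancy described in the first problem.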

To-Do list

  • Adapt ASE calculator (should be straightforward)
  • Write tests
  • Write documentation (once the code is in its final form)

Please let me know your opinions, whether you need any explanations, and of course whether you have an idea of how to tackle the two key problems.

@ilyes319
Contributor

ilyes319 commented Feb 3, 2025

@beckobert, hey, thank you for the well-structured PR!
Could you tell me in more detail what is happening with cuEquivariance? Also, can you explain what these "structured connections" that you changed to are?

@beckobert
Contributor Author

The changes are in the NonLinearReadoutBlock. In the old version, the input nodes were connected to all hidden nodes, and those were connected to all output nodes. During prediction, a mask blocked out all "unwanted" hidden nodes so that only the nodes corresponding to a certain output head contributed to the prediction. However, this does not work for the multihead committee, where I want the model to return the correct predictions for all heads at once.
My change ("structured connections") takes advantage of the instructions keyword in the e3nn layers, which allows specifying which nodes of the layers are connected to each other. Now each node in the hidden layer has only one connection, to the correct output node, and there is no need for any masking any more. (This also makes it easier to scale the normalization when loading a model and adding more heads for foundation model fine-tuning; see the related changes in that part of the code.)
However, I have not found a feature similar to e3nn's instructions in cuEquivariance, so I am still looking for an "elegant" implementation of the NonLinearReadoutBlock when using that module.
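
The difference between the two schemes can be illustrated with a toy readout written in plain Python. This is only a sketch: it does not use e3nn or the PR's code, the function names are hypothetical, and it assumes that the hidden nodes of each head form one contiguous block.

```python
import random


def masked_dense_readout(hidden, weights, head, n_heads):
    """Old scheme: zero all hidden nodes outside `head`'s block,
    apply the dense output layer, and read output node `head`."""
    block = len(hidden) // n_heads
    masked = [h if i // block == head else 0.0 for i, h in enumerate(hidden)]
    outputs = [sum(w * h for w, h in zip(row, masked)) for row in weights]
    return outputs[head]


def structured_readout(hidden, weights, n_heads):
    """New scheme: output node k is connected only to hidden block k,
    so the predictions of all heads come out of a single pass."""
    block = len(hidden) // n_heads
    return [
        sum(weights[k][i] * hidden[i] for i in range(k * block, (k + 1) * block))
        for k in range(n_heads)
    ]


rng = random.Random(0)
n_heads, block = 3, 4
hidden = [rng.uniform(-1, 1) for _ in range(n_heads * block)]
weights = [[rng.uniform(-1, 1) for _ in range(n_heads * block)]
           for _ in range(n_heads)]

# One pass with structured connections ...
one_pass = structured_readout(hidden, weights, n_heads)
# ... versus one masked pass per head with the dense layer.
per_head = [masked_dense_readout(hidden, weights, k, n_heads)
            for k in range(n_heads)]
```

Both lists hold the same values. The structured version never uses the weights outside the per-head blocks, which is why no mask is needed.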

@tisabe

tisabe commented Feb 5, 2025

Hi, this PR looks very interesting. I am currently also working with MACE committees to get uncertainty predictions, but training multiple models over multiple iterations (in an active learning scenario) is very slow.
Unfortunately, I don't think I can help with your issues right now, but I am curious about the application and performance of this method. I assume the committee also benefits from the multihead functionality in ways other than performance, e.g. by using different bootstrapped training/validation splits for the different heads?

@beckobert
Contributor Author

beckobert commented Feb 7, 2025

Yes, that is pretty much the basic idea of the PR.
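
A minimal sketch of per-head bootstrapped splits, assuming each head trains on indices drawn with replacement and validates on the samples it never drew (its out-of-bag set). The function name `bootstrap_splits` and this particular splitting scheme are illustrative assumptions, not how the PR assigns data to heads.

```python
import random


def bootstrap_splits(n_samples, n_heads, seed=0):
    """Return one (train_indices, valid_indices) pair per head.

    train_indices: n_samples indices drawn with replacement.
    valid_indices: indices that were never drawn (out-of-bag).
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_heads):
        train = rng.choices(range(n_samples), k=n_samples)
        drawn = set(train)
        valid = [i for i in range(n_samples) if i not in drawn]
        splits.append((train, valid))
    return splits


# Hypothetical dataset of 100 configurations and a committee of 4 heads.
splits = bootstrap_splits(n_samples=100, n_heads=4, seed=42)
```

Because every head sees a different resampling of the same dataset, the heads end up with different fits, and their disagreement is what the committee reports as uncertainty.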

3 participants