Add support for NequIP models #60
base: main
Conversation
* create openmmml/models/nequippotential.py
* create example for toluene: example/run_nequip.py
* implement PBC
* cleanup nequip types
* user specified unit conversions
* should work on GPU and CPU
* add example for toy model with PBC
* uses nequip 0.6.0 @develop branch
* uses torch-nl compute_neighborlist
* sets torch dtype from the loaded model metadata
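Given the pieces listed above, a hedged usage sketch of how a user's deployed model would be run (keyword names follow the later revision in this thread, `modelPath`/`lengthScale`/`energyScale`, and are assumptions rather than the final API):

```python
# Illustrative only: assumes a deployed NequIP model 'model.pth' trained in
# Angstrom and kcal/mol, and a small gas-phase system such as the toluene example.
from openmm import LangevinMiddleIntegrator, unit
from openmm.app import PDBFile, Simulation
from openmmml import MLPotential

pdb = PDBFile('toluene.pdb')
potential = MLPotential('nequip', modelPath='model.pth',
                        lengthScale=0.1,     # Angstrom -> nm
                        energyScale=4.184)   # kcal/mol -> kJ/mol
system = potential.createSystem(pdb.topology)

integrator = LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.001*unit.picoseconds)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()
simulation.step(1000)
```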
This will be a really nice feature to have. I did a first pass through the code and made some comments.
Reading through this, it occurs to me that we really need proper documentation. The README gives a brief overview, but as we expand to more options than just ANI, and especially as we add models that are more complex to use and require installing other packages, that won't be enough.
openmmml/models/nequippotential.py
Outdated
        return NequIPPotentialImpl(name, model_path, distance_to_nm, energy_to_kJ_per_mol, atom_types)


class NequIPPotentialImpl(MLPotentialImpl):
    """This is the MLPotentialImpl implementing the NequIP potential.
We should clarify what this means, since there isn't really such a thing as "the NequIP potential". NequIP is a code, not a potential. Presumably this class can be used for any model implemented with that code, including Allegro models for example?
The class can be used with any model generated by NequIP. I'm currently verifying whether this also applies to Allegro, though I anticipate it should.
openmmml/models/nequippotential.py
Outdated
        input_dict["pos"] = positions

        # compute edges
        mapping, shifts_idx = simple_nl(positions, input_dict["cell"], pbc, self.r_max)
Is there a reason not to use the NNPOps neighbor list? In mir-group/nequip#288 (comment) you found it was much faster. NNPOps is already a dependency of this module.
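For reference, a rough sketch of what switching to the NNPOps neighbor list might look like. The exact `getNeighborPairs` signature and return values vary between NNPOps versions, so treat the call below as an assumption, not the PR's code:

```python
import torch
from NNPOps.neighbors import getNeighborPairs

def build_edges(positions: torch.Tensor, cell: torch.Tensor, r_max: float):
    # positions: (N, 3); cell: (3, 3) box vectors, or None for a non-periodic system.
    # Recent NNPOps versions return (neighbors, deltas, distances, num_pairs).
    neighbors, deltas, distances, _ = getNeighborPairs(
        positions, cutoff=r_max, max_num_pairs=-1, box_vectors=cell
    )
    mask = neighbors[0] >= 0          # padded entries are marked with -1
    return neighbors[:, mask], deltas[mask], distances[mask]
```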
openmmml/models/nequippotential.py
Outdated
    """

    def __init__(self, name, model_path, distance_to_nm, energy_to_kJ_per_mol, atom_types):
Does NequIP have default units it uses, or are the units arbitrary? If it has a preferred set of units, we can put the conversion factors here as default values.
Based on the discussion at mir-group/nequip#288 (comment), it appears that NequIP/Allegro is entirely agnostic to units and preserves those of the training dataset. I think it's better for users to receive a `TypeError` indicating missing arguments rather than potentially proceeding with incorrect conversions.
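For concreteness, a hypothetical example of the conversions a user would then have to supply explicitly (argument names follow the constructor shown earlier in this revision of the PR):

```python
from openmmml import MLPotential

# A model trained in Angstrom and kcal/mol; both factors must be given explicitly.
potential = MLPotential(
    'nequip',
    model_path='deployed_model.pth',
    distance_to_nm=0.1,           # 1 Angstrom = 0.1 nm
    energy_to_kJ_per_mol=4.184,   # 1 kcal/mol = 4.184 kJ/mol
    atom_types=None,
)
# A model trained in eV would instead use energy_to_kJ_per_mol=96.485.
```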
openmmml/models/nequippotential.py
Outdated
        self.register_buffer('nm_to_distance', torch.tensor(1.0/distance_to_nm))
        self.register_buffer('distance_to_nm', torch.tensor(distance_to_nm))
Why register two redundant buffers with the same information?
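One possible simplification, as a minimal sketch rather than the PR's code: register a single factor and take its reciprocal where the inverse is needed.

```python
import torch

class UnitScaling(torch.nn.Module):
    """Sketch: one registered conversion factor replaces the two redundant buffers."""

    def __init__(self, distance_to_nm: float):
        super().__init__()
        self.register_buffer('distance_to_nm', torch.tensor(distance_to_nm))

    def nm_to_model_units(self, positions_nm: torch.Tensor) -> torch.Tensor:
        # Dividing by the buffer is equivalent to multiplying by a separate
        # nm_to_distance buffer holding 1.0 / distance_to_nm.
        return positions_nm / self.distance_to_nm
```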
openmmml/models/nequippotential.py
Outdated
    model_path: str
        path to deployed NequIP model
Should we also allow the user to directly pass in the model as a PyTorch object?
Can we train a NequIP model on SPICE and make it usable through openmm-ml?
Hello,

Has there been any further progress on this? I have used NequIP in LAMMPS but would like to use OpenMM instead because it is more compatible with the enhanced sampling packages that I use. I have tried running simulations with a NequIP potential with openmm-ml in its current state; however, the speed is significantly slower than in LAMMPS. Both simulations are run on a single GPU, but in LAMMPS I also use 32 CPU threads and Kokkos. I am not sure if I am doing something incorrect in running openmm-ml, but currently it is unusable for my rather simple system of 645 atoms. Is it expected to be this slow on a system of this size in its current state? I can provide further information if needed.

Thank you so much in advance!

Best,
@svarner9, could you try the current implementation available here? It uses the NNPOps neighbor list, so I anticipate it might be slightly faster for a system of the size you're working with. You can create the potential like this:

```python
potential = MLPotential('nequip', modelPath='model.pth', lengthScale=0.1, energyScale=4.184)
```

What speed-up did you observe in your LAMMPS simulations compared to OpenMM/OpenMM-ML?
This is done from my side. If someone could take a look and review the changes, that would be great. Performance benchmarks on test models can be found here. Many thanks!
openmmml/models/nequippotential.py
Outdated
    ``atomTypes`` parameter. This must be a list containing an integer specifying
    the atom type of each particle in the system. Note that by default the model
Based on the code, I think this description is incorrect. It actually should contain the atom type for each particle that will be modeled with the ML potential. So if you call `createMixedSystem()`, it should contain one element for each element of the `atoms` argument, not one for each particle in the System. Can you make this clear both here and in the description of the `atomTypes` argument below?
Thanks for noticing this. I have now clarified it. Now, `atomTypes` is also an argument passed during system creation. This allows creating systems with varying ML regions from the same MLPotential in cases where custom NequIP atom types are being used.
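A hedged sketch of the intended usage after this change, with one NequIP type index per atom in the ML region; `ml_atoms` and `type_index_for` are hypothetical names used only for illustration:

```python
# Select the atoms treated with the ML potential and supply one type per ML atom.
ml_atoms = [atom.index for atom in pdb.topology.atoms() if atom.residue.name == 'MBN']
atom_types = [type_index_for(index) for index in ml_atoms]  # hypothetical helper

mixed_system = potential.createMixedSystem(
    pdb.topology, mm_system, ml_atoms, atomTypes=atom_types
)
```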
openmmml/models/nequippotential.py
Outdated
    Parameters
    ----------
    name : str
        The name of the deployed model.
Actually the name that was specified in the MLPotential constructor, which in this case will always be `nequip`.
Fixed.
openmmml/models/nequippotential.py
Outdated
        typeNameToTypeIndex = {
            typeNames: i for i, typeNames in enumerate(typeNames)
        }
        self.atomTypes = [
Modifying the object this method is called on is dangerous. If you create a MLPotential and then create multiple Systems from it, this will lead to incorrect results on the second and later calls.
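One way to avoid this, sketched under the assumption that the per-system type list is only needed inside `addForces`: keep it in a local variable instead of assigning it to `self`.

```python
# Illustrative pattern only; typeNames mirrors the metadata used in the snippet above.
def addForces(self, topology, system, atoms, forceGroup, atomTypes=None, **args):
    typeNameToTypeIndex = {name: i for i, name in enumerate(self.typeNames)}
    ml_atoms = [atom for atom in topology.atoms() if atom.index in atoms]
    # Per-call state stays local, so creating several Systems from the same
    # MLPotential cannot corrupt earlier results.
    atom_types = atomTypes if atomTypes is not None else [
        typeNameToTypeIndex[atom.element.symbol] for atom in ml_atoms
    ]
    ...
```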
Agreed. Fixed.
openmmml/models/nequippotential.py
Outdated
    model : str
        The path to the deployed NequIP model.
    lengthScale : float
        The length conversion factor from the model units to nanometers.
    energyScale : float
        The energy conversion factor from the model units to kJ/mol.
    dtype : torch.dtype
        The precision of the model.
    r_max : torch.Tensor
        The maximum distance for the neighbor search.
    inputDict : dict
        The input dictionary passed to the model.
This doesn't match the actual list of attributes.
Fixed. Since buffers can also be accessed as attributes, should those be included in the docstring?
openmmml/models/nequippotential.py
Outdated
        self.model, metadata = nequip.scripts.deploy.load_deployed_model(
            self.modelPath, device="cpu", freeze=False
        )
Here is another instance of modifying `self` inside a method that should treat it as immutable.
Fixed.
test/test_mlpotential.py
Outdated
@pytest.mark.parametrize("implementation,platform_int", list(itertools.product(['nnpops', 'torchani'], list(platform_ints))))
class TestMLPotential:

    def testCreateMixedSystem(self, implementation, platform_int):
        pdb = app.PDBFile('alanine-dipeptide-explicit.pdb')
        pdb = app.PDBFile(os.path.join(test_data_dir, 'alanine-dipeptide/alanine-dipeptide-explicit.pdb'))
The hardcoded unix path separator will fail on Windows.
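A portable spelling would let `os.path.join` supply every separator, e.g. (sketch):

```python
pdb = app.PDBFile(os.path.join(test_data_dir, 'alanine-dipeptide', 'alanine-dipeptide-explicit.pdb'))
```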
Fixed.
test/test_mlpotential.py
Outdated
test_data_dir = os.path.dirname(os.path.abspath(__file__))
test_data_dir = os.path.join(test_data_dir, "data")
The first line is confusing: you set `test_data_dir` to a directory that doesn't contain the test data. It's better to give it a different name, or just combine these two lines.
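For example, the two assignments could be collapsed into one (sketch):

```python
import os

test_data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")
```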
Fixed.
examples/nequip/toluene.pdb
Outdated
@@ -0,0 +1,23 @@
HETATM 1 C1 UNL 1 2.199 -0.143 0.062 1.00 0.00 C |
The correct residue name for toluene is MBN. See http://ligand-expo.rcsb.org/reports/M/MBN/index.html.
Fixed.
examples/nequip/README.md
Outdated
conda install -c conda-forge openmm-torch nnpops
```

Then install the development versions of NequIP and `openmm-ml` using pip:
We shouldn't be telling people to install pre-release versions of packages unless there's a really good reason. Any release of OpenMM-ML that doesn't include NequIP support also doesn't include these examples, so we should always direct people to the latest release. Why is the pre-release NequIP needed, and when will the necessary features be in a release?
Makes sense. I rewrote the instructions so that the packages from the OpenMM ecosystem, viz. NNPOps and openmm-ml, are installed from conda-forge.

Regarding the pre-release of NequIP, the one currently available through pip is version 0.5.6, while `pip install git+https://github.com/mir-group/nequip@develop` installs version 0.6.0. There's some discussion in this thread as to why the development version of NequIP might be better (or necessary) to use with this interface. I am sure @Linux-cpp-lisp is in a better position to provide more informed answers to your questions. @Linux-cpp-lisp, could you please clarify why the development version of NequIP is required and whether there are any plans to make it available through pip and/or conda? Many thanks!
Sorry for coming around to this thread late, and thanks both for your efforts on this!
This is something I need to fix and hope to have fixed and available normally on PyPI in the very near future; I'll let you know as soon as I have that up. Since you are using `load_deployed_model`, this will probably work with the current `main` as well, but ideally I will just get that released and we will restrict to `>=0.6.0`. Thanks for your flexibility while I clean this up.
OK, 0.6.0 is now available from PyPI: https://pypi.org/project/nequip/
Let me know if this resolves the issues for you.
(I would recommend restricting the `nequip` version to 0.6.0+ just for simplicity's sake.)
Many thanks @Linux-cpp-lisp. That's really useful! I'll integrate it soon.
Great!
simulation.minimizeEnergy()

# Run the simulation
simulation.step(1000)
Before starting the simulation it's a good idea to call

```python
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)
```

Assume people will be copying your code, so we want to make sure it follows best practices. For the same reason, it's better to use a DCDReporter instead of a PDBReporter (PDB being a terrible format for trajectories).
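A minimal sketch of the suggested pattern (the reporter interval and file name are placeholders):

```python
from openmm import unit
from openmm.app import DCDReporter

# Draw initial velocities from the target temperature before production.
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)

# Write the trajectory in DCD format rather than PDB.
simulation.reporters.append(DCDReporter('trajectory.dcd', 100))
simulation.step(1000)
```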
Fixed.
examples/nequip/run_nequip.ipynb
Outdated
"!mamba install -c conda-forge openmm-torch nnpops pytorch=*=cuda*\n", | ||
"\n", | ||
"!pip install git+https://github.com/mir-group/nequip@develop\n", | ||
"!pip install git+https://github.com/sef43/openmm-ml@nequip" |
Let's not have the example install OpenMM-ML from your personal fork! Remember that whatever you do in the example, users will copy it.
Fixed. And thanks for the heads up, will keep that in mind :)
@JMorado I get the following set of warnings and errors:

Here is my mamba list:

If I don't specify any platform, then the simulation runs, but extremely slowly since it is on CPU. Thank you so much in advance!

Best,
That means a plugin couldn't be loaded. Try printing the value of `Platform.getPluginLoadFailures()`. Usually it's because some library the plugin depended on couldn't be found, and it can be fixed by adding the directory containing the library to `LD_LIBRARY_PATH`.
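A minimal diagnostic sketch, assuming the standard OpenMM Python API:

```python
import openmm

# Plugin libraries that loaded when the openmm module was imported.
print(openmm.pluginLoadedLibNames)
# Reasons any plugin failed to load (empty if everything loaded cleanly).
print(openmm.Platform.getPluginLoadFailures())
# Platforms that are actually available in this environment.
print([openmm.Platform.getPlatform(i).getName()
       for i in range(openmm.Platform.getNumPlatforms())])
```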
Thank you for the quick response! I tried that based on some previous replies of yours that I found. I ran the following:

and the output was:

The failures command returned an empty tuple.

Best,
The versions of PyTorch and OpenMM-Torch you have installed are CPU only:

That might be because you have an older version of cudatoolkit installed. If you upgrade it to 11.8, you might be able to get it to install the CUDA version of PyTorch.

Conda installation issues like this tend to be frustrating and hard to figure out. They often depend on the precise order you install packages in.
Ahhh I see, thank you! I went ahead and uninstalled openmm-torch and pytorch. I upgraded the cudatoolkit and then installed the CUDA version of pytorch:

Installing openmm-torch downgraded it back to the CPU version, but then installing nnpops upgraded it back to the CUDA version. I agree, conda installations are very frustrating.

It is working on GPU now, but only getting about 0.2 ns/day, whereas in LAMMPS I was getting 1.5 ns/day. To your knowledge, could any of the following warnings have to do with it being slow?

I tried to install the packages in such a way as to allow me to use pytorch 1.11.0 (which according to the error is the most stable version with nequip); however, as far as I can tell there is no way to use pytorch 1.11.0 with openmm-torch. Every time I would install openmm-torch it would install pytorch 2.1.2. This is the order in which I did everything:
Many thanks for the thorough review, @peastman! Most of it should now be resolved.

Thanks for testing, @svarner9. I think the slow performance you're seeing is not related to that warning, the underlying issue of which is described here. You could test whether the issue behind that warning is indeed present by looking for a slowdown in performance over time. I ran some performance benchmarks on systems much smaller than yours and did not see any decrease in performance over time, and the simulation speed is around what I would expect. If that is your baseline OpenMM performance, I wonder what could be causing it. Do you remember by any chance what performance you were getting with the previous neighbor list?

Does anyone have any ideas about whether it's possible to improve performance here?
Yes, many thanks @peastman for the help!

@JMorado I am not sure, but there are a few things I can think of that might be the issue. I am not an expert and have not looked through the code, so these might be a bit naive.

Best,
We actually do this internally.

Regarding the reproducibility of energies between ASE and OpenMM: you can try turning off TF32, or even better using a fully F64 model (`model_dtype: float64`).
@svarner9, a few questions on performance:
There shouldn't be any overhead from Python. The model gets compiled to TorchScript, and the simulation gets run by C++ code.
Do you call TorchScript from Python here, or directly from C++? Not that I would expect a roundtrip through Python to matter much, just curious.
It's called directly from C++.
@peastman @Linux-cpp-lisp, I've trained a model with these settings:

```yaml
default_dtype: float64
model_dtype: float64
allow_tf32: true
```

and the energy and force differences between ASE and OpenMM are indeed very small.
@JMorado great!
@Linux-cpp-lisp I was getting 1.5 ns/day in LAMMPS and 0.2 ns/day in OpenMM for a system with 645 atoms.
This PR adds support for NequIP models to openmm-ml. There are no pre-trained models available, but the model framework is well defined. This will allow users to run their own trained NequIP models in OpenMM simulations.
It also adds code to compute neighbor lists with PyTorch that will be used for MACE models too (the NNPOps neighbor list can be added later).
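As an illustration of the idea (not the PR's `simple_nl`, which also returns the periodic shift indices needed for general triclinic cells), a brute-force PyTorch neighbor list for an orthorhombic box might look like this:

```python
import torch

def brute_force_neighbor_list(positions: torch.Tensor, box_lengths: torch.Tensor, r_max: float):
    """O(N^2) sketch using the minimum-image convention (orthorhombic cells only)."""
    deltas = positions[None, :, :] - positions[:, None, :]
    deltas -= box_lengths * torch.round(deltas / box_lengths)       # minimum image
    dist = torch.linalg.norm(deltas, dim=-1)
    mask = (dist < r_max) & ~torch.eye(len(positions), dtype=torch.bool)
    return torch.nonzero(mask).T                                    # (2, num_edges) index pairs
```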
Addresses #48; see mir-group/nequip#288 for further discussion.

TODO: Need to add testing, but not sure how to do this cleanly in CI considering NequIP needs to be installed via pip.