-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Inconsistent behaviour with reindexed molecules #202
Comments
This is because the matching is done by
I'll have to think carefully about what to do here, since we need this to work in both cases. |
Actually, I think this is a more general issue with reindexing atoms in Sire. There appear to be some inconsistencies depending on whether properties are handled at the atom or molecule level. For example, consider this example script. The input is import sire as sr
mol = sr.load([f"tyk2_ejm_55.gro", f"tyk2_ejm_55.top"], show_warnings=False)[0]
print("Initial molecule:")
for atoms in mol.atoms():
print(atoms)
print(mol.property("coordinates"))
print()
sr.save(mol, "poo.prm7")
indices = [x for x in range(33)]
reordered_indices = [
1,
0,
2,
32,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
30,
15,
29,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
31,
]
names = mol.atoms().names()
mol = mol.edit()
for new, old in enumerate(reordered_indices):
atom = mol.atom(names[old])
atom = atom.reindex(new)
mol = atom.molecule()
mol = mol.commit()
print("Reindexed molecule:")
for atoms in mol.atoms():
print(atoms)
print(mol.property("coordinates"))
print()
sr.save(mol, "debug.grotop")
sr.save(mol, "debug.gro87")
try:
gro_mol = sr.load("debug.gro87", "debug.grotop", show_warnings=False)[0]
except:
print("Loading from GROMACS format failed!")
print("GROMACS molecule:")
for atoms in mol.atoms():
print(atoms)
print(mol.property("coordinates"))
print()
sr.save(mol, "debug.rst7")
sr.save(mol, "debug.prm7")
try:
amber_mol = sr.load("debug.rst7", "debug.prm7", show_warnings=False)[0]
except:
print("Loading from AMBER format failed!")
print()
edit_mol = mol.edit()
new_mol = (
edit_mol.set_property("coordinates", gro_mol.property("coordinates"))
.molecule()
.commit()
)
print("Reindex with copied coordinates:")
for atoms in mol.atoms():
print(atoms)
print(mol.property("coordinates"))
print()
sr.save(new_mol, "debug2.gro87") This script does a few things:
The output is as follows:
Going from steps 1) to 2) you can see that the atoms are reordered. However, the coordinates property (as obtained at the molecule level) is in the same order. Next we find that it is possible to save and load using GROMACS format, but not with AMBER. Here the exception is: OSError: SireError::io_error: Cannot load the molecules: Disagreement over the number of atoms for molecule at index 0. 33 versus 32. Number
of atoms in neighbouring molecules are { molidx 0: natoms = 33 } (call sire.error.get_last_error_details() for more info) Finally, when we copy the atoms from the GROMACS molecule into the reindexed one they appear to be correct when looking at the molecule property. However, if we compare the two GROMACS topology files, i.e.
The atom names are in the correct reindexed order, but it appears that the coordinates are in a completely different order. Here the order of the coordinates in
|
Just to clarify, if I modify the script above so that the coordiinates are copied at the atom level, i.e.: edit_mol = mol.edit()
for atom in gro_mol.atoms():
edit_mol = (
edit_mol.atom(atom.index())
.set_property("coordinates", atom.property("coordinates"))
.molecule()
)
new_mol = edit_mol.commit() then everything is fine:
|
Ok - I can see the problem. It is because atom-based properties are stored in containers that use the CGAtomIdx (combination of the CutGroup index and Index within the CutGroup) for the atom. Changing the AtomIdx does not change the CGAtomIdx. Indeed, all it does is update the mapping from AtomIdx to CGAtomIdx, as CGAtomIdx is the underlying, central indexing scheme. You can see this with a smaller example
You can see the atom "HA" has correctly been moved to AtomIdx 0, and is printed first. However, it still keeps its original CGAtomIdx of 1,3. This means that its atom properties are in the fourth element of the second cutgroup.
When you access an atom property via the atom interface, it does all the mapping from AtomIdx to CGAtomIdx behind the scenes, and returns the right property value. However, if you access the property by molecule, then you see the values arranged by CGAtomIdx, i.e. the property values are NOT in AtomIdx order, they are in CGAtomIdx order. The problem is that some parts of the code are getting confused between AtomIdx order and CGAtomIdx order... It will require some time and thought to fix. I suspect we may need a check that sees if the CGAtomIdx order matches the AtomIdx order, and if not, a flag that reorders the data when it is copied across. |
One possible fix is to do the reordering when the molecule is committed after reindexing. That would remove the need for code to test things in lots of places. But, CutGroups really encode the concept of residues, so reindexing in a way that fragments residues (makes them discontinuous) would be messy. Possibly, in this case, we could split the residue into separate CutGroups and issue a warning that this has changed. This would only impact energies calculate with residue-based cutoffs, which aren't that common any more (MD is all atom-based). So it wouldn't be a terrible fix. |
I've made some progress. I've added code so that the cutgroups are reordered on |
For some reason, if a molecule has been reindexed, i.e. the atom order changed, calling
updateCoordinatesAndVelocities
using that molecule as a reference will cause the re-ordered coordinates from the new system to be copied back in the original, pre reindexing, order.The text was updated successfully, but these errors were encountered: