Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Inconsistent behaviour with reindexed molecules #202

Closed
lohedges opened this issue Jun 14, 2024 · 6 comments · Fixed by #205
Closed

[BUG] Inconsistent behaviour with reindexed molecules #202

lohedges opened this issue Jun 14, 2024 · 6 comments · Fixed by #205
Assignees
Labels
bug Something isn't working exscientia Related to work with Exscientia
Milestone

Comments

@lohedges
Copy link
Contributor

For some reason, if a molecule has been reindexed, i.e. the atom order changed, calling updateCoordinatesAndVelocities using that molecule as a reference will cause the re-ordered coordinates from the new system to be copied back in the original, pre reindexing, order.

@lohedges lohedges added bug Something isn't working exscientia Related to work with Exscientia labels Jun 14, 2024
@lohedges lohedges self-assigned this Jun 14, 2024
@lohedges
Copy link
Contributor Author

This is because the matching is done by AtomNum, not AtomIdx. This is needed, since atoms can be reordered by the engine:

In [28]: for atom0, atom1 in zip(ss[0]._sire_object.atoms(), sss[0].atoms()):
    ...:     print(atom0.number(), atom1.number())
    ...:
AtomNum(1) AtomNum(2)
AtomNum(2) AtomNum(1)
AtomNum(3) AtomNum(3)
AtomNum(4) AtomNum(33)
AtomNum(5) AtomNum(4)
AtomNum(6) AtomNum(5)
AtomNum(7) AtomNum(6)
AtomNum(8) AtomNum(7)
AtomNum(9) AtomNum(8)
AtomNum(10) AtomNum(9)
AtomNum(11) AtomNum(10)
AtomNum(12) AtomNum(11)
AtomNum(13) AtomNum(12)
AtomNum(14) AtomNum(13)
AtomNum(15) AtomNum(14)
AtomNum(16) AtomNum(15)
AtomNum(17) AtomNum(31)
AtomNum(18) AtomNum(16)
AtomNum(19) AtomNum(30)
AtomNum(20) AtomNum(17)
AtomNum(21) AtomNum(18)
AtomNum(22) AtomNum(19)
AtomNum(23) AtomNum(20)
AtomNum(24) AtomNum(21)
AtomNum(25) AtomNum(22)
AtomNum(26) AtomNum(23)
AtomNum(27) AtomNum(24)
AtomNum(28) AtomNum(25)
AtomNum(29) AtomNum(26)
AtomNum(30) AtomNum(27)
AtomNum(31) AtomNum(28)
AtomNum(32) AtomNum(29)
AtomNum(33) AtomNum(32)

I'll have to think carefully about what to do here, since we need this to work in both cases.

@lohedges
Copy link
Contributor Author

Actually, I think this is a more general issue with reindexing atoms in Sire. There appear to be some inconsistencies depending on whether properties are handled at the atom or molecule level. For example, consider this example script. The input is
here.

import sire as sr

mol = sr.load([f"tyk2_ejm_55.gro", f"tyk2_ejm_55.top"], show_warnings=False)[0]

print("Initial molecule:")
for atoms in mol.atoms():
    print(atoms)
print(mol.property("coordinates"))
print()

sr.save(mol, "poo.prm7")

indices = [x for x in range(33)]

reordered_indices = [
    1,
    0,
    2,
    32,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10,
    11,
    12,
    13,
    14,
    30,
    15,
    29,
    16,
    17,
    18,
    19,
    20,
    21,
    22,
    23,
    24,
    25,
    26,
    27,
    28,
    31,
]

names = mol.atoms().names()
mol = mol.edit()
for new, old in enumerate(reordered_indices):
    atom = mol.atom(names[old])
    atom = atom.reindex(new)
    mol = atom.molecule()
mol = mol.commit()

print("Reindexed molecule:")
for atoms in mol.atoms():
    print(atoms)
print(mol.property("coordinates"))
print()

sr.save(mol, "debug.grotop")
sr.save(mol, "debug.gro87")

try:
    gro_mol = sr.load("debug.gro87", "debug.grotop", show_warnings=False)[0]
except:
    print("Loading from GROMACS format failed!")

print("GROMACS molecule:")
for atoms in mol.atoms():
    print(atoms)
print(mol.property("coordinates"))
print()

sr.save(mol, "debug.rst7")
sr.save(mol, "debug.prm7")

try:
    amber_mol = sr.load("debug.rst7", "debug.prm7", show_warnings=False)[0]
except:
    print("Loading from AMBER format failed!")
    print()

edit_mol = mol.edit()
new_mol = (
    edit_mol.set_property("coordinates", gro_mol.property("coordinates"))
    .molecule()
    .commit()
)

print("Reindex with copied coordinates:")
for atoms in mol.atoms():
    print(atoms)
print(mol.property("coordinates"))
print()

sr.save(new_mol, "debug2.gro87")

This script does a few things:

  1. Loads a molecule and prints the atoms and coordinates property.
  2. Reindexes the atoms in the molecule, then prints the updated atoms and coordinates (extracting the property from the molecule).
  3. Saves the reindexed molecule to GROMACS format and tries to reload. If successful, prints the atoms and coordinates property.
  4. Saves the reindexed molecule to AMBER format and tries to reload.
  5. Copies the coordinates property from the GROMACS molecule into the reindexed molecule, then prints the atoms and coordinates.

The output is as follows:

Initial molecule:
Atom( H1x:1   [  -4.76,   -2.84,  -16.51] )
Atom( C1x:2   [  -5.35,   -3.70,  -16.23] )
Atom( C2x:3   [  -6.71,   -3.55,  -15.91] )
Atom( C3x:4   [  -7.48,   -4.67,  -15.53] )
Atom( C4x:5   [  -6.89,   -5.96,  -15.48] )
Atom( C5x:6   [  -5.53,   -6.11,  -15.85] )
Atom( C6x:7   [  -4.77,   -4.98,  -16.21] )
Atom( H2x:8   [  -3.72,   -5.09,  -16.47] )
Atom( Cl1x:9  [  -4.76,   -7.65,  -15.87] )
Atom( C7x:10  [  -7.70,   -7.14,  -15.03] )
Atom( O1x:11  [  -8.11,   -7.21,  -13.88] )
Atom( N1x:12  [  -7.90,   -8.05,  -16.00] )
Atom( H3x:13  [  -7.44,   -7.87,  -16.88] )
Atom( C8x:14  [  -8.73,   -9.21,  -16.00] )
Atom( C9x:15  [  -9.58,   -9.58,  -14.94] )
Atom( C10x:16 [ -10.38,  -10.72,  -15.11] )
Atom( N2x:17  [ -10.36,  -11.49,  -16.22] )
Atom( C11x:18 [  -9.54,  -11.16,  -17.24] )
Atom( C12x:19 [  -8.71,  -10.01,  -17.16] )
Atom( H4x:20  [  -8.08,   -9.71,  -17.98] )
Atom( N3x:21  [  -9.56,  -11.96,  -18.41] )
Atom( H5x:22  [ -10.37,  -12.56,  -18.52] )
Atom( C13x:23 [  -8.65,  -12.02,  -19.46] )
Atom( O2x:24  [  -7.59,  -11.41,  -19.59] )
Atom( O3x:25  [  -9.13,  -12.90,  -20.35] )
Atom( C14x:26 [  -8.44,  -13.13,  -21.57] )
Atom( H6x:27  [  -8.96,  -13.88,  -22.16] )
Atom( H7x:28  [  -8.38,  -12.22,  -22.16] )
Atom( H8x:29  [  -7.43,  -13.49,  -21.38] )
Atom( H9x:30  [ -11.07,  -11.02,  -14.33] )
Atom( H10x:31 [  -9.67,   -9.02,  -14.03] )
Atom( Cl2x:32 [  -9.16,   -4.42,  -15.16] )
Atom( H11x:33 [  -7.17,   -2.57,  -15.94] )
AtomCoords( size=33
0: (-4.76, -2.84, -16.51)
1: (-5.3500000000000005, -3.7, -16.23)
2: (-6.710000000000001, -3.55, -15.91)
3: (-7.48, -4.67, -15.53)
4: (-6.89, -5.96, -15.48)
...
28: (-7.43, -13.49, -21.38)
29: (-11.07, -11.020000000000001, -14.33)
30: (-9.67, -9.02, -14.030000000000001)
31: (-9.16, -4.42, -15.16)
32: (-7.17, -2.5700000000000003, -15.940000000000001)
)

Reindexed molecule:
Atom( C1x:2   [  -5.35,   -3.70,  -16.23] )
Atom( H1x:1   [  -4.76,   -2.84,  -16.51] )
Atom( C2x:3   [  -6.71,   -3.55,  -15.91] )
Atom( H11x:33 [  -7.17,   -2.57,  -15.94] )
Atom( C3x:4   [  -7.48,   -4.67,  -15.53] )
Atom( C4x:5   [  -6.89,   -5.96,  -15.48] )
Atom( C5x:6   [  -5.53,   -6.11,  -15.85] )
Atom( C6x:7   [  -4.77,   -4.98,  -16.21] )
Atom( H2x:8   [  -3.72,   -5.09,  -16.47] )
Atom( Cl1x:9  [  -4.76,   -7.65,  -15.87] )
Atom( C7x:10  [  -7.70,   -7.14,  -15.03] )
Atom( O1x:11  [  -8.11,   -7.21,  -13.88] )
Atom( N1x:12  [  -7.90,   -8.05,  -16.00] )
Atom( H3x:13  [  -7.44,   -7.87,  -16.88] )
Atom( C8x:14  [  -8.73,   -9.21,  -16.00] )
Atom( C9x:15  [  -9.58,   -9.58,  -14.94] )
Atom( H10x:31 [  -9.67,   -9.02,  -14.03] )
Atom( C10x:16 [ -10.38,  -10.72,  -15.11] )
Atom( H9x:30  [ -11.07,  -11.02,  -14.33] )
Atom( N2x:17  [ -10.36,  -11.49,  -16.22] )
Atom( C11x:18 [  -9.54,  -11.16,  -17.24] )
Atom( C12x:19 [  -8.71,  -10.01,  -17.16] )
Atom( H4x:20  [  -8.08,   -9.71,  -17.98] )
Atom( N3x:21  [  -9.56,  -11.96,  -18.41] )
Atom( H5x:22  [ -10.37,  -12.56,  -18.52] )
Atom( C13x:23 [  -8.65,  -12.02,  -19.46] )
Atom( O2x:24  [  -7.59,  -11.41,  -19.59] )
Atom( O3x:25  [  -9.13,  -12.90,  -20.35] )
Atom( C14x:26 [  -8.44,  -13.13,  -21.57] )
Atom( H6x:27  [  -8.96,  -13.88,  -22.16] )
Atom( H7x:28  [  -8.38,  -12.22,  -22.16] )
Atom( H8x:29  [  -7.43,  -13.49,  -21.38] )
Atom( Cl2x:32 [  -9.16,   -4.42,  -15.16] )
AtomCoords( size=33
0: (-4.76, -2.84, -16.51)
1: (-5.3500000000000005, -3.7, -16.23)
2: (-6.710000000000001, -3.55, -15.91)
3: (-7.48, -4.67, -15.53)
4: (-6.89, -5.96, -15.48)
...
28: (-7.43, -13.49, -21.38)
29: (-11.07, -11.020000000000001, -14.33)
30: (-9.67, -9.02, -14.030000000000001)
31: (-9.16, -4.42, -15.16)
32: (-7.17, -2.5700000000000003, -15.940000000000001)
)

GROMACS molecule:
Atom( C1x:2   [  -5.35,   -3.70,  -16.23] )
Atom( H1x:1   [  -4.76,   -2.84,  -16.51] )
Atom( C2x:3   [  -6.71,   -3.55,  -15.91] )
Atom( H11x:33 [  -7.17,   -2.57,  -15.94] )
Atom( C3x:4   [  -7.48,   -4.67,  -15.53] )
Atom( C4x:5   [  -6.89,   -5.96,  -15.48] )
Atom( C5x:6   [  -5.53,   -6.11,  -15.85] )
Atom( C6x:7   [  -4.77,   -4.98,  -16.21] )
Atom( H2x:8   [  -3.72,   -5.09,  -16.47] )
Atom( Cl1x:9  [  -4.76,   -7.65,  -15.87] )
Atom( C7x:10  [  -7.70,   -7.14,  -15.03] )
Atom( O1x:11  [  -8.11,   -7.21,  -13.88] )
Atom( N1x:12  [  -7.90,   -8.05,  -16.00] )
Atom( H3x:13  [  -7.44,   -7.87,  -16.88] )
Atom( C8x:14  [  -8.73,   -9.21,  -16.00] )
Atom( C9x:15  [  -9.58,   -9.58,  -14.94] )
Atom( H10x:31 [  -9.67,   -9.02,  -14.03] )
Atom( C10x:16 [ -10.38,  -10.72,  -15.11] )
Atom( H9x:30  [ -11.07,  -11.02,  -14.33] )
Atom( N2x:17  [ -10.36,  -11.49,  -16.22] )
Atom( C11x:18 [  -9.54,  -11.16,  -17.24] )
Atom( C12x:19 [  -8.71,  -10.01,  -17.16] )
Atom( H4x:20  [  -8.08,   -9.71,  -17.98] )
Atom( N3x:21  [  -9.56,  -11.96,  -18.41] )
Atom( H5x:22  [ -10.37,  -12.56,  -18.52] )
Atom( C13x:23 [  -8.65,  -12.02,  -19.46] )
Atom( O2x:24  [  -7.59,  -11.41,  -19.59] )
Atom( O3x:25  [  -9.13,  -12.90,  -20.35] )
Atom( C14x:26 [  -8.44,  -13.13,  -21.57] )
Atom( H6x:27  [  -8.96,  -13.88,  -22.16] )
Atom( H7x:28  [  -8.38,  -12.22,  -22.16] )
Atom( H8x:29  [  -7.43,  -13.49,  -21.38] )
Atom( Cl2x:32 [  -9.16,   -4.42,  -15.16] )
AtomCoords( size=33
0: (-4.76, -2.84, -16.51)
1: (-5.3500000000000005, -3.7, -16.23)
2: (-6.710000000000001, -3.55, -15.91)
3: (-7.48, -4.67, -15.53)
4: (-6.89, -5.96, -15.48)
...
28: (-7.43, -13.49, -21.38)
29: (-11.07, -11.020000000000001, -14.33)
30: (-9.67, -9.02, -14.030000000000001)
31: (-9.16, -4.42, -15.16)
32: (-7.17, -2.5700000000000003, -15.940000000000001)
)

Loading from AMBER format failed!

Reindex with copied coordinates:
Atom( C1x:2   [  -5.35,   -3.70,  -16.23] )
Atom( H1x:1   [  -4.76,   -2.84,  -16.51] )
Atom( C2x:3   [  -6.71,   -3.55,  -15.91] )
Atom( H11x:33 [  -7.17,   -2.57,  -15.94] )
Atom( C3x:4   [  -7.48,   -4.67,  -15.53] )
Atom( C4x:5   [  -6.89,   -5.96,  -15.48] )
Atom( C5x:6   [  -5.53,   -6.11,  -15.85] )
Atom( C6x:7   [  -4.77,   -4.98,  -16.21] )
Atom( H2x:8   [  -3.72,   -5.09,  -16.47] )
Atom( Cl1x:9  [  -4.76,   -7.65,  -15.87] )
Atom( C7x:10  [  -7.70,   -7.14,  -15.03] )
Atom( O1x:11  [  -8.11,   -7.21,  -13.88] )
Atom( N1x:12  [  -7.90,   -8.05,  -16.00] )
Atom( H3x:13  [  -7.44,   -7.87,  -16.88] )
Atom( C8x:14  [  -8.73,   -9.21,  -16.00] )
Atom( C9x:15  [  -9.58,   -9.58,  -14.94] )
Atom( H10x:31 [  -9.67,   -9.02,  -14.03] )
Atom( C10x:16 [ -10.38,  -10.72,  -15.11] )
Atom( H9x:30  [ -11.07,  -11.02,  -14.33] )
Atom( N2x:17  [ -10.36,  -11.49,  -16.22] )
Atom( C11x:18 [  -9.54,  -11.16,  -17.24] )
Atom( C12x:19 [  -8.71,  -10.01,  -17.16] )
Atom( H4x:20  [  -8.08,   -9.71,  -17.98] )
Atom( N3x:21  [  -9.56,  -11.96,  -18.41] )
Atom( H5x:22  [ -10.37,  -12.56,  -18.52] )
Atom( C13x:23 [  -8.65,  -12.02,  -19.46] )
Atom( O2x:24  [  -7.59,  -11.41,  -19.59] )
Atom( O3x:25  [  -9.13,  -12.90,  -20.35] )
Atom( C14x:26 [  -8.44,  -13.13,  -21.57] )
Atom( H6x:27  [  -8.96,  -13.88,  -22.16] )
Atom( H7x:28  [  -8.38,  -12.22,  -22.16] )
Atom( H8x:29  [  -7.43,  -13.49,  -21.38] )
Atom( Cl2x:32 [  -9.16,   -4.42,  -15.16] )
AtomCoords( size=33
0: (-4.76, -2.84, -16.51)
1: (-5.3500000000000005, -3.7, -16.23)
2: (-6.710000000000001, -3.55, -15.91)
3: (-7.48, -4.67, -15.53)
4: (-6.89, -5.96, -15.48)
...
28: (-7.43, -13.49, -21.38)
29: (-11.07, -11.020000000000001, -14.33)
30: (-9.67, -9.02, -14.030000000000001)
31: (-9.16, -4.42, -15.16)
32: (-7.17, -2.5700000000000003, -15.940000000000001)
)

Going from steps 1) to 2) you can see that the atoms are reordered. However, the coordinates property (as obtained at the molecule level) is in the same order.

Next we find that it is possible to save and load using GROMACS format, but not with AMBER. Here the exception is:

OSError: SireError::io_error: Cannot load the molecules: Disagreement over the number of atoms for molecule at index 0. 33 versus 32. Number
of atoms in neighbouring molecules are { molidx 0: natoms = 33 } (call sire.error.get_last_error_details() for more info)

Finally, when we copy the atoms from the GROMACS molecule into the reindexed one they appear to be correct when looking at the molecule property. However, if we compare the two GROMACS topology files, i.e. debug.gro87 and debug2.gro87 we find that the are different.

debug.gro87:

   33
    1UNK    C1x    1  -0.535000  -0.370000  -1.623000
    1UNK    H1x    2  -0.476000  -0.284000  -1.651000
    1UNK    C2x    3  -0.671000  -0.355000  -1.591000
    1UNK   H11x    4  -0.717000  -0.257000  -1.594000
    1UNK    C3x    5  -0.748000  -0.467000  -1.553000
    1UNK    C4x    6  -0.689000  -0.596000  -1.548000
    1UNK    C5x    7  -0.553000  -0.611000  -1.585000
    1UNK    C6x    8  -0.477000  -0.498000  -1.621000
    1UNK    H2x    9  -0.372000  -0.509000  -1.647000
    1UNK   Cl1x   10  -0.476000  -0.765000  -1.587000
    1UNK    C7x   11  -0.770000  -0.714000  -1.503000
    1UNK    O1x   12  -0.811000  -0.721000  -1.388000
    1UNK    N1x   13  -0.790000  -0.805000  -1.600000
    1UNK    H3x   14  -0.744000  -0.787000  -1.688000
    1UNK    C8x   15  -0.873000  -0.921000  -1.600000
    1UNK    C9x   16  -0.958000  -0.958000  -1.494000
    1UNK   H10x   17  -0.967000  -0.902000  -1.403000
    1UNK   C10x   18  -1.038000  -1.072000  -1.511000
    1UNK    H9x   19  -1.107000  -1.102000  -1.433000
    1UNK    N2x   20  -1.036000  -1.149000  -1.622000
    1UNK   C11x   21  -0.954000  -1.116000  -1.724000
    1UNK   C12x   22  -0.871000  -1.001000  -1.716000
    1UNK    H4x   23  -0.808000  -0.971000  -1.798000
    1UNK    N3x   24  -0.956000  -1.196000  -1.841000
    1UNK    H5x   25  -1.037000  -1.256000  -1.852000
    1UNK   C13x   26  -0.865000  -1.202000  -1.946000
    1UNK    O2x   27  -0.759000  -1.141000  -1.959000
    1UNK    O3x   28  -0.913000  -1.290000  -2.035000
    1UNK   C14x   29  -0.844000  -1.313000  -2.157000
    1UNK    H6x   30  -0.896000  -1.388000  -2.216000
    1UNK    H7x   31  -0.838000  -1.222000  -2.216000
    1UNK    H8x   32  -0.743000  -1.349000  -2.138000
    1UNK   Cl2x   33  -0.916000  -0.442000  -1.516000
   0.00000   0.00000   0.00000

debug2.gro87:

   33
    1UNK    C1x    1  -0.476000  -0.284000  -1.651000
    1UNK    H1x    2  -0.535000  -0.370000  -1.623000
    1UNK    C2x    3  -0.671000  -0.355000  -1.591000
    1UNK   H11x    4  -0.916000  -0.442000  -1.516000
    1UNK    C3x    5  -0.717000  -0.257000  -1.594000
    1UNK    C4x    6  -0.748000  -0.467000  -1.553000
    1UNK    C5x    7  -0.689000  -0.596000  -1.548000
    1UNK    C6x    8  -0.553000  -0.611000  -1.585000
    1UNK    H2x    9  -0.477000  -0.498000  -1.621000
    1UNK   Cl1x   10  -0.372000  -0.509000  -1.647000
    1UNK    C7x   11  -0.476000  -0.765000  -1.587000
    1UNK    O1x   12  -0.770000  -0.714000  -1.503000
    1UNK    N1x   13  -0.811000  -0.721000  -1.388000
    1UNK    H3x   14  -0.790000  -0.805000  -1.600000
    1UNK    C8x   15  -0.744000  -0.787000  -1.688000
    1UNK    C9x   16  -0.873000  -0.921000  -1.600000
    1UNK   H10x   17  -0.838000  -1.222000  -2.216000
    1UNK   C10x   18  -0.958000  -0.958000  -1.494000
    1UNK    H9x   19  -0.896000  -1.388000  -2.216000
    1UNK    N2x   20  -0.967000  -0.902000  -1.403000
    1UNK   C11x   21  -1.038000  -1.072000  -1.511000
    1UNK   C12x   22  -1.107000  -1.102000  -1.433000
    1UNK    H4x   23  -1.036000  -1.149000  -1.622000
    1UNK    N3x   24  -0.954000  -1.116000  -1.724000
    1UNK    H5x   25  -0.871000  -1.001000  -1.716000
    1UNK   C13x   26  -0.808000  -0.971000  -1.798000
    1UNK    O2x   27  -0.956000  -1.196000  -1.841000
    1UNK    O3x   28  -1.037000  -1.256000  -1.852000
    1UNK   C14x   29  -0.865000  -1.202000  -1.946000
    1UNK    H6x   30  -0.759000  -1.141000  -1.959000
    1UNK    H7x   31  -0.913000  -1.290000  -2.035000
    1UNK    H8x   32  -0.844000  -1.313000  -2.157000
    1UNK   Cl2x   33  -0.743000  -1.349000  -2.138000
   0.00000   0.00000   0.00000

The atom names are in the correct reindexed order, but it appears that the coordinates are in a completely different order. Here the order of the coordinates in debug2.gro87 appears to have had the mapping applied twice, i.e. the first four entries have the coordinates from entries 1,0, 2, and 32 from debug.gro87:

    1UNK    H1x    2  -0.476000  -0.284000  -1.651000
    1UNK    C1x    1  -0.535000  -0.370000  -1.623000
    1UNK    C2x    3  -0.671000  -0.355000  -1.591000
    1UNK   Cl2x   33  -0.916000  -0.442000  -1.516000

@lohedges lohedges changed the title [BUG] sire.legacy.IO.updateCoordinatesAndVelocities doesn't use the correct order when atoms have been re-indexed [BUG] Inconsistent behaviour with reindexed molecules Jun 14, 2024
@lohedges
Copy link
Contributor Author

Just to clarify, if I modify the script above so that the coordiinates are copied at the atom level, i.e.:

edit_mol = mol.edit()
for atom in gro_mol.atoms():
    edit_mol = (
        edit_mol.atom(atom.index())
        .set_property("coordinates", atom.property("coordinates"))
        .molecule()
    )
new_mol = edit_mol.commit()

then everything is fine:

debug2.gro87:

   33
    1UNK    C1x    1  -0.535000  -0.370000  -1.623000
    1UNK    H1x    2  -0.476000  -0.284000  -1.651000
    1UNK    C2x    3  -0.671000  -0.355000  -1.591000
    1UNK   H11x    4  -0.717000  -0.257000  -1.594000
    1UNK    C3x    5  -0.748000  -0.467000  -1.553000
    1UNK    C4x    6  -0.689000  -0.596000  -1.548000
    1UNK    C5x    7  -0.553000  -0.611000  -1.585000
    1UNK    C6x    8  -0.477000  -0.498000  -1.621000
    1UNK    H2x    9  -0.372000  -0.509000  -1.647000
    1UNK   Cl1x   10  -0.476000  -0.765000  -1.587000
    1UNK    C7x   11  -0.770000  -0.714000  -1.503000
    1UNK    O1x   12  -0.811000  -0.721000  -1.388000
    1UNK    N1x   13  -0.790000  -0.805000  -1.600000
    1UNK    H3x   14  -0.744000  -0.787000  -1.688000
    1UNK    C8x   15  -0.873000  -0.921000  -1.600000
    1UNK    C9x   16  -0.958000  -0.958000  -1.494000
    1UNK   H10x   17  -0.967000  -0.902000  -1.403000
    1UNK   C10x   18  -1.038000  -1.072000  -1.511000
    1UNK    H9x   19  -1.107000  -1.102000  -1.433000
    1UNK    N2x   20  -1.036000  -1.149000  -1.622000
    1UNK   C11x   21  -0.954000  -1.116000  -1.724000
    1UNK   C12x   22  -0.871000  -1.001000  -1.716000
    1UNK    H4x   23  -0.808000  -0.971000  -1.798000
    1UNK    N3x   24  -0.956000  -1.196000  -1.841000
    1UNK    H5x   25  -1.037000  -1.256000  -1.852000
    1UNK   C13x   26  -0.865000  -1.202000  -1.946000
    1UNK    O2x   27  -0.759000  -1.141000  -1.959000
    1UNK    O3x   28  -0.913000  -1.290000  -2.035000
    1UNK   C14x   29  -0.844000  -1.313000  -2.157000
    1UNK    H6x   30  -0.896000  -1.388000  -2.216000
    1UNK    H7x   31  -0.838000  -1.222000  -2.216000
    1UNK    H8x   32  -0.743000  -1.349000  -2.138000
    1UNK   Cl2x   33  -0.916000  -0.442000  -1.516000
   0.00000   0.00000   0.00000

@chryswoods
Copy link
Contributor

Ok - I can see the problem. It is because atom-based properties are stored in containers that use the CGAtomIdx (combination of the CutGroup index and Index within the CutGroup) for the atom. Changing the AtomIdx does not change the CGAtomIdx. Indeed, all it does is update the mapping from AtomIdx to CGAtomIdx, as CGAtomIdx is the underlying, central indexing scheme.

You can see this with a smaller example

>>> import sire as sr
>>> mol = sr.load_test_files("ala.top", "ala.crd")[0]
>>> for atom in mol.atoms():
...     print(atom.name(), atom.index(), atom.cg_atom_idx(),
...           atom.coordinates())
AtomName('HH31') AtomIdx(0) {CGIdx(0),Index(0)} ( 18.4532 Å, 3.49423 Å, 12.4365 Å )
AtomName('CH3') AtomIdx(1) {CGIdx(0),Index(1)} ( 18.9818 Å, 3.44823 Å, 13.3886 Å )
AtomName('HH32') AtomIdx(2) {CGIdx(0),Index(2)} ( 20.0513 Å, 3.63293 Å, 13.2874 Å )
AtomName('HH33') AtomIdx(3) {CGIdx(0),Index(3)} ( 18.798 Å, 2.43076 Å, 13.7337 Å )
AtomName('C') AtomIdx(4) {CGIdx(0),Index(4)} ( 18.4805 Å, 4.54971 Å, 14.3514 Å )
AtomName('O') AtomIdx(5) {CGIdx(0),Index(5)} ( 19.1866 Å, 5.44143 Å, 14.7584 Å )
AtomName('N') AtomIdx(6) {CGIdx(1),Index(0)} ( 17.2214 Å, 4.31498 Å, 14.7128 Å )
AtomName('H') AtomIdx(7) {CGIdx(1),Index(1)} ( 16.6815 Å, 3.61576 Å, 14.2233 Å )
AtomName('CA') AtomIdx(8) {CGIdx(1),Index(2)} ( 16.5371 Å, 5.02707 Å, 15.812 Å )
AtomName('HA') AtomIdx(9) {CGIdx(1),Index(3)} ( 17.2875 Å, 5.15172 Å, 16.5926 Å )
AtomName('CB') AtomIdx(10) {CGIdx(1),Index(4)} ( 16.0464 Å, 6.38937 Å, 15.2588 Å )
AtomName('HB1') AtomIdx(11) {CGIdx(1),Index(5)} ( 15.63 Å, 6.98009 Å, 16.0747 Å )
AtomName('HB2') AtomIdx(12) {CGIdx(1),Index(6)} ( 16.8965 Å, 6.89072 Å, 14.7962 Å )
AtomName('HB3') AtomIdx(13) {CGIdx(1),Index(7)} ( 15.2415 Å, 6.18045 Å, 14.5541 Å )
AtomName('C') AtomIdx(14) {CGIdx(1),Index(8)} ( 15.3698 Å, 4.19397 Å, 16.434 Å )
AtomName('O') AtomIdx(15) {CGIdx(1),Index(9)} ( 14.9427 Å, 3.16814 Å, 15.8751 Å )
AtomName('N') AtomIdx(16) {CGIdx(2),Index(0)} ( 14.9657 Å, 4.59088 Å, 17.5824 Å )
AtomName('H') AtomIdx(17) {CGIdx(2),Index(1)} ( 15.3407 Å, 5.44815 Å, 17.9626 Å )
AtomName('CH3') AtomIdx(18) {CGIdx(2),Index(2)} ( 13.8341 Å, 3.93668 Å, 18.3509 Å )
AtomName('HH31') AtomIdx(19) {CGIdx(2),Index(3)} ( 14.3525 Å, 3.40994 Å, 19.1521 Å )
AtomName('HH32') AtomIdx(20) {CGIdx(2),Index(4)} ( 13.1933 Å, 4.59022 Å, 18.9428 Å )
AtomName('HH33') AtomIdx(21) {CGIdx(2),Index(5)} ( 13.2149 Å, 3.33301 Å, 17.6874 Å )

>>> mol = mol.edit()
>>> atom = mol.atom("HA")
>>> atom = atom.reindex(0)
>>> mol = atom.molecule().commit()

>>> for atom in mol.atoms():
...     print(atom.name(), atom.index(), atom.cg_atom_idx(),
...           atom.coordinates())
AtomName('HA') AtomIdx(0) {CGIdx(1),Index(3)} ( 17.2875 Å, 5.15172 Å, 16.5926 Å )
AtomName('HH31') AtomIdx(1) {CGIdx(0),Index(0)} ( 18.4532 Å, 3.49423 Å, 12.4365 Å )
AtomName('CH3') AtomIdx(2) {CGIdx(0),Index(1)} ( 18.9818 Å, 3.44823 Å, 13.3886 Å )
AtomName('HH32') AtomIdx(3) {CGIdx(0),Index(2)} ( 20.0513 Å, 3.63293 Å, 13.2874 Å )
AtomName('HH33') AtomIdx(4) {CGIdx(0),Index(3)} ( 18.798 Å, 2.43076 Å, 13.7337 Å )
AtomName('C') AtomIdx(5) {CGIdx(0),Index(4)} ( 18.4805 Å, 4.54971 Å, 14.3514 Å )
AtomName('O') AtomIdx(6) {CGIdx(0),Index(5)} ( 19.1866 Å, 5.44143 Å, 14.7584 Å )
AtomName('N') AtomIdx(7) {CGIdx(1),Index(0)} ( 17.2214 Å, 4.31498 Å, 14.7128 Å )
AtomName('H') AtomIdx(8) {CGIdx(1),Index(1)} ( 16.6815 Å, 3.61576 Å, 14.2233 Å )
AtomName('CA') AtomIdx(9) {CGIdx(1),Index(2)} ( 16.5371 Å, 5.02707 Å, 15.812 Å )
AtomName('CB') AtomIdx(10) {CGIdx(1),Index(4)} ( 16.0464 Å, 6.38937 Å, 15.2588 Å )
AtomName('HB1') AtomIdx(11) {CGIdx(1),Index(5)} ( 15.63 Å, 6.98009 Å, 16.0747 Å )
AtomName('HB2') AtomIdx(12) {CGIdx(1),Index(6)} ( 16.8965 Å, 6.89072 Å, 14.7962 Å )
AtomName('HB3') AtomIdx(13) {CGIdx(1),Index(7)} ( 15.2415 Å, 6.18045 Å, 14.5541 Å )
AtomName('C') AtomIdx(14) {CGIdx(1),Index(8)} ( 15.3698 Å, 4.19397 Å, 16.434 Å )
AtomName('O') AtomIdx(15) {CGIdx(1),Index(9)} ( 14.9427 Å, 3.16814 Å, 15.8751 Å )
AtomName('N') AtomIdx(16) {CGIdx(2),Index(0)} ( 14.9657 Å, 4.59088 Å, 17.5824 Å )
AtomName('H') AtomIdx(17) {CGIdx(2),Index(1)} ( 15.3407 Å, 5.44815 Å, 17.9626 Å )
AtomName('CH3') AtomIdx(18) {CGIdx(2),Index(2)} ( 13.8341 Å, 3.93668 Å, 18.3509 Å )
AtomName('HH31') AtomIdx(19) {CGIdx(2),Index(3)} ( 14.3525 Å, 3.40994 Å, 19.1521 Å )
AtomName('HH32') AtomIdx(20) {CGIdx(2),Index(4)} ( 13.1933 Å, 4.59022 Å, 18.9428 Å )
AtomName('HH33') AtomIdx(21) {CGIdx(2),Index(5)} ( 13.2149 Å, 3.33301 Å, 17.6874 Å )

You can see the atom "HA" has correctly been moved to AtomIdx 0, and is printed first. However, it still keeps its original CGAtomIdx of 1,3. This means that its atom properties are in the fourth element of the second cutgroup.

>>> coords = mol.property("coordinates")
>>> coords.get(sr.legacy.Mol.CGIdx(1))[3]
( 17.2875 Å, 5.15172 Å, 16.5926 Å )

When you access an atom property via the atom interface, it does all the mapping from AtomIdx to CGAtomIdx behind the scenes, and returns the right property value. However, if you access the property by molecule, then you see the values arranged by CGAtomIdx, i.e. the property values are NOT in AtomIdx order, they are in CGAtomIdx order.

The problem is that some parts of the code are getting confused between AtomIdx order and CGAtomIdx order... It will require some time and thought to fix. I suspect we may need a check that sees if the CGAtomIdx order matches the AtomIdx order, and if not, a flag that reorders the data when it is copied across.

@chryswoods
Copy link
Contributor

One possible fix is to do the reordering when the molecule is committed after reindexing. That would remove the need for code to test things in lots of places. But, CutGroups really encode the concept of residues, so reindexing in a way that fragments residues (makes them discontinuous) would be messy. Possibly, in this case, we could split the residue into separate CutGroups and issue a warning that this has changed. This would only impact energies calculate with residue-based cutoffs, which aren't that common any more (MD is all atom-based). So it wouldn't be a terrible fix.

@chryswoods
Copy link
Contributor

I've made some progress. I've added code so that the cutgroups are reordered on mol.commit() if the CGAtomIdx order does not match the AtomIdx order. This works for everything except merging in proteins, where it breaks... This requires some thought - possibly easiest is to reorder the AtomIdx order of merged proteins...

@chryswoods chryswoods mentioned this issue Jun 22, 2024
@chryswoods chryswoods linked a pull request Jun 22, 2024 that will close this issue
@chryswoods chryswoods added this to the 2024.2.0 milestone Jun 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exscientia Related to work with Exscientia
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants