Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature bond breaking #111

Merged
merged 45 commits into from
Oct 10, 2023
Merged

Feature bond breaking #111

merged 45 commits into from
Oct 10, 2023

Conversation

chryswoods
Copy link
Contributor

Changes proposed in this pull request:

This PR completes the work undertaken to add alchemical free energy simulation support to sire. The new functionality is described in the changelog, but mostly it is fully described in the new Part 6 of the tutorial. Tests for new functionality have been added. This includes an end-point energy test for the pentane to cyclopentane perturbation.

  • I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): [y]
  • I confirm that I have added a test for any new functionality in this pull request: [y]
  • I confirm that I have added documentation (e.g. a new tutorial page or detailed guide) for any new functionality in this pull request: [y]
  • I confirm that I have added a changelog entry to the changelog (we will add a link to this PR as part of the review): [y]
  • I confirm that I have permission to release this code under the GPL3 license: [y]

Suggested reviewers:

@lohedges

Any additional context of information?

This includes the recent PRs merged into devel, so I expect that this will also fail CI on Windows. I will debug locally and will push a fix.

I will also create a new branch from this as a 2023.4.0 prep, just making documentation fixes to the website (e.g. updated features, Quickstart guide and typo corrections). I am not going to make any functional changes now before 2023.4.0.

chryswoods and others added 30 commits September 19, 2023 17:55
…d mean that we

can handle bond breaking and forming, as we can now morph the intramolecular scale
factors that will change as the connectivity changes.
…nto a more user-friendly

PerturbableOpenMMMolecule class. This holds the data needed for the LambdaLever in a
more optimised format, which should save a little time. It also makes the way the
data is arranged and manipulated easier to understand, which should make it easier
for future developers to work with this code.
… that's needed now

is just some tests that we are doing bond breaking / creating properly :-)
…ble with alchemlyb

Also realised that the way I perturb torsion periodicities is wrong
…e don't

morph periodicity values

Added labels to the EnergyTrajectory so that we can store additional data, e.g.
the simulated lambda value. Also added the ability to save ensemble to the
System

This allows the `.energy_trajectory()` function to have a `to_alchemlyb`
option that outputs the energy trajectories as DataFrames in
an alchemlyb-compatible format.
… also

then the code to convert to alchemlyb format so that the columns and headers
have the right types

Also beginning the next part of the tutorial
…o run a simple free energy

simulation, plus how to extract the results in a format to be passed to alchemlyb
…uency is set to zero.

This prevents energies / frames from being saved from minimisation if this has been disabled.

Caught the case when a system doesn't have a space, so automatically add a Cartesian space

Working on the free energy tutorial. Now run the simulation for a bit longer and
save to a s3 file instead of parquet. Added in the alchemlyb code to do the analysis,
although this gives the wrong result (Bennetts from sire gives the right result for this
data, so I think I have done something wrong with alchemlyb)
…gies.

I've got results now that use alchemlyb and agree with other results.
…ent constraint method

than non-perturbable molecules. If this isn't set, then the constraint method will be
the same as for the non-perturbable molecules
…unit test)

I've also fixed some of the issues with constraints, i.e. I had originally removed the
internal energy terms for constrained atoms, which led to weird free energy behaviour.

I've added those energies back in, and modified the OpenMM tests to check reproducibility
of the energy when using a large timestep.

I have removed minimisation from the dynamics interface, because it is too difficult
to update constraints in an existing context. Instead, I have updated the tutorial to
use an unconstrained minimisation for each lambda window.

I am still debugging, but am making progress.
…ssure, and also to

trigger randomisation of the velocities.

Fixed a bug in OpenMMMolecule where it set the coordinates to 0,0,0 if the property
didn't exist in the molecule... Now is raises a missing_property exception as it should ;-)

Updated the tutorial to randomise velocities, and to use the correct free energies calculated
using the code as it is now. Randomising velocities has made the free energy slightly
different.

Also added a new section on how to make the simulation faster, where I show using the timestep
and constraints. Saw that the simulation is quite stable and that h-bond-h-atoms is the best
way to go, so have added this as an intermediate step.

Also shown how to use the hydrogen mass repartitioning function
… of constrained

degrees of freedom is included in the OpenMM context.

Exposed the "platform" option to the `.dynamics()` and `.minimisation()` functions.
Also removed their auto-parsing of kwargs, as this was a big source of typo
bugs when calling the function...

Fixed a bug in how the file formats were read in MoleculeParser where they came
only from the property map value, but should be checked for both the value
and the source. Updated the way create_map works so that it actually returns
when it has correctly added the value to the map, rather than keeping going
and always calling the expensive wrap function (hence why strings were
sometimes coming in as values rather than sources)

Exposed and tested all of this via a new test in test_openmm.py
…, or else it is ignored...

Also need to remove the code that detected hydrogens by element, as otherwise HMR won't
prevent those hydrogens from being constrained... Now, we only constrain "light" atoms
when we use `hbonds`. We probably should use a better name, but `lightatoms` is a bit
long and nebulous
… EnergyTrajectory.

This means that we don't need to think about temperature when doing an alchemlyb
analysis - it is all automatic.

Added a `sire.morph.to_alchemlyb` function that automatically takes a list of
EnergyTrajectory objects, or s3 filenames, or something that can be globbed,
and returns the single alchemlyb-formatted dataframe ready for analysis.

Now running some final tests to make sure that the free energies look ok
and the simulation is fast enough
…om 4 to 1.5,

because 4 causes a LangevinMiddleIntegrator to become unstable with a 4fs
timestep, but 1.5 seems fine. I've gone with the best default for the default
integrator.

I've added a `vacuum` option to the dynamics and minimisation functions to make
it easier to specify that a simulation should be run in vacuum. This makes
the writing of vacuum legs easier, as now you don't need to manually set
the space.

I've finished off the free energy tutorial, adding an example script that
does the whole job of calculating a relative hydration free energy.
…nd of the tutorial.

The free energy looks good :-)
…lation run.

Added the convergence plot.

Fixed some of the circular import errors by making sure that sire.base always
gets imported completely by itself first.

Fixed the broken test_hmr now that the default mass factor has changed.
…ready have an error :-)

This needs fixing before the PR
@chryswoods
Copy link
Contributor Author

I think I have fixed the compile error on Windows...

@chryswoods chryswoods added this to the 2023.4.0 milestone Oct 7, 2023
… features and

quickstart guide, and updated the installation instructions for python 3.9-3.11,
and to explicitly add in the conda-forge channel. I've also warned that the
docker container route is not updated often...

[ci skip]
@chryswoods
Copy link
Contributor Author

That fixed that error on Windows, but now there is a strange multiply-defined symbol link error for the boost parser in SireUnits... (I can't reproduce this locally yet, as my local windows build completes without error).

Thanks to this StackOverflow post for pointing me in the right direction

https://stackoverflow.com/questions/75621251/how-to-get-rid-of-error-lnk2005-when-linking-2-cpp-files-both-including-boost

Will skip CI for now as want to test locally on other OS's to make sure this hasn't broken them

[ci skip]
…oost has updated to

a new version that is incompatible with the C++ compiler used to build conda on windows...

I have had to disable findMCS on Windows because boost/graph/mcgregor_common_subgraphs.hpp
won't compile because it pulls in boost/describe.hpp which fails its compile with the error
message

C:\Users\chris\miniforge3\envs\openbiosim\Library\include\boost/describe/enumerators.hpp(22): error C2065: 'boost_enum_descriptor_fn': undeclared iden
tifier [C:\Users\chris\Coding\sire\build\openbiosim_openbiosim_corelib\src\libs\SireMol\SireMol.vcxproj]

(amongst others)

I'll try the CI again...
@chryswoods chryswoods temporarily deployed to sire-build October 7, 2023 22:00 — with GitHub Actions Inactive
@chryswoods chryswoods temporarily deployed to sire-build October 7, 2023 22:00 — with GitHub Actions Inactive
@chryswoods chryswoods temporarily deployed to sire-build October 7, 2023 22:00 — with GitHub Actions Inactive
@chryswoods chryswoods temporarily deployed to sire-build October 7, 2023 22:00 — with GitHub Actions Inactive
@chryswoods
Copy link
Contributor Author

I can reproduce the Windows compile errors locally (thankfully). It is a Python 3.11, using a new version of boost I think? There's lots of small boost compile errors that look like they may be incompatibilities between boost and the version of MSVC used to compile the conda package. I'll see if the fixes I've made will let it compile overnight...

@chryswoods
Copy link
Contributor Author

Ok - I've fixed the Windows compile issue. There was one boost header that I couldn't fix (mcgregor_common_subgraphs). This is only used in findMCS.cpp. I've thus disable this functionality on Windows (sire's MCS matching). I don't think it is used much, and people prefer RDKit. It is a shame to lose the fallback, but fixing this would, I think, require upgrading to a new MSVC compiler version for the Windows build. This is too much work for this release. I may consider it for the point release.

The build failure is just a couple of unit tests failing on Windows that require the findMCS functionality. I've conditionally disabled these tests. I haven't re-run the CI to save GH Actions, and because I have tested locally that everything is ok.

I may look into automatically detecting the failure of mcgregor_common_subgraphs in cmake, as it may be that this works with Python 3.9 or 3.10 (my Windows build used to be 3.10, which is why I couldn't reproduce it locally - but that could also have been that my build didn't upgrade boost).

I don't want to pin to an older version of boost on Windows as this will be a compatibility issue with other packages (boost is a very common dependency).

If the functionality is really needed on Windows, then another option may be to copy in the older mcgregor_common_subgraphs header file and patch it to work with new boost, without pulling in boost/describe.hpp. But, that may be quite a rabbit hole... ;-)

I've run sire_bigtests and they all pass on all OSs. I'm also now looking at the BioSimSpace tests from devel and the string units PR. There's a few things that fail because of new_system._sire_object.property("space").isPeriodic() fails because the space is a Cartesian(), which I will look into tomorrow. I think we are nearly there :-)

@lohedges
Copy link
Contributor

lohedges commented Oct 8, 2023

The Sire MCS fallback is only really triggered when the user passes in a prematch, since RDKit has no easy way to build the MCS and checking against it at each step, i.e. you only check that the MCS contains the pre-match as a post-processing step. We can just disable this on Windows for now and re-enable once a fix is found.

The Cartesian space issue sounds strange. I recently needed to update the way that I copy new boxes from trajectory and restart files into the reference system, since I don't want to override a valid space if there isn't one in the file, where Sire would apply a default Cartesian space for you. The logic seemed to work and tested passed during the PR. I guess something else must have changed.

Glad we're getting there!

@chryswoods
Copy link
Contributor Author

Sorry - a false alarm on the BioSimSpace tests. I'd mistakenly not uninstalled an older version of BioSimSpace before installing feature_string_protocol. I can confirm that, with feature_string_protocol installed, all of the tests that I can run pass (I haven't got ambertools installed so can't do anything with OpenFF). The logic for spaces all works, as does the new code that lets you pass in strings for units :-)

…nto their own

section so that they appear in the menu.

Fixed the tutorial that hadn't been updated with the new default value of
to_pandas=False

Re-ran the simulations in 05_free_energy_perturbation to make sure that they
were consistent and worked with the new options. The results are in great
agreement with those on the next page.

[ci skip]
Copy link
Contributor

@lohedges lohedges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested locally and can confirm that all BioSimSpace tests pass. Thanks for this Herculean effort!

@chryswoods chryswoods merged commit 9e4edb8 into devel Oct 10, 2023
@chryswoods chryswoods deleted the feature_bond_breaking branch October 10, 2023 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] system.ensemble() returns the wrong temperature units
2 participants