Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Issue merging small molecules #116

Closed
noahharrison64 opened this issue Oct 19, 2023 · 14 comments · Fixed by #117
Closed

[BUG] Issue merging small molecules #116

noahharrison64 opened this issue Oct 19, 2023 · 14 comments · Fixed by #117
Assignees
Labels
bug Something isn't working

Comments

@noahharrison64
Copy link

Hi,

I've been encountering issues when trying to use the Align.merge() functionality.

Code:

paramd = []
for filebase in ['ejm31', 'ejm42']:
    mol = BSS.IO.readMolecules([f"{filebase}.sdf"])
    param_mol = BSS.Parameters.gaff2(mol.getMolecule(0)).getMolecule()
    paramd.append(param_mol)
    
ejm31, ejm42 = paramd
merged = BSS.Align.merge(
    ejm31,
    ejm42,
    force = True
)

Error:
Large stack trace so added as text file

Input files
I am using the tyk2 data files provided in the old BioSimSpaceTutorials repo.
ejm31.sdf, ejm42.sdf

(please complete the following information):

  • OS: Linux x84_64
  • Version of Python: 3.9.1
  • Version of BioSimSpace: 2023.4.0
  • I confirm that I have checked this bug still exists in the latest released version of BioSimSpace: yes

Many thanks,
Noah

@noahharrison64 noahharrison64 added the bug Something isn't working label Oct 19, 2023
@noahharrison64 noahharrison64 changed the title [BUG] Issue working with merging small molecules [BUG] Issue merging small molecules Oct 19, 2023
@chryswoods chryswoods transferred this issue from OpenBioSim/biosimspace Oct 19, 2023
@chryswoods
Copy link
Contributor

Thanks for reporting. I think from the stack trace that something is going wrong with the transfer of the RDKit shared pointer from C++ up to Python. This is in the sire layer, so I've transferred the issue across to this repo so I can debug and track this here.

The wrapping of the RDKit shared pointer is, I think, handled by RDKit. Could you let me know what version of RDKit you have installed?

Do this by running;

>>> import rdkit
>>> print(rdkit.__version__)

I get 2023.03.3 on my computer, and your test script runs without issue. My guess is that you have a different version?

We're going to release 2023.4.1 soon, so I will get a fix in for that that will solve the problem :-)

@lohedges
Copy link
Contributor

Just to say that I've also tested locally on Linux (Python 3.11) and it's working for me too. I have:

In [1]: import rdkit

In [2]: rdkit.__version__
Out[2]: '2023.09.1'

@chryswoods chryswoods self-assigned this Oct 19, 2023
@chryswoods
Copy link
Contributor

I've pushed a commit to the fix_116 branch that I've associated with this issue. The commit adds in an explicit wrapping of the boost::shared_ptr<RDKit::RO_MOL> class, which I hope will fix the problem.

I'll know once I can reproduce the issue locally.

@chryswoods
Copy link
Contributor

I think this will fix it. Looking at the RDKit source code, they mix boost:shared_ptr<ROMol> with std::shared_ptr<ROMol>.

For example, here they define ROMOL_SPTR to be a boost::shared_ptr<ROMol>...

while in the Python wrapping code here they expose the Python wrapper using boost::python::register_ptr_to_python<std::shared_ptr<ROMol>>...

That code has been there for a long time - git blame shows;

bfe63718ac (Greg Landrum      2021-01-20 06:11:28 +0100 819)      python::register_ptr_to_python<std::shared_ptr<ROMol>>();

So I am not sure how this has even been working?

Note that they use boost::python::register_ptr_to_python<boost::shared_ptr<>> for most of the other data types they wrap.

All I can guess is that somewhere else, they were exposing the boost::python::register_ptr_to_python<boost::shared_ptr<ROMol>>, and that somehow this has now been removed.

My fix here won't break anything, as wrapping this twice is not a problem (beside a boost::python warning about a duplicate wrap). Including the wrap here will make sure that the code works for versions of RDKit that do provide the wrap, and versions that don't (since we use RDKit::ROMOL_SPTR consistently, and wrap with boost::python::register_ptr_to_python<RDKit::ROMOL_SPTR>).

@noahharrison64
Copy link
Author

noahharrison64 commented Oct 20, 2023

Hi @chryswoods, @lohedges

Thanks for both your replies.
I'm working on a fresh BSS install having followed the install guide on the docs page. My rdkit version is the same as @chryswoods

import rdkit
print(rdkit.__version__)
2023.03.3

Were either of you able to reproduce this issue eventually? I've just tried again using the code block above and the input files and I'm definitely getting the same error.

I tried to install the sire branch fix_116 in a new env:

git clone https://github.com/OpenBioSim/sire
cd sire
python setup.py --install-bss-deps install

But when installing BioSimSpace afterwards with mamba install -c openbiosim biosimspace mamba appears to replace my sire version with the openbiosim channel version:

───────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────

  + biosimspace  2023.4.0  py39_0          openbiosim/linux-64     Cached
  + sire         2023.4.0  py39h3fd9d12_0  openbiosim/linux-64     Cached

  Summary:

  Install: 2 packages
...

And so I've been unable to test the fix. How do you suggest installing BSS with a custom compiled sire version? Do I also need to build BSS from source rather than using a package manager?

@lohedges
Copy link
Contributor

HI @noahharrison64. After the Sire install you can install BioSimSpace into the environment and skip the installation of dependencies as follows:

BSS_SKIP_DEPENDENCIES=1 python setup.py install

@chryswoods
Copy link
Contributor

Could you try running only using sire? e.g.

import sire as sr

mols = sr.load("ejm31.sdf")

rdmol = sr.convert.to(mols, "rdkit")

print(rdmol)

When I run this I get

<rdkit.Chem.rdchem.Mol object at 0x159a6c900>

which shows that it is working.

@noahharrison64
Copy link
Author

noahharrison64 commented Oct 20, 2023

In the new python env with sire installed from source and on the 'fix_116' branch (version 2023.5.0.dev)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1                                                                                    │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/sire/convert/__ini │
│ t__.py:66 in to                                                                                  │
│                                                                                                  │
│    63if format == "sire":                                                                   │
│    64 │   │   return to_sire(obj, map=map)                                                       │
│    65elif format == "rdkit":                                                                │
│ ❱  66 │   │   return to_rdkit(obj, map=map)                                                      │
│    67elif format == "biosimspace":                                                          │
│    68 │   │   return to_biosimspace(obj, map=map)                                                │
│    69elif format == "openmm":                                                               │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/sire/convert/__ini │
│ t__.py:181 in to_rdkit                                                                           │
│                                                                                                  │
│   178Convert the passed object from its current object format to a                          │
│   179rdkit object format.                                                                   │
│   180 │   """                                                                                    │
│ ❱ 181return sire_to_rdkit(to_sire(obj, map=map), map=map)                                   │
│   182                                                                                            │
│   183                                                                                            │
│   184 def to_openmm(obj, map=None):                                                              │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/sire/convert/__ini │
│ t__.py:427 in sire_to_rdkit                                                                      │
│                                                                                                  │
│   424 │                                                                                          │
│   425from ..base import create_map                                                          │
│   426 │                                                                                          │
│ ❱ 427mols = _sire_to_rdkit(obj, map=create_map(map))                                        │
│   428 │                                                                                          │
│   429if mols is None:                                                                       │
│   430 │   │   return None                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: No Python class registered for C++ class RDKit::ROMol

On the original env I was using (2023.4.0)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/notebooks/development/mdynamics/pyscript.py:5 in <module>                                   │
│                                                                                                  │
│   2                                                                                              │
│   3 mols = sr.load("ejm31.sdf")                                                                  │
│   4                                                                                              │
│ ❱ 5 rdmol = sr.convert.to(mols, "rdkit")                                                         │
│   6                                                                                              │
│   7 print(rdmol)                                                                                 │
│   8                                                                                              │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_4/lib/python3.9/site-packages/sire/convert/__init__.py:66  │
│ in to                                                                                            │
│                                                                                                  │
│    63if format == "sire":                                                                   │
│    64 │   │   return to_sire(obj, map=map)                                                       │
│    65elif format == "rdkit":                                                                │
│ ❱  66 │   │   return to_rdkit(obj, map=map)                                                      │
│    67elif format == "biosimspace":                                                          │
│    68 │   │   return to_biosimspace(obj, map=map)                                                │
│    69elif format == "openmm":                                                               │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_4/lib/python3.9/site-packages/sire/convert/__init__.py:181 │
│ in to_rdkit                                                                                      │
│                                                                                                  │
│   178Convert the passed object from its current object format to a                          │
│   179rdkit object format.                                                                   │
│   180 │   """                                                                                    │
│ ❱ 181return sire_to_rdkit(to_sire(obj, map=map), map=map)                                   │
│   182                                                                                            │
│   183                                                                                            │
│   184 def to_openmm(obj, map=None):                                                              │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_4/lib/python3.9/site-packages/sire/convert/__init__.py:427 │
│ in sire_to_rdkit                                                                                 │
│                                                                                                  │
│   424 │                                                                                          │
│   425from ..base import create_map                                                          │
│   426 │                                                                                          │
│ ❱ 427mols = _sire_to_rdkit(obj, map=create_map(map))                                        │
│   428 │                                                                                          │
│   429if mols is None:                                                                       │
│   430 │   │   return None                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: No to_python (by-value) converter found for C++ type: boost::shared_ptr<RDKit::ROMol>

When I try and run the code block above (i.e. merging the mols) I get the following error, which I assume is because I followed @lohedges advice on skipping BSS install dependencies:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1                                                                                    │
│                                                                                                  │
│ ❱  1 import BioSimSpace as BSS                                                                   │
│    2 paramd = []                                                                                 │
│    3 for filebase in ['ejm31', 'ejm42']:                                                         │
│    4mol = BSS.IO.readMolecules([f"input_folder/{filebase}.sdf"])                            │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/BioSimSpace-2023.5 │
│ .0.dev0-py3.9.egg/BioSimSpace/__init__.py:240 in <module>                                        │
│                                                                                                  │
│   237 from . import Align                                                                        │
│   238 from . import Box                                                                          │
│   239 from . import Convert                                                                      │
│ ❱ 240 from . import FreeEnergy                                                                   │
│   241 from . import Gateway                                                                      │
│   242 from . import IO                                                                           │
│   243 from . import Metadynamics                                                                 │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/BioSimSpace-2023.5 │
│ .0.dev0-py3.9.egg/BioSimSpace/FreeEnergy/__init__.py:43 in <module>                              │
│                                                                                                  │
│   40getData                                                                                 │
│   41 """                                                                                         │
│   42                                                                                             │
│ ❱ 43 from ._relative import *                                                                    │
│   44 from ._utils import *                                                                       │
│   45                                                                                             │
│                                                                                                  │
│ /root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/BioSimSpace-2023.5 │
│ .0.dev0-py3.9.egg/BioSimSpace/FreeEnergy/_relative.py:111 in <module>                            │
│                                                                                                  │
│    108 │   │   _os.path.normpath(_getShareDir()), "scripts", "analyse_freenrg.py"                │
│    109 │   )                                                                                     │
│    110 if not _os.path.isfile(_analyse_freenrg):                                                 │
│ ❱  111 │   raise _MissingSoftwareError(                                                          │
│    112 │   │   "Cannot find free energy analysis script in expected location: '%s'"              │
│    113 │   │   % _analyse_freenrg                                                                │
│    114 │   )                                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
MissingSoftwareError: Cannot find free energy analysis script in expected location: 
'/root/miniforge3/envs/biosimspace_v23_sire_update/bin/analyse_freenrg'

@lohedges
Copy link
Contributor

When I try and run the code block above (i.e. merging the mols) I get the following error, which I assume is because I followed @lohedges advice on skipping BSS install dependencies:

That doesn't make sense, since you've already installed Sire, which provides the script that it says is missing.

@noahharrison64
Copy link
Author

Alright I'll try another fresh install , things may have got mixed up after I removed BSS/Sire following the mamba install I described above

But when installing BioSimSpace afterwards with mamba install -c openbiosim biosimspace mamba appears to replace rather than using a package manager?

@chryswoods
Copy link
Contributor

Something looks like it has gone wrong with your RDKit installation, as RDKit::ROMol should definitely be wrapped by RDKit and exposed to Python.

Could you try running this script. It imports the RDKit::ROMol object directly from RDKit and prints the class type. Then it does the same for the RDKit object returned by sire.

import sire as sr

from rdkit.Chem.AllChem import Mol

print(Mol)

mols = sr.load("ejm31.sdf")

rdmol = sr.convert.to(mols, "rdkit")

print(rdmol)

I get output

<class 'rdkit.Chem.rdchem.Mol'>
<rdkit.Chem.rdchem.Mol object at 0x159836d50>

If this still doesn't work, then I can only think that your Python is loading a different RDKit library at runtime from the one installed by mamba. Check that you don't have a PYTHONPATH environment variable set.

@noahharrison64
Copy link
Author

I'm still getting the same error on conversion. Checking sys.path I just have the following:

['/mnt/notebooks/development/mdynamics', 
 '/mnt/notebooks/development/mdynamics', 
 '/root/private_python_repo',
 '/root/miniforge3/envs/biosimspace_v23_sire_update/lib/python39.zip',
 '/root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9',
 '/root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/lib-dynload',
 '',
 '/root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages',
 '/root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/BioSimSpace-2023.5.0.dev0-py3.9.egg']

Checking the AllChem module file:

from rdkit.Chem import AllChem
AllChem.__file__
'/root/miniforge3/envs/biosimspace_v23_sire_update/lib/python3.9/site-packages/rdkit/Chem/AllChem.py'

Currently re-building sire from source in a fresh env. Will update once that's finished.

@noahharrison64
Copy link
Author

noahharrison64 commented Oct 20, 2023

Okay I've finally got to the bottom of this. It looks like the issue didn't require the fix you pushed @chryswoods. Instead there appears to have been an rdkit version mismatch that snuck its way in during my "fresh install". This install consisted of mamba installing biosimspace and dependencies, followed by my github repo, which required an install of chembl_structure_pipeline (CSP). I was pip installing CSP, not noticing that it in the process pip installed rdkit=2023.3.3.

I think the version of rdkit (2023.03.3) I imported into my IPython kernel was from the conda channel, while it appears Sire was trying to communicate with the pip installed version?
I circumvented this by installing BSS and dependencies, followed my mamba installing CSP, which in turn didn't do any rdkit installs. BSS is now successfully communicating with RDKit and the molecule merging is behaving as expected!

Thanks for the help, apologies for not spotting this sneaky rdkit install earlier.
Noah

@chryswoods
Copy link
Contributor

No problem at all - your experience is not going to be that uncommon. You've helped us see the symptoms of this problem, which will help us debug the issue for others in the future.

I think I'll keep the fix as it doesn't seem to break anything, and I am surprised that RDKit works when it is trying to expose a std::shared_ptr wrapper of a boost::shared_ptr object - I am sure this will cause some weird bugs in the future.

@chryswoods chryswoods mentioned this issue Oct 20, 2023
@chryswoods chryswoods linked a pull request Oct 20, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants