Add explicit_hydrogen parameter #741

Open
wants to merge 2 commits into
base: interfaces

padix-key
Member

This parameter in interface.rdkit.to_mol() defines whether hydrogen atoms are included explicitly or implicitly in the created Mol.

codspeed-hq bot commented Jan 24, 2025

CodSpeed Performance Report

Merging #741 will not alter performance

Comparing padix-key:rdkit (167c31a) with main (2624c37)

Summary

✅ 59 untouched benchmarks

mol = EditableMol(Mol())

has_charge_annot = "charge" in atoms.get_annotation_categories()
for i in range(atoms.array_length()):
    rdkit_atom = Atom(atoms.element[i].capitalize())
    if has_charge_annot:
        rdkit_atom.SetFormalCharge(atoms.charge[i].item())
    if explicit_hydrogen:
        rdkit_atom.SetNoImplicit(True)
Contributor

Croydon-Brixton Jan 27, 2025

In the base case where explicit_hydrogen=True, what would this mean if I just call to_mol for a typical crystal structure that won't have any hydrogens resolved? Will RDKit infer charges / valences in that case? If so, this might lead to broken molecules. If that were the case, should we check that hydrogens are present in the structure whenever explicit_hydrogen is true? (I could see this being something users will try; at least I would have tried it :D )

Member Author

Charges would also not be inferred automatically before this PR. But I am not sure what this means for valence: as the bond types are also set explicitly, I guess RDKit assumes a radical if explicit_hydrogen=True but hydrogen atoms are actually missing. I should probably test this.

Having a check there sounds like a good idea, especially as I agree that this could be a common mistake. However, I also think that strictly checking for the mere presence of hydrogen atoms might not be sensible, as there are valid molecules without hydrogen atoms, although they are rare. What do you think about raising a warning as a 'reminder' to the user to check the input?

Member Author

padix-key commented Feb 2, 2025

I checked the behavior when an AtomArray without hydrogen is passed to to_mol() with explicit_hydrogen being True or False:

import biotite.interface.rdkit as rdkit_interface
import biotite.structure.info as info
import rdkit.Chem.AllChem as Chem
import rdkit.Chem.Draw as Draw

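# Load the reference residue "C" and strip its hydrogen atoms
atoms = info.residue("C")
atoms = atoms[atoms.element != "H"]

# Convert with explicit hydrogen and render a 2D depiction
mol = rdkit_interface.to_mol(atoms, explicit_hydrogen=True)
Chem.Compute2DCoords(mol)
Draw.MolToFile(mol, "explicit.png")

# Same conversion, but with implicit hydrogen
mol = rdkit_interface.to_mol(atoms, explicit_hydrogen=False)
Chem.Compute2DCoords(mol)
Draw.MolToFile(mol, "non_explicit.png")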

explicit.png: [image: explicit]

non_explicit.png: [image: non_explicit]

So RDKit correctly understands that in the latter case it should add hydrogen atoms to the visualization, as the Mol already contains them implicitly.
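For a check that does not rely on the rendered images, a minimal RDKit-only sketch could look like the following. The helper name `describe` and the two-carbon toy molecule are illustrative assumptions and not part of the PR. The explicit `SanitizeMol()` call is an assumption too, since the thread does not show whether `to_mol()` sanitizes the molecule.

```python
from rdkit import Chem


def describe(no_implicit):
    """Return (total H count, radical electrons) per atom of a C-C toy molecule."""
    editable = Chem.EditableMol(Chem.Mol())
    for _ in range(2):
        atom = Chem.Atom("C")
        # The same switch the diff sets when explicit_hydrogen is True
        atom.SetNoImplicit(no_implicit)
        editable.AddAtom(atom)
    editable.AddBond(0, 1, Chem.BondType.SINGLE)
    mol = editable.GetMol()
    # Assumption: sanitize so that valences and radicals are perceived
    Chem.SanitizeMol(mol)
    return [
        (a.GetTotalNumHs(), a.GetNumRadicalElectrons()) for a in mol.GetAtoms()
    ]


print("explicit:", describe(True))
print("implicit:", describe(False))
```

If RDKit treats the missing hydrogen as radicals, as guessed earlier in the thread, the explicit variant should report no hydrogen and a nonzero radical count per carbon, while the implicit variant should report hydrogen and no radicals.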

Member Author

I added a warning in case the structure contains no hydrogen atoms. @Croydon-Brixton Could you have a look again?
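The warning itself is not shown in this thread. As a rough sketch of the idea only, a check of this kind might look like the following. The function name `check_hydrogen`, the message text, and the use of a plain list of element symbols instead of an AtomArray are all hypothetical.

```python
import warnings


def check_hydrogen(elements, explicit_hydrogen):
    """Warn if explicit hydrogen is requested but no hydrogen is present.

    `elements` is any iterable of element symbols. This helper is a
    hypothetical illustration, not the code added in the PR.
    """
    if explicit_hydrogen and not any(e.upper() == "H" for e in elements):
        warnings.warn(
            "No hydrogen found in the input, although 'explicit_hydrogen' "
            "is True. Missing hydrogen may be interpreted as radicals.",
            UserWarning,
        )


check_hydrogen(["C", "O", "N"], explicit_hydrogen=True)   # emits a warning
check_hydrogen(["C", "H", "H"], explicit_hydrogen=True)   # silent
check_hydrogen(["C", "O", "N"], explicit_hydrogen=False)  # silent
```

A warning rather than an exception keeps hydrogen-free but valid molecules usable, which matches the concern raised earlier in the review.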
