document H atoms in SMARTS (rdkit#8081)
* document H atoms in SMARTS

* response to review
greglandrum authored Dec 10, 2024
1 parent 4e320d9 commit 281f6c8
Showing 1 changed file with 42 additions and 3 deletions.
45 changes: 42 additions & 3 deletions Docs/Book/RDKit_Book.rst
@@ -438,7 +438,8 @@ SMARTS Reference
escape special characters. This is a wart from the documentation system we are using.
Please ignore those characters.

**Atoms**
Atoms
^^^^^

========= ========================================== =============== ====== =========
Primitive Property "Default value" Range? Notes
@@ -470,8 +471,8 @@ Z "number of aliphatic heteroatom neighbors" >0 Y
========= ========================================== =============== ====== =========



**Bonds**
Bonds
^^^^^

========= ==================== ===================
Primitive Property Notes
@@ -489,6 +490,44 @@ Primitive Property Notes
<- "dative left" extension
========= ==================== ===================

Hs in SMARTS
^^^^^^^^^^^^

Hs in SMARTS are interpreted as hydrogen atoms if the equivalent atom expression would also be a valid SMILES; otherwise they are interpreted as a query for any atom with a single attached hydrogen.

Some examples:

====== ==============
SMARTS Interpretation
====== ==============
[H]    [#1]
[H+]   [#1+]
[H,Cl] [\*H1,Cl]
[HH]   [\*H1;\*H1]
====== ==============

This is somewhat confusing, but is consistent with the Daylight documentation (https://www.daylight.com/dayhtml/doc/theory/theory.smirks.html):

  Hence, a single change to SMARTS interpretation, for expressions of the form:
  [<weight>H<charge><map>]. In SMARTS, these expressions now are interpreted as
  a hydrogen atom, rather than as any atom with one hydrogen attached. All other
  SMARTS hydrogen expressions retain their pre-4.51 meanings.

It's always possible to see the RDKit's interpretation of a SMARTS using the ``DescribeQuery()`` function::

  >>> print(Chem.AtomFromSmarts('[H,Cl]').DescribeQuery())
  AtomOr
    AtomHCount 1 = val
    AtomType 17 = val

  >>> print(Chem.AtomFromSmarts('[2H+]').DescribeQuery())
  AtomAnd
    AtomAnd
      AtomAtomicNum 1 = val
      AtomIsotope 2 = val
    AtomFormalCharge 1 = val
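
The same check can be run over every row of the examples table. What follows is only a rough sketch and assumes a standard RDKit installation. The ``describe`` helper and the ``table_smarts`` list are illustrative names, not part of RDKit, and no output is shown because the exact text may differ between versions.

```python
from rdkit import Chem

# SMARTS taken from the examples table (the list name is illustrative).
table_smarts = ["[H]", "[H+]", "[H,Cl]", "[HH]"]


def describe(smarts):
    """Return RDKit's textual description of a single-atom SMARTS query."""
    atom = Chem.AtomFromSmarts(smarts)
    return atom.DescribeQuery()


# Map each SMARTS string to the query tree RDKit builds for it.
descriptions = {sma: describe(sma) for sma in table_smarts}

for sma, text in descriptions.items():
    print(sma)
    print(text)
```

Comparing the printed trees should show which expressions become hydrogen-atom queries and which become H-count queries.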

The safest (and clearest) way to incorporate H atoms into your queries is to use the atomic number primitive ``[#1]`` instead of ``[H]``.
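
To make the difference concrete, here is a minimal sketch that counts substructure matches on ethanol. It assumes a standard RDKit installation and that explicit hydrogens have been added with ``Chem.AddHs()``, since hydrogen-atom queries can only match hydrogens that are real atoms in the molecular graph. The molecule and the variable names are illustrative.

```python
from rdkit import Chem

# Ethanol with explicit hydrogens, so that H atoms are nodes in the graph.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))


def count_matches(smarts):
    """Count the substructure matches of a SMARTS pattern in the molecule."""
    query = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(query))


n_atomic_num = count_matches("[#1]")   # unambiguous: hydrogen atoms
n_bare_h = count_matches("[H]")        # also read as a hydrogen atom
n_h_or_cl = count_matches("[H,Cl]")    # read as "one attached H, or Cl"
n_one_h = count_matches("[*H1]")       # any atom with exactly one H

print(n_atomic_num, n_bare_h, n_h_or_cl, n_one_h)
```

Under these assumptions, ``[#1]`` and ``[H]`` both match the six hydrogen atoms. ``[H,Cl]`` matches only the hydroxyl oxygen, the one atom carrying exactly one hydrogen, so it does not select any hydrogen atoms.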

Mol/SDF Support and Extensions
==============================