From 281f6c8eb192c8d350f18a49a3246e1bdabf5be5 Mon Sep 17 00:00:00 2001
From: Greg Landrum
Date: Tue, 10 Dec 2024 04:40:55 +0100
Subject: [PATCH] document H atoms in SMARTS (#8081)

* document H atoms in SMARTS

* response to review
---
 Docs/Book/RDKit_Book.rst | 45 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/Docs/Book/RDKit_Book.rst b/Docs/Book/RDKit_Book.rst
index d880c091f9e..91c6efd5ed0 100644
--- a/Docs/Book/RDKit_Book.rst
+++ b/Docs/Book/RDKit_Book.rst
@@ -438,7 +438,8 @@ SMARTS Reference
 escape special characters. This is a wart from the documentation system we are
 using. Please ignore those characters.
 
-**Atoms**
+Atoms
+^^^^^
 
 ========= ========================================== =============== ====== =========
 Primitive Property                                   "Default value" Range? Notes
@@ -470,8 +471,8 @@ Z         "number of aliphatic heteroatom neighbors" >0              Y
 ========= ========================================== =============== ====== =========
 
 
-
-**Bonds**
+Bonds
+^^^^^
 
 ========= ==================== ===================
 Primitive Property             Notes
@@ -489,6 +490,44 @@ Primitive Property             Notes
 <-        "dative left"        extension
 ========= ==================== ===================
 
+Hs in SMARTS
+^^^^^^^^^^^^
+
+Hs in SMARTS are interpreted as hydrogen atoms if the equivalent atom expression would also be a valid SMILES; otherwise they are interpreted as a query for any atom with a single attached hydrogen.
+
+Some examples:
+
+====== ==============
+SMARTS Interpretation
+====== ==============
+[H]    [#1]
+[H+]   [#1+]
+[H,Cl] [\*H1,Cl]
+[HH]   [\*H1;\*H1]
+====== ==============
+
+This is somewhat confusing, but is consistent with the Daylight documentation (https://www.daylight.com/dayhtml/doc/theory/theory.smirks.html):
+
+    Hence, a single change to SMARTS interpretation, for expressions of the form:
+    [H]. In SMARTS, these expressions now are interpreted as
+    a hydrogen atom, rather than as any atom with one hydrogen attached. All other
+    SMARTS hydrogen expressions retain their pre-4.51 meanings.
+
+It's always possible to see the RDKit's interpretation of a SMARTS using the ``DescribeQuery()`` function::
+
+  >>> print(Chem.AtomFromSmarts('[H,Cl]').DescribeQuery())
+  AtomOr
+    AtomHCount 1 = val
+    AtomType 17 = val
+
+  >>> print(Chem.AtomFromSmarts('[2H+]').DescribeQuery())
+  AtomAnd
+    AtomAnd
+      AtomAtomicNum 1 = val
+      AtomIsotope 2 = val
+    AtomFormalCharge 1 = val
+
+The safest (and clearest) way to incorporate H atoms into your queries is to use the atomic number primitive `[#1]` instead of `[H]`.
 
 Mol/SDF Support and Extensions
 ==============================
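The rule described in the patch can be approximated in a few lines of Python, for readers who want to experiment without RDKit installed. This is a minimal sketch and not RDKit's SMARTS parser. interpret_h is a hypothetical helper that assumes H is read as a hydrogen atom only when the bracket holds H plus an optional isotope and an optional signed charge. Those are the forms that appear in the patch's examples, such as [H], [H+] and [2H+]. Element symbols that start with H, atom maps and the rest of SMARTS syntax are out of scope. DescribeQuery() remains the way to check what the RDKit actually does.

```python
import re

# Hypothetical helper (not part of RDKit): a toy approximation of the
# "Hs in SMARTS" rule. Assumption: H is read as a hydrogen atom only when the
# whole bracket expression is H with an optional isotope and an optional
# signed charge, e.g. [H], [H+], [2H+]. Any other use of H, for example in
# [H,Cl] or [HH], is read as "one attached hydrogen" (like *H1).
# Out of scope: element symbols starting with H (He, Hg, ...), "++"-style
# charges, atom maps, and the rest of SMARTS syntax.
_BARE_HYDROGEN = re.compile(r"\[(\d+)?H([+-]\d*)?\]")


def interpret_h(expr: str) -> str:
    """Return how the H in a single bracket atom expression is read."""
    if _BARE_HYDROGEN.fullmatch(expr):
        return "hydrogen atom"  # read like [#1], e.g. [H] -> [#1]
    return "one attached H"  # H acts as a count, e.g. [H,Cl] -> [*H1,Cl]


if __name__ == "__main__":
    for expr in ["[H]", "[H+]", "[2H+]", "[H,Cl]", "[HH]"]:
        print(f"{expr:<7} -> {interpret_h(expr)}")
```

Run as a script, it prints one classification per example expression. The regular expression keeps the sketch short. Handling the full grammar would mean working from the SMARTS parser itself, which is what DescribeQuery() reports on.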