Merge pull request #227 from forlilab/readthedocs

Readthedocs
forlilab · Nov 7, 2024 · ba31e10
2 parents 1a2de91 + 1b4f6c0
commit ba31e10

Showing 27 changed files with 1,753 additions and 76 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
+# MacOS system files
 *.DS_Store
 
 # Byte-compiled / optimized / DLL files

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,35 @@
+# Read the Docs configuration file for Sphinx projects
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+    # You can also specify other tool versions:
+    # nodejs: "20"
+    # rust: "1.70"
+    # golang: "1.20"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/source/conf.py
+  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
+  # builder: "dirhtml"
+  # Fail on all warnings to avoid broken references
+  # fail_on_warning: true
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#   - pdf
+#   - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+    - requirements: docs/requirements.txt
diff --git a/README.md b/README.md
@@ -2,6 +2,7 @@
 
 [![API stability](https://img.shields.io/badge/stable%20API-no-orange)](https://shields.io/)
 [![PyPI version fury.io](https://img.shields.io/badge/version-0.6.0-green.svg)](https://pypi.python.org/pypi/meeko/)
+[![Documentation Status](https://readthedocs.org/projects/meeko/badge/?version=readthedocs)](https://meeko.readthedocs.io/en/readthedocs/?badge=readthedocs)
 
 Meeko reads an RDKit molecule object and writes a PDBQT string (or file)
 for [AutoDock-Vina](https://github.com/ccsb-scripps/AutoDock-Vina)
@@ -85,85 +86,9 @@ conda install -c conda-forge numpy scipy rdkit
 pip install prody # optional. pip recommended at http://prody.csb.pitt.edu/downloads/
 ```
 
-## Installation (from PyPI)
-```bash
-$ pip install meeko
-```
-If using conda, `pip` installs the package in the active environment.
-
-## Installation (from source)
-You'll get the develop branch, which may be ahead of the latest release.
-```bash
-$ git clone https://github.com/forlilab/Meeko
-$ cd Meeko
-$ pip install .
-```
-
-Optionally include `--editable`. Changes in the original package location
-take effect immediately without the need to run `pip install .` again.
-```bash
-$ pip install --editable .
-```
-
-
-## Examples using the command line scripts
-
-#### 1. make PDBQT files
-AutoDock-GPU and Vina read molecules in the PDBQT format. These can be prepared
-by Meeko from SD files, or from Mol2 files, but SDF is strongly preferred.
-```console
-mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt
-mk_prepare_ligand.py -i multi_mol.sdf --multimol_outdir folder_for_pdbqt_files
-```
-
-#### 2. convert docking results to SDF
-AutoDock-GPU and Vina write docking results in the PDBQT format. The DLG output
-from AutoDock-GPU contains docked poses in PDBQT blocks.
-Meeko generates RDKit molecules from PDBQT files (or strings) using the SMILES
-string in the REMARK lines. The REMARK lines also have the mapping of atom indices
-between SMILES and PDBQT. SD files with docked coordinates are written
-from RDKit molecules.
-
-```console
-mk_export.py molecule.pdbqt -o molecule.sdf
-mk_export.py vina_results.pdbqt -o vina_results.sdf
-mk_export.py autodock-gpu_results.dlg -o autodock-gpu_results.sdf
-```
-
-Making RDKit molecules from SMILES is safer than guessing bond orders
-from the coordinates, specially because the PDBQT lacks hydrogens bonded
-to carbon. As an example, consider the following conversion, in which
-OpenBabel adds an extra double bond, not because it has a bad algorithm,
-but because this is a nearly impossible task.
-```console
-$ obabel -:"C1C=CCO1" -o pdbqt --gen3d | obabel -i pdbqt -o smi
-[C]1=[C][C]=[C]O1
-```
 
 ## Python tutorial
 
-#### 1. making PDBQT strings for Vina or for AutoDock-GPU
-
-```python
-from meeko import MoleculePreparation
-from meeko import PDBQTWriterLegacy
-from rdkit import Chem
-
-input_molecule_file = "example/BACE_macrocycle/BACE_4.sdf"
-
-# there is one molecule in this SD file, this loop iterates just once
-for mol in Chem.SDMolSupplier(input_molecule_file, removeHs=False):
-    preparator = MoleculePreparation()
-    mol_setups = preparator.prepare(mol)
-    for setup in mol_setups:
-        setup.show() # optional
-        pdbqt_string = PDBQTWriterLegacy.write_string(setup)
-```
-At this point, `pdbqt_string` can be written to a file for
-docking with AutoDock-GPU or Vina, or passed directly to Vina within Python
-using `set_ligand_from_string(pdbqt_string)`. For context, see
-[the docs on using Vina from Python](https://autodock-vina.readthedocs.io/en/latest/docking_python.html).
-
 
 #### 2. RDKit molecule from docking results
 

diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/make.bat b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+        set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+        echo.
+        echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+        echo.installed, then set the SPHINXBUILD environment variable to point
+        echo.to the full path of the 'sphinx-build' executable. Alternatively you
+        echo.may add the Sphinx directory to PATH.
+        echo.
+        echo.If you don't have Sphinx installed, grab it from
+        echo.https://www.sphinx-doc.org/
+        exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -0,0 +1 @@
+sphinx-book-theme
diff --git a/docs/source/cli_export_result.rst b/docs/source/cli_export_result.rst
@@ -0,0 +1,43 @@
+mk_export.py
+============
+
+Convert docking results to SDF
+------------------------------
+
+AutoDock-GPU and Vina write docking results in the PDBQT format. The DLG output
+from AutoDock-GPU contains docked poses in PDBQT blocks, plus additional information.
+Meeko generates RDKit molecules from PDBQT using the SMILES
+string in the REMARK lines. The REMARK lines also have the mapping of atom indices
+between SMILES and PDBQT. SD files with docked coordinates are written
+from RDKit molecules.
+
+.. code-block:: bash
+
+    mk_export.py molecule.pdbqt -o molecule.sdf
+    mk_export.py vina_results.pdbqt -o vina_results.sdf
+    mk_export.py autodock-gpu_results.dlg -o autodock-gpu_results.sdf
+
+Why this matters
+----------------
+
+Making RDKit molecules from SMILES is safer than guessing bond orders
+from the coordinates, especially because the PDBQT lacks hydrogens bonded
+to carbon. As an example, consider the following conversion, in which
+OpenBabel adds an extra double bond, not because it has a bad algorithm,
+but because this is a nearly impossible task.
+
+.. code-block:: console
+
+    $ obabel -:"C1C=CCO1" -o pdbqt --gen3d | obabel -i pdbqt -o smi
+    [C]1=[C][C]=[C]O1
+
+
+Caveats
+-------
+
+If docking does not use explicit hydrogens, which is often the case, the
+exported hydrogen positions are calculated by RDKit. This can be a
+drawback if a careful force field minimization was performed before
+docking, because the presumably more rigorous hydrogen positions are
+replaced by positions from RDKit's geometry rules, which are empirical
+and much simpler than most force fields.
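
Reviewer note on `cli_export_result.rst`: the REMARK-based bookkeeping described in that file can be illustrated with a small standard-library sketch. It is not Meeko's own parser. It assumes the SMILES is stored on a `REMARK SMILES` line and the atom mapping on `REMARK SMILES IDX` lines as 1-based (SMILES index, PDBQT index) pairs; the function name `parse_smiles_remarks` and the two-line sample are hypothetical and made up for illustration.

```python
def parse_smiles_remarks(pdbqt_text):
    """Return (smiles, index_pairs) read from the REMARK lines of a PDBQT string.

    Assumed layout (not confirmed by the docs above):
      REMARK SMILES <smiles>
      REMARK SMILES IDX <smiles_idx> <pdbqt_idx> <smiles_idx> <pdbqt_idx> ...
    """
    smiles = None
    index_pairs = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK SMILES IDX"):
            # Everything after the three keywords is a flat list of integers.
            numbers = [int(token) for token in line.split()[3:]]
            # Pair them up: even positions are SMILES indices, odd are PDBQT.
            index_pairs.extend(zip(numbers[0::2], numbers[1::2]))
        elif line.startswith("REMARK SMILES"):
            smiles = line.split()[2]
    return smiles, index_pairs


# Hypothetical REMARK lines for a two-heavy-atom molecule (illustration only).
sample = "REMARK SMILES CO\nREMARK SMILES IDX 1 2 2 1\n"
smiles, index_pairs = parse_smiles_remarks(sample)
print(smiles, index_pairs)
```

With the SMILES and the index pairs in hand, docked coordinates can be assigned to the atoms of a molecule built from the SMILES, so bond orders never have to be guessed from the coordinates.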
diff --git a/docs/source/cli_lig_prep.rst b/docs/source/cli_lig_prep.rst
@@ -0,0 +1,16 @@
+mk_prepare_ligand.py
+====================
+
+Command line tool to prepare small organic molecules.
+
+Write PDBQT files
+-----------------
+
+AutoDock-GPU and Vina read molecules in the PDBQT format. These can be prepared
+by Meeko from SD files, or from Mol2 files, but SDF is strongly preferred.
+
+.. code-block:: bash
+
+    mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt
+    mk_prepare_ligand.py -i multi_mol.sdf --multimol_outdir folder_for_pdbqt_files
+
diff --git a/docs/source/cli_rec_prep.rst b/docs/source/cli_rec_prep.rst
@@ -0,0 +1,104 @@
+The input structure is matched against templates to guarantee chemical
+correctness and to flag problems in the input. Once the flagged problems
+are fixed, the result is a molecular model that is correct with respect
+to heavy atoms, protonation state, connectivity, bond orders, and
+formal charges.
+
+The matching algorithm uses the connectivity and elements, but not bond orders
+or atom names. Hydrogens are optional. This makes it compatible with input
+files from various sources.
+
+Templates are matched per residue. Each residue is represented as a PolymerResidue object, which contains:
+
+- an RDKit molecule that represents the actual state
+- a padded RDKit molecule containing a few atoms from the adjacent residues
+- parameters such as partial charges
+
+The positions are set by the input, and the connectivity and formal charges
+are defined by the templates. Heavy atoms must match exactly. If heavy atoms
+are missing or in excess, the templates will fail to match.
+
+Missing hydrogens are added by RDKit, but are not subjected to minimization
+with a force field, so their bond lengths are only approximate.
+
+Different states of the same residue are stored as different templates,
+for example different protonation states of HIS, N-term, LYN/LYS, etc.
+The residue name is the primary key for selecting templates, unless the user overrides it.
+
+Currently not supported: capped residues from CHARMM-GUI.
+
+mk_prepare_receptor
+===================
+
+Basic usage
+-----------
+
+.. code-block:: bash
+
+    mk_prepare_receptor -i examples/system.pdb --write_pdbqt prepared.pdbqt
+
+
+
+
+Protonation states
+------------------
+
+
+Adding templates
+----------------
+
+Write flags
+-----------
+
+The option flags starting with ``--write`` in ``mk_prepare_receptor`` can
+be used both with an argument to specify the output filename:
+
+.. code-block:: bash
+
+    --write_pdbqt myenzyme.pdbqt --write_json myenzyme.json
+
+and without the filename argument as long as a default basename is provided:
+
+.. code-block:: bash
+
+    --output_basename myenzyme --write_pdbqt --write_json
+
+It is also possible to combine the two types of usage:
+
+.. code-block:: bash
+
+    --output_basename myenzyme
+    --write_pdbqt
+    --write_json
+    --write_vina_box box_for_myenzyme.txt
+
+in which case the specified filenames have priority over the default basename.
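
Reviewer note on the write flags: the precedence rule described above (an explicit filename wins, otherwise the default basename is used) can be sketched with `argparse`. This is a minimal illustration under stated assumptions, not the implementation in `mk_prepare_receptor`. The function `resolve_outputs`, the sentinel `USE_BASENAME`, and the suffixes appended to the basename are all hypothetical.

```python
import argparse

# Sentinel stored when a --write flag is given without a filename.
USE_BASENAME = object()

# Hypothetical suffixes; the real tool may name its default files differently.
SUFFIXES = {
    "write_pdbqt": ".pdbqt",
    "write_json": ".json",
    "write_vina_box": ".box.txt",
}


def resolve_outputs(argv):
    """Map each requested --write flag to the filename it should produce."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_basename")
    for name in SUFFIXES:
        # nargs="?" makes the filename optional; const marks "flag only".
        parser.add_argument("--" + name, nargs="?", const=USE_BASENAME, default=None)
    args = parser.parse_args(argv)

    outputs = {}
    for name, suffix in SUFFIXES.items():
        value = getattr(args, name)
        if value is None:
            continue  # flag not requested
        if value is USE_BASENAME:
            if args.output_basename is None:
                parser.error("--" + name + " needs a filename or --output_basename")
            value = args.output_basename + suffix
        outputs[name] = value  # an explicit filename is kept as given
    return outputs


combined = resolve_outputs([
    "--output_basename", "myenzyme",
    "--write_pdbqt",
    "--write_json",
    "--write_vina_box", "box_for_myenzyme.txt",
])
print(combined)
```

Using a sentinel object rather than an empty string keeps "flag given without a filename" distinct from any filename a user could type.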
+
+.. _templates:
+
+Templates
+---------
+
+The templates contain SMILES strings that are used to create the RDKit
+molecules that constitute every residue in the processed model. In this way,
+the chemistry of the processed model is fully defined by the templates,
+and the only thing that is preserved from the input are the atom positions
+and the connectivity between residues.
+
+The SMILES strings contain all atoms that exist in the final model,
+and none that do not exist. This also applies to hydrogens,
+meaning that the SMILES are expected to have real hydrogens. Note that
+real hydrogens are different from explicit hydrogens. Real hydrogens are
+represented as actual atoms in an RDKit molecule, while explicit hydrogens
+are just a property of heavy atoms. In the SMILES, real hydrogens are written
+as their own bracket atom, "[H]", and explicit hydrogens inside the bracket of
+a heavy atom, e.g. "[nH]" sets one explicit hydrogen on an aromatic nitrogen.
+
+Residues that are part of a polymer, which is often all of them, will have
+bonds to adjacent residues. The heavy atoms involved in these bonds lack
+a real hydrogen and have an implicit (or explicit) one instead. As an
+example, consider modeling an alkyl chain as a polymer, in which the monomer
+is a single carbon atom. Our template SMILES would be "[H]C[H]". The RDKit
+molecule will have three atoms and the carbon will have two implicit hydrogens.
+The implicit hydrogens correspond to bonds to adjacent residues in the
+processed polymer.
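
Reviewer note on the Templates section: the distinction between real, explicit, and implicit hydrogens can be checked directly in RDKit. The sketch below assumes the SMILES is parsed with hydrogen removal turned off, because RDKit's default SMILES parsing folds "[H]" atoms into their heavy-atom neighbors. How Meeko itself parses its templates is not shown in this diff, and the helper name `parse_keeping_real_hs` is hypothetical.

```python
from rdkit import Chem


def parse_keeping_real_hs(smiles):
    """Parse a SMILES string while keeping "[H]" atoms as real atoms."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # the default (True) would drop the [H] atoms
    return Chem.MolFromSmiles(smiles, params)


# Alkyl-chain monomer from the text: two real hydrogens on one carbon.
monomer = parse_keeping_real_hs("[H]C[H]")
carbon = next(atom for atom in monomer.GetAtoms() if atom.GetSymbol() == "C")
print(monomer.GetNumAtoms())      # atoms in the graph, real hydrogens included
print(carbon.GetNumImplicitHs())  # open valences that stand in for neighbor bonds

# Pyrrole: "[nH]" sets an explicit hydrogen count, not an extra atom.
pyrrole = parse_keeping_real_hs("c1cc[nH]c1")
nitrogen = next(atom for atom in pyrrole.GetAtoms() if atom.GetSymbol() == "N")
print(pyrrole.GetNumAtoms())
print(nitrogen.GetNumExplicitHs())
```

In the monomer, the two implicit hydrogens on the carbon are the placeholders for the bonds to the previous and next residues in the processed polymer.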