Merge pull request #227 from forlilab/readthedocs
Readthedocs
rwxayheee authored Nov 7, 2024
2 parents 1a2de91 + 1b4f6c0 commit ba31e10
Showing 27 changed files with 1,753 additions and 76 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
# MacOS system files
*.DS_Store

# Byte-compiled / optimized / DLL files
35 changes: 35 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,35 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
    # You can also specify other tool versions:
    # nodejs: "20"
    # rust: "1.70"
    # golang: "1.20"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py
  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
  # builder: "dirhtml"
  # Fail on all warnings to avoid broken references
  # fail_on_warning: true

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#   - pdf
#   - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
77 changes: 1 addition & 76 deletions README.md
@@ -2,6 +2,7 @@

[![API stability](https://img.shields.io/badge/stable%20API-no-orange)](https://shields.io/)
[![PyPI version fury.io](https://img.shields.io/badge/version-0.6.0-green.svg)](https://pypi.python.org/pypi/meeko/)
[![Documentation Status](https://readthedocs.org/projects/meeko/badge/?version=readthedocs)](https://meeko.readthedocs.io/en/readthedocs/?badge=readthedocs)

Meeko reads an RDKit molecule object and writes a PDBQT string (or file)
for [AutoDock-Vina](https://github.com/ccsb-scripps/AutoDock-Vina)
@@ -85,85 +86,9 @@ conda install -c conda-forge numpy scipy rdkit
pip install prody # optional. pip recommended at http://prody.csb.pitt.edu/downloads/
```

## Installation (from PyPI)
```bash
$ pip install meeko
```
If using conda, `pip` installs the package in the active environment.

## Installation (from source)
You'll get the develop branch, which may be ahead of the latest release.
```bash
$ git clone https://github.com/forlilab/Meeko
$ cd Meeko
$ pip install .
```

Optionally include `--editable`. Changes in the original package location
take effect immediately without the need to run `pip install .` again.
```bash
$ pip install --editable .
```


## Examples using the command line scripts

#### 1. make PDBQT files
AutoDock-GPU and Vina read molecules in the PDBQT format. These can be prepared
by Meeko from SD files, or from Mol2 files, but SDF is strongly preferred.
```console
mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt
mk_prepare_ligand.py -i multi_mol.sdf --multimol_outdir folder_for_pdbqt_files
```

#### 2. convert docking results to SDF
AutoDock-GPU and Vina write docking results in the PDBQT format. The DLG output
from AutoDock-GPU contains docked poses in PDBQT blocks.
Meeko generates RDKit molecules from PDBQT files (or strings) using the SMILES
string in the REMARK lines. The REMARK lines also have the mapping of atom indices
between SMILES and PDBQT. SD files with docked coordinates are written
from RDKit molecules.

```console
mk_export.py molecule.pdbqt -o molecule.sdf
mk_export.py vina_results.pdbqt -o vina_results.sdf
mk_export.py autodock-gpu_results.dlg -o autodock-gpu_results.sdf
```
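
The REMARK-based round trip described above can be sketched with the standard library alone. This is a minimal illustration, not Meeko's own parser: the function name `parse_smiles_remarks` is hypothetical, and the sample text assumes a `REMARK SMILES` line followed by a `REMARK SMILES IDX` line listing index pairs. That layout should be checked against a PDBQT file actually written by Meeko.

```python
# Hypothetical sample: only the REMARK lines matter for this sketch.
SAMPLE_PDBQT = """\
REMARK SMILES C1C=CCO1
REMARK SMILES IDX 1 1 2 2 3 3 4 4 5 5
ROOT
ENDROOT
"""


def parse_smiles_remarks(pdbqt_text):
    """Return (smiles, index_pairs) read from the REMARK lines of a PDBQT string.

    Assumes the index line holds whitespace-separated integers that are
    consumed two at a time as pairs.
    """
    smiles = None
    index_pairs = []
    for line in pdbqt_text.splitlines():
        # Check the more specific prefix first, since both start the same way.
        if line.startswith("REMARK SMILES IDX"):
            numbers = [int(token) for token in line.split()[3:]]
            index_pairs.extend(zip(numbers[0::2], numbers[1::2]))
        elif line.startswith("REMARK SMILES"):
            smiles = line.split()[2]
    return smiles, index_pairs


smiles, index_pairs = parse_smiles_remarks(SAMPLE_PDBQT)
print(smiles)
print(index_pairs)
```

In practice `mk_export.py` does this work. The sketch only shows why a SMILES string plus an index map is enough to rebuild bond orders without guessing them from coordinates.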

Making RDKit molecules from SMILES is safer than guessing bond orders
from the coordinates, especially because PDBQT lacks hydrogens bonded
to carbon. As an example, consider the following conversion, in which
OpenBabel adds an extra double bond, not because it has a bad algorithm,
but because this is a nearly impossible task.
```console
$ obabel -:"C1C=CCO1" -o pdbqt --gen3d | obabel -i pdbqt -o smi
[C]1=[C][C]=[C]O1
```

## Python tutorial

#### 1. making PDBQT strings for Vina or for AutoDock-GPU

```python
from meeko import MoleculePreparation
from meeko import PDBQTWriterLegacy
from rdkit import Chem

input_molecule_file = "example/BACE_macrocycle/BACE_4.sdf"

# there is one molecule in this SD file, this loop iterates just once
for mol in Chem.SDMolSupplier(input_molecule_file, removeHs=False):
    preparator = MoleculePreparation()
    mol_setups = preparator.prepare(mol)
    for setup in mol_setups:
        setup.show() # optional
        pdbqt_string = PDBQTWriterLegacy.write_string(setup)
```
At this point, `pdbqt_string` can be written to a file for
docking with AutoDock-GPU or Vina, or passed directly to Vina within Python
using `set_ligand_from_string(pdbqt_string)`. For context, see
[the docs on using Vina from Python](https://autodock-vina.readthedocs.io/en/latest/docking_python.html).


#### 2. RDKit molecule from docking results

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -0,0 +1 @@
sphinx-book-theme
43 changes: 43 additions & 0 deletions docs/source/cli_export_result.rst
@@ -0,0 +1,43 @@
mk_export.py
============

Convert docking results to SDF
------------------------------

AutoDock-GPU and Vina write docking results in the PDBQT format. The DLG output
from AutoDock-GPU contains docked poses in PDBQT blocks, plus additional information.
Meeko generates RDKit molecules from PDBQT using the SMILES
string in the REMARK lines. The REMARK lines also have the mapping of atom indices
between SMILES and PDBQT. SD files with docked coordinates are written
from RDKit molecules.

.. code-block:: bash

    mk_export.py molecule.pdbqt -o molecule.sdf
    mk_export.py vina_results.pdbqt -o vina_results.sdf
    mk_export.py autodock-gpu_results.dlg -o autodock-gpu_results.sdf

Why this matters
----------------

Making RDKit molecules from SMILES is safer than guessing bond orders
from the coordinates, especially because PDBQT lacks hydrogens bonded
to carbon. As an example, consider the following conversion, in which
OpenBabel adds an extra double bond, not because it has a bad algorithm,
but because this is a nearly impossible task.

.. code-block:: bash

    obabel -:"C1C=CCO1" -o pdbqt --gen3d | obabel -i pdbqt -o smi
    [C]1=[C][C]=[C]O1

Caveats
-------

If docking does not use explicit hydrogens, which is often the case, the
exported hydrogen positions are calculated by RDKit. This can be a drawback
if a careful force field minimization was performed before docking, because
the presumably rigorous hydrogen positions will be replaced by positions from
the RDKit geometry rules, which are empirical and much simpler than most
force fields.
16 changes: 16 additions & 0 deletions docs/source/cli_lig_prep.rst
@@ -0,0 +1,16 @@
mk_prepare_ligand.py
====================

Command line tool to prepare small organic molecules.

Write PDBQT files
-----------------

AutoDock-GPU and Vina read molecules in the PDBQT format. These can be prepared
by Meeko from SD files, or from Mol2 files, but SDF is strongly preferred.

.. code-block:: bash

    mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt
    mk_prepare_ligand.py -i multi_mol.sdf --multimol_outdir folder_for_pdbqt_files
104 changes: 104 additions & 0 deletions docs/source/cli_rec_prep.rst
@@ -0,0 +1,104 @@
The input structure is matched against templates to
guarantee chemical correctness and identify problems with the input structures.
This allows the user to identify and fix problems, resulting in a molecular
model that is correct with respect to heavy atoms, protonation state,
connectivity, bond orders, and formal charges.

The matching algorithm uses the connectivity and elements, but not bond orders
or atom names. Hydrogens are optional. This makes it compatible with input
files from various sources.

Templates are matched on a per-residue basis. Each residue is represented
as an instance of a PolymerResidue object, which contains:

- an RDKit molecule that represents the actual state
- a padded RDKit molecule containing a few atoms from the adjacent residues
- parameters such as partial charges

The positions are set by the input, and the connectivity and formal charges
are defined by the templates. Heavy atoms must match exactly. If heavy atoms
are missing or in excess, the templates will fail to match.

Missing hydrogens are added by RDKit, but are not subjected to minimization
with a force field. Thus, their bond lengths are only approximate.

Different states of the same residue are stored as different templates,
for example different protonation states of HIS, N-term, LYN/LYS, etc.
The residue name is the primary key for selecting templates unless the
user overrides it.

Currently not supported: capped residues from CHARMM-GUI.

mk_prepare_receptor
===================

Basic usage
-----------

.. code-block:: bash

    mk_prepare_receptor -i examples/system.pdb --write_pdbqt prepared.pdbqt

Protonation states
------------------


Adding templates
----------------

Write flags
-----------

The option flags starting with ``--write`` in ``mk_prepare_receptor`` can
be used both with an argument to specify the output filename:

.. code-block:: bash

    --write_pdbqt myenzyme.pdbqt --write_json myenzyme.json

and without the filename argument as long as a default basename is provided:

.. code-block:: bash

    --output_basename myenzyme --write_pdbqt --write_json

It is also possible to combine the two types of usage:

.. code-block:: bash

    --output_basename myenzyme
    --write_pdbqt
    --write_json
    --write_vina_box box_for_myenzyme.txt

in which case the specified filenames have priority over the default basename.
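
The priority rule for the write flags can be illustrated with a small ``argparse`` sketch. It is not the actual ``mk_prepare_receptor`` implementation: the helper names ``build_parser`` and ``resolve_filename``, the ``USE_BASENAME`` sentinel, and the default suffixes are assumptions made for illustration only.

```python
import argparse

# Sentinel meaning "flag was given without a filename" (hypothetical).
USE_BASENAME = object()


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_basename", default=None)
    for flag in ("--write_pdbqt", "--write_json"):
        # nargs="?" makes the filename optional; const is stored when the
        # flag appears alone, default when the flag is absent.
        parser.add_argument(flag, nargs="?", const=USE_BASENAME, default=None)
    return parser


def resolve_filename(value, basename, suffix):
    """Pick the output filename for one write flag, or None if not requested."""
    if value is None:
        return None  # flag not passed, nothing to write
    if value is USE_BASENAME:
        if basename is None:
            raise ValueError("flag given without a filename requires a basename")
        return basename + suffix  # assumed default naming scheme
    return value  # an explicit filename has priority over the basename


args = build_parser().parse_args(
    ["--output_basename", "myenzyme", "--write_pdbqt", "--write_json", "custom.json"]
)
print(resolve_filename(args.write_pdbqt, args.output_basename, ".pdbqt"))
print(resolve_filename(args.write_json, args.output_basename, ".json"))
```

Using a sentinel object for ``const`` keeps "flag given without a filename" distinct from any real filename a user might type.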

.. _templates:

Templates
---------

The templates contain SMILES strings that are used to create the RDKit
molecules that constitute every residue in the processed model. In this way,
the chemistry of the processed model is fully defined by the templates,
and the only thing that is preserved from the input are the atom positions
and the connectivity between residues.

The SMILES strings contain all atoms that exist in the final model,
and none that do not exist. This also applies to hydrogens,
meaning that the SMILES are expected to have real hydrogens. Note that
real hydrogens are different from explicit hydrogens. Real hydrogens will be
represented as an actual atom in an RDKit molecule, while explicit hydrogens
are just a property of heavy atoms. In the SMILES, real hydrogens are written
as their own bracketed atom "[H]", while explicit hydrogens are written inside
the brackets of the heavy atom, e.g. "[nH]" to set the number of explicit
hydrogens on an aromatic nitrogen to one.

Residues that are part of a polymer, which is often all of them, will have
bonds to adjacent residues. The heavy atoms involved in these bonds will lack
a real hydrogen and have an implicit (or explicit) one instead. As an
example, consider modeling an alkyl chain as a polymer, in which the monomer
is a single carbon atom. Our template SMILES would be "[H]C[H]". The RDKit
molecule will have three atoms and the carbon will have two implicit hydrogens.
The implicit hydrogens correspond to bonds to adjacent residues in the
processed polymer.
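
The distinction between real, explicit, and implicit hydrogens can be checked directly in RDKit. The sketch below assumes RDKit is installed and parses SMILES with ``removeHs=False``, because the default SMILES parser would otherwise merge "[H]" atoms into their heavy-atom neighbors. The helper name ``parse_keeping_real_hs`` is hypothetical, and how Meeko itself parses template SMILES is not shown here.

```python
from rdkit import Chem


def parse_keeping_real_hs(smiles):
    """Parse SMILES while keeping "[H]" atoms as real atoms in the molecule."""
    params = Chem.SmilesParserParams()
    params.removeHs = False
    return Chem.MolFromSmiles(smiles, params)


# Alkyl-chain monomer from the text: one carbon with two real hydrogens.
monomer = parse_keeping_real_hs("[H]C[H]")
carbon = next(atom for atom in monomer.GetAtoms() if atom.GetAtomicNum() == 6)
print("atoms in monomer:", monomer.GetNumAtoms())
print("implicit Hs on carbon:", carbon.GetNumImplicitHs())

# Aromatic nitrogen with one explicit hydrogen, which is not a separate atom.
pyrrole = parse_keeping_real_hs("c1cc[nH]c1")
nitrogen = next(atom for atom in pyrrole.GetAtoms() if atom.GetAtomicNum() == 7)
print("atoms in pyrrole:", pyrrole.GetNumAtoms())
print("explicit Hs on nitrogen:", nitrogen.GetNumExplicitHs())
```

The monomer has three atoms and its carbon carries two implicit hydrogens, matching the description above, while the hydrogen in "[nH]" adds no atom to the molecule.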