-
Notifications
You must be signed in to change notification settings - Fork 50
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #229 from forlilab/rtd-wip-lig
Update docs
- Loading branch information
Showing
25 changed files
with
917 additions
and
524 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,286 +1,67 @@ | ||
# Meeko: preparation of small molecules for AutoDock | ||
# Meeko: interface for AutoDock | ||
|
||
[![API stability](https://img.shields.io/badge/stable%20API-no-orange)](https://shields.io/) | ||
[![PyPI version fury.io](https://img.shields.io/badge/version-0.6.0-green.svg)](https://pypi.python.org/pypi/meeko/) | ||
[![Documentation Status](https://readthedocs.org/projects/meeko/badge/?version=readthedocs)](https://meeko.readthedocs.io/en/readthedocs/?badge=readthedocs) | ||
|
||
Meeko reads an RDKit molecule object and writes a PDBQT string (or file) | ||
for [AutoDock-Vina](https://github.com/ccsb-scripps/AutoDock-Vina) | ||
and [AutoDock-GPU](https://github.com/ccsb-scripps/AutoDock-GPU). | ||
It converts the docking output to RDKit molecules and | ||
SD files, without loss of bond orders. | ||
Meeko prepares the input for AutoDock and processes its output. | ||
It is developed alongside AutoDock-GPU and AutoDock-Vina. | ||
Meeko parameterizes both small organic molecules (ligands) and proteins | ||
and nucleic acids (receptors). | ||
|
||
Meeko is developed by the [Forli lab](https://forlilab.org/) at the | ||
[Center for Computational Structural Biology (CCSB)](https://ccsb.scripps.edu) | ||
at [Scripps Research](https://www.scripps.edu/). | ||
|
||
|
||
## Usage notes | ||
## Documentation | ||
|
||
Meeko does not calculate 3D coordinates or assign protonation states. | ||
Input molecules must have explicit hydrogens. | ||
The docs are hosted on [meeko.readthedocs.io](meeko.readthedocs.io) | ||
|
||
Sampling of macrocycle conformers ([paper](https://doi.org/10.1017/qrd.2022.18)) | ||
is enabled by default. | ||
|
||
SDF format strongly preferred over Mol2. | ||
See | ||
[this discussion](https://github.com/rdkit/rdkit/discussions/3647), and | ||
[this one](https://sourceforge.net/p/rdkit/mailman/message/37668451/), | ||
[also this](rdkit/rdkit#4061), | ||
[and this](https://sourceforge.net/p/rdkit/mailman/message/37374678/). | ||
and RDKit issues | ||
[1755](https://github.com/rdkit/rdkit/issues/1755) and | ||
[917](https://github.com/rdkit/rdkit/issues/917). So, what could go wrong? | ||
For example, reading Mol2 files from ZINC | ||
[led to incorrect net charge of some molecules.](https://github.com/forlilab/Meeko/issues/63) | ||
## Reporting bugs | ||
|
||
Please check if a similar bug has been reported and, if not, [open an issue](https://github.com/forlilab/Meeko/issues). | ||
|
||
## v0.6.0-alpha.3 | ||
|
||
This release aims to distribute an enhanced `mk_prepare_receptor.py`. | ||
Some features are still being developed, hence the `-alpha` suffix in the version. | ||
Reactive docking is not working in v0.6.0-alpha, but should be restored soon. | ||
Documentation is also work in progress. | ||
## Installation | ||
|
||
Visit the docs for a more complete description. One option is conda or mamba: | ||
|
||
## API changes in v0.5 | ||
|
||
Class `MoleculePreparation` no longer has method `write_pdbqt_string()`. | ||
Instead, `MoleculePreparation.prepare()` returns a list of `MoleculeSetup` objects | ||
that must be passed, individually, to `PDBQTWriterLegacy.write_string()`. | ||
```python | ||
from meeko import MoleculePreparation | ||
from meeko import PDBQTWriterLegacy | ||
|
||
preparator = MoleculePreparation() | ||
mol_setups = preparator.prepare(rdkit_molecule_3D_with_Hs) | ||
for setup in mol_setups: | ||
pdbqt_string, is_ok, error_msg = PDBQTWriterLegacy.write_string(setup) | ||
if is_ok: | ||
print(pdbqt_string, end="") | ||
``` | ||
|
||
Argument `keep_nonpolar_hydrogens` is replaced by `merge_these_atom_types`, both in the Python | ||
interface and for script `mk_prepare_ligand.py`. | ||
The default is `merge_these_atom_types=("H",)`, which | ||
merges hydrogens typed `"H"`, keeping the current default behavior. | ||
To keep all hydrogens, set `merge_these_atom_types` to an empty | ||
list when initializing `MoleculePreparation`, or pass no atom types | ||
to `--merge_these_atom_types` from the command line: | ||
```sh | ||
mk_prepare_ligand.py -i molecule.sdf --merge_these_atom_types | ||
``` | ||
|
||
## Dependencies | ||
|
||
* Python (>=3.5) | ||
* Numpy | ||
* Scipy | ||
* RDKit (>=2023.09) | ||
* ProDy (optionally, for covalent docking) | ||
|
||
Conda or Miniconda can install the dependencies: | ||
```bash | ||
conda install -c conda-forge numpy scipy rdkit | ||
pip install prody # optional. pip recommended at http://prody.csb.pitt.edu/downloads/ | ||
``` | ||
|
||
|
||
## Python tutorial | ||
|
||
|
||
#### 2. RDKit molecule from docking results | ||
|
||
```python | ||
from meeko import PDBQTMolecule | ||
from meeko import RDKitMolCreate | ||
|
||
fn = "autodock-gpu_results.dlg" | ||
pdbqt_mol = PDBQTMolecule.from_file(fn, is_dlg=True, skip_typing=True) | ||
rdkitmol_list = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol) | ||
``` | ||
The length of `rdkitmol_list` is one if there are no sidechains and only one | ||
ligand was docked. | ||
If multiple ligands and/or sidechains are docked simultaneously, each will be | ||
an individual RDKit molecule in `rdkitmol_list`. | ||
Sidechains are truncated at the C-alpha. | ||
Note that docking multiple | ||
ligands simultaneously is only available in Vina, and it differs from docking | ||
multiple ligands one after the other. Each failed creation of an RDKit molecule | ||
for a ligand or sidechain results in a `None` in `rdkitmol_list`. | ||
|
||
For Vina's output PDBQT files, omit `is_dlg=True`. | ||
```python | ||
pdbqt_mol = PDBQTMolecule.from_file("vina_results.pdbqt", skip_typing=True) | ||
rdkitmol_list = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol) | ||
``` | ||
|
||
When using Vina from Python, the output string can be passed directly. | ||
See [the docs](https://autodock-vina.readthedocs.io/en/latest/docking_python.html) | ||
for context on the `v` object. | ||
```python | ||
vina_output_string = v.poses() | ||
pdbqt_mol = PDBQTMolecule(vina_output_string, is_dlg=True, skip_typing=True) | ||
rdkitmol_list = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol) | ||
micromamba install meeko | ||
``` | ||
|
||
#### 3. Initializing MoleculePreparation from a dictionary (or JSON) | ||
|
||
This is useful for saving and loading configuration files with json. | ||
```python | ||
import json | ||
from meeko import MoleculePreparation | ||
|
||
mk_config = {"hydrate": True} # any arguments of MoleculePreparation.__init__ | ||
print(json.dumps(mk_config), file=open('mk_config.json', 'w')) | ||
with open('mk_config.json') as f: | ||
mk_config = json.load(f) | ||
preparator = MoleculePreparation.from_config(mk_config) | ||
``` | ||
The command line tool `mk_prepare_ligand.py` can read the json files using | ||
option `-c` or `--config`. | ||
|
||
|
||
## Possibly useful configurations of MoleculePreparation | ||
|
||
Here we create an instance of MoleculePreparation that attaches pseudo | ||
waters to the ligand ([see paper on hydrated docking](https://pubs.acs.org/doi/abs/10.1021/jm2005145)), | ||
keeps macrocycles rigid (generally a bad idea), | ||
and keeps conjugated bonds and amide bonds rigid. | ||
By default, most amides are kept rigid, except for tertiary amides with | ||
different substituents on the nitrogen. | ||
or from PyPI: | ||
|
||
```python | ||
preparator = MoleculePreparation( | ||
hydrate=True, | ||
rigid_macrocycles=True, | ||
rigidify_bonds_smarts = ["C=CC=C", "[CX3](=O)[NX3]"], | ||
rigidify_bonds_indices = [(1, 2), (0, 2)], | ||
) | ||
``` | ||
|
||
The same can be done with the command line script. Note that indices of the | ||
atoms in the SMARTS are 0-based for the Python API but | ||
1-based for the command line script: | ||
```console | ||
mk_prepare_ligand.py\ | ||
--hydrate\ | ||
--rigid_macrocycles\ | ||
--rigidify_bonds_smarts "C=CC=C"\ | ||
--rigidify_bonds_indices 2 3\ | ||
--rigidify_bonds_smarts "[CX3](=O)[NX3]"\ | ||
--rigidify_bonds_indices 1 3 | ||
```bash | ||
pip install meeko | ||
``` | ||
|
||
## Docking covalent ligands as flexible sidechains | ||
## Usage | ||
|
||
The input ligand must be the product of the reaction and contain the | ||
atoms of the flexible sidechain up to (and including) the C-alpha. | ||
For example, if the ligand is an acrylamide (smiles: `C=CC(=O)N`) reacting | ||
with a cysteine (sidechain smiles: `CCS`), then the input ligand for | ||
Meeko must correspond to smiles `CCSCCC(=O)N`. | ||
Meeko exposes a Python API to enable scripting. Here we share very minimal examples | ||
using the command line scripts just to give context. | ||
Please visit the [meeko.readthedocs.io](meeko.readthedocs.io) for more information. | ||
|
||
Meeko will align the ligand atoms that match the C-alpha and C-beta of | ||
the protein sidechain. Options `--tether_smarts` and `--tether_smarts_indices` | ||
define these atoms. For a cysteine, `--tether_smarts "SCC"` and | ||
`--tether_smarts_indices 3 2` would work, although it is safer to define | ||
a more spefic SMARTS to avoid matching the ligand more than once. The first | ||
index (3 in this example) defines the C-alpha, and the second index defines | ||
the C-beta. | ||
|
||
For the example in this repository, which is based on PDB entry 7aeh, | ||
the following options prepare the covalently bound ligand for tethered docking: | ||
```console | ||
cd example/covalent_docking | ||
|
||
mk_prepare_ligand.py\ | ||
-i ligand_including_cys_sidechain.sdf\ | ||
--receptor protein.pdb\ | ||
--rec_residue ":CYS:6"\ | ||
--tether_smarts "NC(=O)C(O)(C)SCC"\ | ||
--tether_smarts_indices 9 8\ | ||
-o prepared.pdbqt | ||
Parameterizing a ligand and writing a PDBQT file: | ||
```bash | ||
mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt | ||
``` | ||
|
||
Prody is required for preparing ligands as flexible sidechains. | ||
Often, `pip install prody` suffices to install Prody in the currently | ||
active virtual environment (e.g., conda). For more detailed installation | ||
instructions visit http://prody.csb.pitt.edu/downloads/. | ||
Parameterizing a receptor with a flexible sidechain and writing a PDBQT file | ||
as well as a JSON file that stores the entire receptor datastructure. In this | ||
example, the `-o` option sets the output base name, `-j` triggers writing the | ||
.json file, `-p` triggers writting the .pdbqt file, and `-f` makes residue | ||
42 in chain A flexible. | ||
|
||
## Reactive Docking | ||
|
||
### 1. Prepare protein with waterkit | ||
Follow `wk_prepare_receptor.py` instructions and run with `--pdb`. | ||
The goal of this step is to perform essential fixes to the protein | ||
(such as missing atoms), to add hydrogens, and to follow the Amber | ||
naming scheme for atoms and residues, e.g., `HIE` or `HID` | ||
instead of `HIS`. | ||
|
||
### 2. Prepare protein pdbqt | ||
Here, `wk.pdb` was written by waterkit. The example below will center a gridbox of specified size on the given reactive residue. | ||
|
||
```console | ||
$ mk_prepare_receptor.py\ | ||
--pdb wk.pdb\ | ||
-o receptor.pdbqt\ | ||
--flexres " :ARG:348"\ | ||
--reactive_flexres " :SER:308" | ||
--reactive_flexres " :SER:308"\ | ||
--box_center_on_reactive_res\ | ||
--box_size 40 40 40 # x y z (angstroms) | ||
``` | ||
A manual box center can be specified with `--box_center`. | ||
Reactive parameters can also be modified: | ||
```sh | ||
--r_eq_12 R_EQ_12 r_eq for reactive atoms (1-2 interaction) | ||
--eps_12 EPS_12 epsilon for reactive atoms (1-2 interaction) | ||
--r_eq_13_scaling R_EQ_13_SCALING | ||
r_eq scaling for 1-3 interaction across reactive atoms | ||
--r_eq_14_scaling R_EQ_14_SCALING | ||
r_eq scaling for 1-4 interaction across reactive atoms | ||
```bash | ||
mk_prepare_receptor.py -i nucleic_acid.cif -o my_receptor -j -p -f A:42 | ||
``` | ||
|
||
Receptor preparation can't handle heteroatoms for the moment. | ||
Also nucleic acids, ions, and post-translational modifications (e.g. | ||
phosphorilation) are not supported. Only the 20 standard amino acids | ||
can be parsed, and it is required to have Amber atom names and | ||
hydrogens. No atoms can be missing. | ||
|
||
### 3. Run autogrid | ||
Finally, converting docking results to SDF for the ligand, and PDB for the | ||
receptor with updated sidechain positions: | ||
|
||
Make affinity maps for the `_rigid.pdbqt` part of the receptor. | ||
Make affinity maps for the `_rigid.pdbqt` part of the receptor. `mk_prepare_receptor.py` will prepare the GPF for you. | ||
|
||
### 4. Write ligand PDBQT | ||
mk_prepare_ligand.py -i sufex1.sdf --reactive_smarts "S(=O)(=O)F" --reactive_smarts_idx 1 -o sufex1.pdbqt\ | ||
|
||
### 5. Configure AD-GPU for reactive docking | ||
|
||
For reactive docking there are two options that need to be passed to AutoDock-GPU: | ||
For reactive docking there are an additional option that needs to be passed to AutoDock-GPU: | ||
```console | ||
--import_dpf | ||
``` | ||
|
||
The `--derivtype` option, if needed, was written by `mk_prepare_receptor.py` to a file suffixed with `.derivtype`. | ||
|
||
The filename to be passed to `--import_dpf` was written by `mk_prepare_receptor.py` | ||
and it is suffixed with `reactive_config`. | ||
```sh | ||
ADGPU -I *.reactive_config -L sufex1.pdbqt -N sufex1_docked_ -F *_flex.pdbqt -C 1 | ||
```bash | ||
mk_export.py vina_results.pdbqt -j my_receptor.json -s lig_docked.sdf -p rec_docked.pdb | ||
``` | ||
|
||
## Polymer docs temporary placeholder | ||
|
||
Current init is a builder method designed to parse PDB files and topology | ||
objects. It is not designed to monomerize a polymer that already exists | ||
as a single RDKit molecule, or to build a single polymer molecule from the | ||
individual residues (monomers). To do that, one would have to react residues | ||
with each other. Padding could be adapted. | ||
|
||
|
||
How bonds between residues are handled and blunt residues, and deleting | ||
bonds "interactively". | ||
|
||
Document residue\_chem\_params. |
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.