Deep Local Analysis evaluates protein docking conformations with Locally oriented Cubes

yassermb/DLA-Ranker

Citation:

@article{10.1093/bioinformatics/btac551,
    author = {Mohseni Behbahani, Yasser and Crouzet, Simon and Laine, Elodie and Carbone, Alessandra},
    title = "{Deep Local Analysis evaluates protein docking conformations with locally oriented cubes}",
    journal = {Bioinformatics},
    year = {2022},
    month = {08},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac551},
    url = {https://doi.org/10.1093/bioinformatics/btac551}
}

Overview

Deep Local Analysis (DLA)-Ranker is a deep learning framework applying 3D convolutions to a set of locally oriented cubes representing the protein interface. It explicitly considers the local geometry of the interfacial residues along with their neighboring atoms and the regions of the interface with different solvent accessibility. DLA-Ranker identifies near-native conformations and discovers alternative interfaces from ensembles generated by molecular docking.

Features:

  • Useful APIs for fast preprocessing of large ensembles of complex conformations and for classifying them based on CAPRI criteria.

  • Representation of an interface as a set of locally oriented cubes.

    • Atomic density map as a 3D grid.
    • Structure class based on solvent accessibility (Support, Core, Rim).
    • Information on receptor and ligand.
    • Information on interfacial residues.
  • Classification of docking conformations based on CAPRI criteria (Incorrect, Acceptable, Medium, High quality)

  • Fast generation of cubes and evaluation of interface.

  • Training and testing 3D-CNN models.

  • Various aggregation schemes for per-residue scores.

    • Considering only a subset of cubes for evaluation of the interface.
    • Residues from Support or Core regions.
    • Residues from Core or Rim regions.
    • Selecting residues exclusively from the receptor or from the ligand.
  • Extraction of embeddings and the topology of the interface for graph representation learning.
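The CAPRI classes listed above are defined by thresholds on the fraction of native contacts (fnat), the ligand RMSD (L-RMSD) and the interface RMSD (I-RMSD). The sketch below applies the standard CAPRI thresholds; it is not taken from the DLA-Ranker code, and mapping "Acceptable or better" to the binary near-native label (Class = 1) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InterfaceMetrics:
    fnat: float   # fraction of native contacts recovered (0..1)
    lrmsd: float  # ligand RMSD in Angstrom
    irmsd: float  # interface RMSD in Angstrom


def capri_class(m: InterfaceMetrics) -> str:
    """Return the CAPRI quality class using the standard thresholds."""
    if m.fnat >= 0.5 and (m.lrmsd <= 1.0 or m.irmsd <= 1.0):
        return "High"
    if m.fnat >= 0.3 and (m.lrmsd <= 5.0 or m.irmsd <= 2.0):
        return "Medium"
    if m.fnat >= 0.1 and (m.lrmsd <= 10.0 or m.irmsd <= 4.0):
        return "Acceptable"
    return "Incorrect"


def binary_label(m: InterfaceMetrics) -> int:
    """ASSUMPTION: near-native (1) means Acceptable quality or better."""
    return 0 if capri_class(m) == "Incorrect" else 1
```

The checks are ordered from the strictest class down, so a conformation receives the best class whose criteria it satisfies.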

Requirements

Packages:

DLA-Ranker can be run on Linux, MacOS, and Windows. We recommend running DLA-Ranker on machines with a GPU. It requires the following packages:

  • FreeSASA or NACCESS
  • ProDy
  • lz4 compression tool
  • Python version 3.7 or 3.8.
  • Tensorflow version 2.2 or 2.3.
  • Cuda-Toolkit
  • Scikit-Learn, numpy, pandas, matplotlib, lz4 and tqdm (conda install -c pytorch -c pyg -c conda-forge python=3.9 numpy pandas matplotlib tqdm pytorch pyg scikit-learn cuda-toolkit lz4).

All-in-one: Run conda create --name dla-ranker --file dla-ranker.yml

Tutorial

DLA-Ranker works in two steps:

  • Generating a set of locally oriented cubes representing the interface.
  • Running the deep learning framework to:
    • Train: create a new model.
    • Test: evaluate conformations using trained models.
    • Encode: extract embeddings and the topology of the interface.

Generating locally oriented cubes

Dataset of conformations:

Place the ensemble of conformations in a directory (e.g. 'Examples/conformations_directory') organized as below:

Example
|___conformations_list.txt
|
|___conformations_directory
    |
    |___target complex 1
    |   |   Conformation 1
    |   |   Conformation 2
    |   |   ...
    |
    |___target complex 2
    |   |   Conformation 1
    |   |   Conformation 2
    |   |   ...
    |
    ..........

'conformations_list.txt' is a CSV file with five columns separated by ';': name of the target complex (Comp); receptor chain ID(s) (ch1); ligand chain ID(s) (ch2); name of the conformation file (Conf); class of the conformation (Class; 0: incorrect, 1: near-native).
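A minimal sketch of reading this file with Python's standard csv module, grouping conformations by target complex. It assumes the file starts with a header row naming the columns Comp, ch1, ch2, Conf and Class; the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; ASSUMPTION: the first line is a header row.
SAMPLE = """Comp;ch1;ch2;Conf;Class
1ABC;A;B;1ABC_0001.pdb;1
1ABC;A;B;1ABC_0002.pdb;0
2XYZ;A;C;2XYZ_0001.pdb;0
"""


def read_conformations(handle):
    """Group conformation records by target complex (Comp column)."""
    by_target = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=";"):
        by_target[row["Comp"]].append({
            "receptor_chains": row["ch1"],
            "ligand_chains": row["ch2"],
            "conf": row["Conf"],
            "label": int(row["Class"]),  # 0: incorrect, 1: near-native
        })
    return dict(by_target)


targets = read_conformations(io.StringIO(SAMPLE))
```

For a real file, pass an open handle instead, e.g. `open("Examples/conformations_list.txt", newline="")`.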

Processing the conformations

Specify the path to FreeSASA or NACCESS in lib/tools.py (FREESASA_PATH or NACCESS_PATH). Which of the two tools is used is also set in lib/tools.py via USE_FREESASA (default: USE_FREESASA = True).

If you have an Nvidia GPU or are running on Google Colab, set FORCE_CPU = False in lib/tools.py; otherwise set FORCE_CPU = True (default: FORCE_CPU = False).
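An illustrative excerpt of the settings described above. Only the variable names come from this README; the paths are placeholders for your own installation, and the actual lib/tools.py contains other code that is not shown here.

```python
# lib/tools.py (excerpt, illustrative only)
FREESASA_PATH = "/usr/local/bin/freesasa"  # placeholder: location of the FreeSASA binary
NACCESS_PATH = "/opt/naccess/naccess"      # placeholder: location of the NACCESS script
USE_FREESASA = True   # True: compute accessibility with FreeSASA; False: use NACCESS
FORCE_CPU = False     # set to True on machines without an Nvidia GPU
```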

From the directory 'Representation', run: python generate_cubes.py

The output is a directory 'map_dir' with the following structure:

Example
|___map_dir
    |___target complex 1
    |   |___0
    |   |   |   conformation 1
    |   |   |   conformation 2
    |   |
    |   |___1
    |       |   conformation 3
    |       |   conformation 4
    |   
    |___target complex 2
    |   |___0
    |   |   |   conformation 1
    |   |   |   conformation 2
    |   |
    |   |___1
    |       |   conformation 3
    |       |   conformation 4
    ..........
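To check what was generated, a small pathlib sketch can count the files per target and per subdirectory. It assumes, without confirmation from this README, that the subdirectories '0' and '1' correspond to the Class labels of conformations_list.txt; the demo builds a throwaway tree with made-up names.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_map_dir(map_dir):
    """Count output files per (target complex, subdirectory) under map_dir."""
    counts = Counter()
    for target in sorted(p for p in Path(map_dir).iterdir() if p.is_dir()):
        for cls_dir in sorted(p for p in target.iterdir() if p.is_dir()):
            counts[(target.name, cls_dir.name)] = sum(
                1 for f in cls_dir.iterdir() if f.is_file()
            )
    return counts


# Build a tiny temporary tree mirroring the layout above (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "map_dir"
    layout = [("target1", "0", "conf1"), ("target1", "0", "conf2"),
              ("target1", "1", "conf3"), ("target2", "1", "conf4")]
    for target, cls, conf in layout:
        d = root / target / cls
        d.mkdir(parents=True, exist_ok=True)
        (d / conf).touch()
    summary = summarize_map_dir(root)
```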

Each output represents the interface of a conformation and contains a set of local environments (e.g. atomic density map, structure classes (S, C, R), topology of the interface, ...).
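The structure classes S, C and R are commonly assigned from the relative solvent accessibility (rASA) of each residue in the unbound and bound states. The sketch below follows Levy's widely used 25% rule. The exact criteria applied by DLA-Ranker are not stated in this README, so treat the threshold and the boundary handling as assumptions.

```python
from typing import Optional


def scr_region(rasa_monomer: float, rasa_complex: float,
               threshold: float = 0.25) -> Optional[str]:
    """Assign Support (S), Core (C) or Rim (R) to a residue.

    rasa_monomer / rasa_complex: relative solvent accessibility in [0, 1]
    in the unbound partner and in the complex. Returns None when the
    accessibility does not drop on binding (residue not at the interface).
    ASSUMPTION: Levy-style thresholds, not confirmed for DLA-Ranker.
    """
    if rasa_monomer - rasa_complex <= 0:
        return None   # not an interfacial residue
    if rasa_monomer < threshold:
        return "S"    # already mostly buried before binding
    if rasa_complex < threshold:
        return "C"    # exposed when unbound, buried in the complex
    return "R"        # still exposed in the complex
```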

An atomic density map is a 4-dimensional tensor: a voxelized 3D grid of size 24×24×24 in which each voxel encodes characteristics of the protein atoms. The first 167 dimensions correspond to the atom types found in amino acids (excluding hydrogen). These can be reduced to 4 element symbols (C, N, O, S) by running python generate_cubes_reduce_channels_multiproc.py (ATTENTION: this script overwrites the existing files). This reduction is required in order to use the BM5 models as well as the general model.
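For scale, a full map holds 24³ = 13,824 voxels × 167 channels = 2,308,608 values (about 9.2 MB in float32). The NumPy sketch below shows one way to collapse atom-type channels into the four element channels, by summing every channel that belongs to the same element. This is an illustration, not the code of generate_cubes_reduce_channels_multiproc.py, which may combine channels differently; the 6-channel toy map and its element assignment are made up.

```python
import numpy as np

ELEMENTS = ("C", "N", "O", "S")


def reduce_channels(density, channel_elements):
    """Sum atom-type channels into one channel per element.

    density: array of shape (24, 24, 24, n_types)
    channel_elements: element symbol ('C', 'N', 'O' or 'S') for each channel
    """
    if density.shape[-1] != len(channel_elements):
        raise ValueError("one element symbol is needed per channel")
    out = np.zeros(density.shape[:-1] + (len(ELEMENTS),), dtype=density.dtype)
    for ch, elem in enumerate(channel_elements):
        out[..., ELEMENTS.index(elem)] += density[..., ch]
    return out


# Toy map with 6 hypothetical atom-type channels instead of the real 167.
rng = np.random.default_rng(0)
toy = rng.random((24, 24, 24, 6)).astype(np.float32)
toy_elements = ["C", "C", "N", "O", "O", "S"]
reduced = reduce_channels(toy, toy_elements)
```

Summation keeps the total density per voxel unchanged, which is one reasonable choice when the goal is to drop atom-type detail but keep element-level occupancy.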

Deep learning framework

The following commands use the trained models found in the directory 'Models'. This directory includes 3 sets of models:

'BM5': 10 models generated following a 10-fold cross-validation procedure on the 142 dimers of the Docking Benchmark version 5. The docking conformations were generated by HADDOCK. See DeepRank.

'Dockground': 4 models generated following a 4-fold cross-validation procedure on the 59 target complexes of the Dockground database. The docking conformations were generated by GRAMM-X. See GNN-Dove (https://www.frontiersin.org/articles/10.3389/fmolb.2021.647915/full).

'CCD4PPI': 5 models generated following a 5-fold cross-validation procedure on 400 target complexes. The conformations were generated by MAXDo.

For detailed information please read the article.

Evaluation of interfaces

From the directory 'Test', run python test.py. It processes all the target complexes and their conformations and produces the CSV file 'predictions_SCR'. Each row of the output file corresponds to one conformation and has 9 tab-separated columns:

Name of target complex and the conformation (Conf)
Fold Id (Fold)
Score of each residue (Scores)
Region (SCR) of each residue (Regions)
Global averaged score of the interface (Score)
Processing time (Time)
Class of the conformation (Class, 0:incorrect, 1: near-native)
Partner (RecLig)
Residue number (ResNumber; according to PDB)
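A sketch for loading 'predictions_SCR' with the standard csv module. This README does not specify how the per-residue columns (Scores, Regions, RecLig, ResNumber) are encoded, so the helper parse_list is hypothetical: it accepts either a Python-style list or a plain comma-separated string. A header row with the column names is also assumed, and the sample row is invented.

```python
import ast
import csv
import io

COLUMNS = ["Conf", "Fold", "Scores", "Regions", "Score",
           "Time", "Class", "RecLig", "ResNumber"]


def parse_list(cell):
    """Hypothetical parser for a per-residue cell ("[0.9, 0.4]" or "0.9,0.4")."""
    cell = cell.strip()
    if cell.startswith("["):
        return list(ast.literal_eval(cell))
    return [item.strip() for item in cell.split(",") if item.strip()]


def read_predictions(handle):
    """Read a tab-separated predictions file (ASSUMPTION: it has a header row)."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "conf": rec["Conf"],
            "fold": rec["Fold"],
            "scores": [float(s) for s in parse_list(rec["Scores"])],
            "regions": parse_list(rec["Regions"]),
            "score": float(rec["Score"]),
            "label": int(rec["Class"]),  # 0: incorrect, 1: near-native
            "partners": parse_list(rec["RecLig"]),
            "resnums": parse_list(rec["ResNumber"]),
        })
    return rows


# Invented one-row file for illustration.
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join([
    "target1/conf1", "3", "[0.9, 0.4, 0.7]", "['S', 'C', 'R']",
    "0.67", "1.2", "1", "['R', 'L', 'L']", "[12, 45, 46]",
]) + "\n"
rows = read_predictions(io.StringIO(SAMPLE))
```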

By combining the residue numbers, regions, scores, and partners, one can evaluate the interface on a subset of interfacial residues.
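A minimal sketch of such a subset evaluation: it averages the per-residue scores of the residues that fall in the selected regions and, optionally, belong to one partner. Averaging mirrors the global averaged score described above. The function name and the 'R'/'L' partner encoding are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def subset_score(scores: Sequence[float], regions: Sequence[str],
                 partners: Sequence[str], keep_regions=("S", "C"),
                 keep_partner: Optional[str] = None) -> Optional[float]:
    """Average per-residue scores over a subset of interfacial residues.

    keep_regions: regions to keep, e.g. ("S", "C") or ("C", "R").
    keep_partner: hypothetical partner tag ("R" receptor, "L" ligand) or None.
    Returns None when no residue matches the selection.
    """
    selected = [
        s for s, r, p in zip(scores, regions, partners)
        if r in keep_regions and (keep_partner is None or p == keep_partner)
    ]
    return mean(selected) if selected else None


scores = [0.9, 0.4, 0.7, 0.2]
regions = ["S", "C", "R", "C"]
partners = ["R", "L", "L", "R"]
support_core = subset_score(scores, regions, partners)                        # S or C
core_rim = subset_score(scores, regions, partners, keep_regions=("C", "R"))  # C or R
receptor_only = subset_score(scores, regions, partners, keep_partner="R")    # S or C, receptor
```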

Extraction of the embeddings

From the directory 'Test', run python extract_embeddings.py. It extracts the embeddings and the topology of the given interfaces and writes them to the directory 'Examples/intermediate'. For each conformation it produces an output file with the same name. Each row in a file corresponds to a residue and includes its coordinates, its region, and its embedding vector. These files can be used to aggregate embeddings with graph-learning methods.
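As a starting point for graph-based aggregation, the sketch below reads rows of coordinates, region and embedding, then connects residues closer than a distance cutoff. The row layout ("x y z region e1 e2 ...", whitespace-separated) and the 8 Å cutoff are assumptions, and the distance-based edges are a stand-in rather than the topology written by extract_embeddings.py.

```python
import numpy as np


def parse_embedding_lines(lines):
    """ASSUMED row layout: x y z region e1 e2 ... (whitespace-separated)."""
    coords, regions, embeds = [], [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        coords.append([float(v) for v in parts[:3]])
        regions.append(parts[3])
        embeds.append([float(v) for v in parts[4:]])
    return np.array(coords), regions, np.array(embeds)


def contact_edges(coords, cutoff=8.0):
    """Undirected edges (i, j), i < j, between residues closer than cutoff."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.nonzero(np.triu(dist < cutoff, k=1))
    return list(zip(i.tolist(), j.tolist()))


# Invented three-residue file content.
SAMPLE = [
    "0.0 0.0 0.0 S 0.1 0.2 0.3",
    "5.0 0.0 0.0 C 0.0 0.5 0.1",
    "20.0 0.0 0.0 R 0.4 0.4 0.4",
]
coords, regions, embeds = parse_embedding_lines(SAMPLE)
edges = contact_edges(coords)
```

The node features (embeds) and edge list produced this way can be passed to any graph neural network library.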

Acknowledgement

We would like to thank Dr. Sergei Grudinin and his team for helping us with the initial source code of maps_generator and load_data.py. See Ornate.
