This is the GitHub repository underlying the analysis for our paper "Comparative analysis of methods for the prediction of protein-ligand binding sites", published in Journal of Cheminformatics and found here.
In this work, we gather 13 ligand binding site predictors spanning 30 years, focusing on the latest machine learning based methods, such as VN-EGNN, IF-SitePred, GrASP, PUResNet and DeepPocket, and comparing them to established methods such as P2Rank, PRANK and fpocket, and to earlier methods such as PocketFinder, Ligsite and Surfnet. We benchmark these thirteen methods against our curated reference dataset, LIGYSIS, to assess their prediction capabilities objectively. The analysis yields an informed ranking of the methods, together with a series of reflections and guidelines to advance the field. It represents the most thorough analytical comparison of ligand binding site prediction methods to date and offers a clear framework for future developments in the field.
We compared the performance of the thirteen ligand binding site prediction methods listed below. Instructions to install them can be found in their respective repositories.
- VN-EGNN [1]
- IF-SitePred [2]
- GrASP [3]
- PUResNet [4]
- DeepPocketRESC [5]
- DeepPocketSEG [5]
- P2RankCONS [6]
- P2Rank [7]
- fpocketPRANK [8]
- fpocket [9]
- PocketFinder+ [10]
- Ligsite+ [11]
- Surfnet+ [12]
P2RankCONS is the same programme as P2Rank, but run with an extra argument that supplies evolutionary conservation information for the target protein. This can be done through the PRANK web server or locally after deploying P2Rank with Docker. Instructions can be found here.
fpocketPRANK corresponds to fpocket predictions re-scored with PRANK.
DeepPocketRESC corresponds to the default DeepPocket implementation, i.e., re-scored and re-ranked fpocket predictions. DeepPocketSEG represents the pocket shapes extracted from the DeepPocket CNN segmentation module. These predictions were obtained by removing the -r 3 flag, and so pocket shapes were extracted for all pockets, and not just the top-3.
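As an illustration of what "re-scored and re-ranked" means for the fpocketPRANK and DeepPocketRESC variants, the sketch below sorts a set of pockets by a new score and optionally keeps only the top N. It is a minimal example under stated assumptions, not the code of PRANK or DeepPocket: the `Pocket` class, the `rerank` function, the `top_n` parameter and the example scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pocket:
    """Hypothetical container for one predicted pocket."""
    pocket_id: str
    original_rank: int  # rank given by the pocket finder (1 = best)
    new_score: float    # score given by the re-scoring method (made-up values below)


def rerank(pockets: List[Pocket], top_n: Optional[int] = None) -> List[Pocket]:
    """Sort pockets by new score, highest first; ties keep the original rank order.

    If top_n is given, only the first top_n re-ranked pockets are returned.
    """
    ordered = sorted(pockets, key=lambda p: (-p.new_score, p.original_rank))
    return ordered if top_n is None else ordered[:top_n]


# Illustrative input: four pockets in their original order with invented scores.
pockets = [
    Pocket("pocket1", 1, 0.20),
    Pocket("pocket2", 2, 0.90),
    Pocket("pocket3", 3, 0.55),
    Pocket("pocket4", 4, 0.70),
]

print([p.pocket_id for p in rerank(pockets)])           # all pockets, re-ranked
print([p.pocket_id for p in rerank(pockets, top_n=3)])  # only the top 3
```

Calling `rerank` without `top_n` keeps every pocket, which loosely mirrors running without a top-N restriction; passing `top_n=3` keeps only the three best-scored pockets.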
Methods 11-13 are the implementations by Capra et al., 2009 [13], and are therefore indicated by the + superscript.
To run the analysis code, create the conda environment, either from the environment file or by listing the dependencies explicitly:
conda env create -f environment.yml
or
conda create -n lbs_comp_env python=3.10 scipy scikit-learn biopython pandas seaborn matplotlib numpy -c conda-forge
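After creating and activating the environment, a quick check such as the one below can confirm that the listed packages are importable. This is a minimal sketch rather than part of the repository: the function name `missing_modules` and the constant `REQUIRED_MODULES` are assumptions made for illustration. Note that scikit-learn is imported as `sklearn` and biopython as `Bio`.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed in the conda command above.
# scikit-learn is imported as "sklearn" and biopython as "Bio".
REQUIRED_MODULES = ["scipy", "sklearn", "Bio", "pandas", "seaborn", "matplotlib", "numpy"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the `lbs_comp_env` environment active; any module reported as missing can then be installed with conda from conda-forge.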
ProIntVar is used for some of the data processing. For installation instructions, refer to the ProIntVar repository.
The following dependencies were used for protein and ligand site characterisation and visualisation: POVME 2.0 [14], ProteinVolume [15], FreeSASA [16], UCSF ChimeraX [17] and PyMOL [18].
1. Sestak, F., et al., VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification. arXiv [cs.LG], 2024.
2. Carbery A, Buttenschoen M, Skyner R, von Delft F, Deane CM. Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures. J Cheminform. 2024 Mar 14;16(1):32. doi: 10.1186/s13321-024-00821-4. PMID: 38486231; PMCID: PMC10941399.
3. Smith Z, Strobel M, Vani BP, Tiwary P. Graph Attention Site Prediction (GrASP): Identifying Druggable Binding Sites Using Graph Neural Networks with Attention. J Chem Inf Model. 2024 Apr 8;64(7):2637-2644. doi: 10.1021/acs.jcim.3c01698. Epub 2024 Mar 7. PMID: 38453912; PMCID: PMC11182664.
4. Jeevan K, Palistha S, Tayara H, Chong KT. PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction. J Cheminform. 2024 Jun 7;16(1):66. doi: 10.1186/s13321-024-00865-6. PMID: 38849917; PMCID: PMC11157904.
5. Aggarwal R, Gupta A, Chelur V, Jawahar CV, Priyakumar UD. DeepPocket: Ligand Binding Site Detection and Segmentation using 3D Convolutional Neural Networks. J Chem Inf Model. 2022 Nov 14;62(21):5069-5079. doi: 10.1021/acs.jcim.1c00799. Epub 2021 Aug 10. PMID: 34374539.
6. Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. PrankWeb: a web server for ligand binding site prediction and visualization. Nucleic Acids Res. 2019 Jul 2;47(W1):W345-W349. doi: 10.1093/nar/gkz424. PMID: 31114880; PMCID: PMC6602436.
7. Krivák R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminform. 2018 Aug 14;10(1):39. doi: 10.1186/s13321-018-0285-8. PMID: 30109435; PMCID: PMC6091426.
8. Krivák R, Hoksza D. Improving protein-ligand binding site prediction accuracy by classification of inner pocket points using local features. J Cheminform. 2015 Apr 1;7:12. doi: 10.1186/s13321-015-0059-5. PMID: 25932051; PMCID: PMC4414931.
9. Le Guilloux V, Schmidtke P, Tuffery P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics. 2009 Jun 2;10:168. doi: 10.1186/1471-2105-10-168. PMID: 19486540; PMCID: PMC2700099.
10. An J, Totrov M, Abagyan R. Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol Cell Proteomics. 2005 Jun;4(6):752-61. doi: 10.1074/mcp.M400159-MCP200. Epub 2005 Mar 9. PMID: 15757999.
11. Hendlich M, Rippmann F, Barnickel G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model. 1997 Dec;15(6):359-63, 389. doi: 10.1016/s1093-3263(98)00002-3. PMID: 9704298.
12. Laskowski RA. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph. 1995 Oct;13(5):323-30, 307-8. doi: 10.1016/0263-7855(95)00073-9. PMID: 8603061.
13. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009 Dec;5(12):e1000585. doi: 10.1371/journal.pcbi.1000585. Epub 2009 Dec 4. PMID: 19997483; PMCID: PMC2777313.
14. Durrant JD, Votapka L, Sørensen J, Amaro RE. POVME 2.0: An Enhanced Tool for Determining Pocket Shape and Volume Characteristics. J Chem Theory Comput. 2014 Nov 11;10(11):5047-5056. doi: 10.1021/ct500381c. Epub 2014 Sep 29. PMID: 25400521; PMCID: PMC4230373.
15. Chen CR, Makhatadze GI. ProteinVolume: calculating molecular van der Waals and void volumes in proteins. BMC Bioinformatics. 2015 Mar 26;16(1):101. doi: 10.1186/s12859-015-0531-2. PMID: 25885484; PMCID: PMC4379742.
16. Mitternacht S. FreeSASA: An open source C library for solvent accessible surface area calculations. F1000Res. 2016 Feb 18;5:189. doi: 10.12688/f1000research.7931.1. PMID: 26973785; PMCID: PMC4776673.
17. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, Ferrin TE. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 2021 Jan;30(1):70-82. doi: 10.1002/pro.3943. Epub 2020 Oct 22. PMID: 32881101; PMCID: PMC7737788.
18. Schrödinger, L.L.C., The PyMOL Molecular Graphics System, Version 1.8. 2015. Available at https://pymol.org/