Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening
Authors: Hui Zhu, Jincai Yang*, and Niu Huang*
split_dataset/cluster contains the scripts for pocket Pfam-based clustering (Pfam-cluster) and protein sequence-based clustering (Seq-cluster). pdbbind_2020_cluster_result.csv contains the results of both clustering approaches.
split_dataset/3_fold contains the training, validation, and test sets used for the generalization ability benchmark in the paper.
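The idea behind a cluster-based split is that every member of a cluster lands in the same fold, so a model is never tested on a cluster it saw during training. The following is a minimal sketch of that idea, not the scripts in split_dataset/cluster: the function name cluster_kfold, the greedy balancing rule, and the toy complex IDs and cluster labels are all assumptions made for illustration, and the actual column layout of pdbbind_2020_cluster_result.csv is not assumed here.

```python
from collections import defaultdict


def cluster_kfold(cluster_of, n_folds=3):
    """Assign whole clusters to folds so that no cluster spans two folds.

    cluster_of: dict mapping a complex ID to its cluster label
                (hypothetical input format, for illustration only).
    """
    # Group complex IDs by their cluster label.
    members = defaultdict(list)
    for complex_id, cluster_id in cluster_of.items():
        members[cluster_id].append(complex_id)

    # Greedy balancing (an assumption, not necessarily the paper's rule):
    # place the largest clusters first, each into the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cluster_id in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(members[cluster_id]))
    return folds


# Toy mapping with placeholder complex IDs and cluster labels.
toy = {
    "1abc": "A", "2xyz": "A",
    "3def": "B",
    "4ghi": "C", "5jkl": "C",
    "6mno": "D",
}
folds = cluster_kfold(toy, n_folds=3)
for i, fold in enumerate(folds):
    print(f"fold {i}: {fold}")
```

With three folds, each fold can in turn be held out while the remaining folds are divided into training and validation data; how the paper assigns those roles is defined by the files in split_dataset/3_fold, not by this sketch.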
models/Descriptor_based_model contains the source code of LR::V, LR::VR1, RF-Score, XGB::VR1, and NNScore. The other evaluated models were downloaded from their respective papers.
models/shap contains the SHapley Additive exPlanations (SHAP) analysis of RF-Score.
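For readers unfamiliar with what a SHAP analysis attributes, the sketch below computes exact Shapley values for a single sample by enumerating every feature subset. It is not the code in models/shap and does not use the SHAP library, which relies on much faster algorithms for tree ensembles. The function toy_score, the three-feature input, and the all-zero baseline are hypothetical stand-ins for a trained scoring function and its descriptors.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample (exponential in feature count)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(features):
    # Hypothetical model: one additive term plus one pairwise interaction.
    return 2.0 * features[0] + features[1] * features[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term is split evenly between the two features that produce it.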