This repository was hacked together from the work in "Learning from the ligand: using ligand-based features to improve binding affinity prediction" (https://academic.oup.com/bioinformatics/article-abstract/36/3/758/5554651). The starting point was the data and code downloaded from http://opig.stats.ox.ac.uk/resources (as referenced in the manuscript).
The added scripts were used to a) attempt to rebuild the models and b) use the rebuilt models to score a number of different ligand poses against various protein structures.
In the very near future I will clean this up so that the whole pipeline can be run in a small number of short steps.
NB: the generated model files and results for our use-case are not here. They are too big.
The script scripts/rebuild_models.py should rebuild the models. It takes the feature sets that were distributed at the site referenced in the paper and re-runs scikit-learn to build the various models. These are written to the models directory as .sav files.
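As a rough illustration of the rebuild step, the sketch below fits a scikit-learn regressor on a feature matrix and pickles it to a .sav file. It is not the code in scripts/rebuild_models.py: the choice of RandomForestRegressor, the use of pickle for the .sav format, the synthetic data, and the names build_and_save, load_model and example_model.sav are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def build_and_save(features, affinities, out_path):
    """Fit a regressor on a feature matrix and pickle it to out_path."""
    # Assumed model type and hyperparameters, chosen only for illustration.
    model = RandomForestRegressor(n_estimators=20, random_state=0)
    model.fit(features, affinities)
    with open(out_path, "wb") as handle:
        pickle.dump(model, handle)
    return model


def load_model(path):
    """Load a pickled model back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Synthetic stand-in for one of the distributed feature sets (not real data).
rng = np.random.default_rng(0)
X = rng.random((50, 5))
y = X.sum(axis=1)

# Write into a temporary "models" directory so the example leaves no files behind.
model_dir = Path(tempfile.mkdtemp()) / "models"
model_dir.mkdir()
model_path = model_dir / "example_model.sav"

fitted = build_and_save(X, y, model_path)
reloaded = load_model(model_path)
```

Pickled models are generally only safe to reload with the same scikit-learn version that wrote them, which is one reason to rebuild the models locally rather than ship the files.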
For our use-case, we had many ligands to score against a single protein, and this was repeated for many proteins. The script scripts/process-mpro.py shows an example of how to do this. The script:
- Cleans up the input file (very specific to our case)
- Splits the input file by protein target, grouping all of the ligands for each target (again, quite specific)
- Calculates the rdkit, rfscore and binana features that are required for scoring with the models we have re-built
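The splitting step in the list above can be sketched in plain Python, without RDKit, by treating the SDF file as text. This is only an illustration under assumptions: the data tag name TARGET, the helper names, and the toy records with empty molblocks are hypothetical, and the real script may identify the protein target differently.

```python
from collections import defaultdict


def split_sdf_records(sdf_text):
    """Return the molecule records of an SDF string, each ending in $$$$."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def get_tag(record, tag):
    """Return the first value line of the data item <tag>, or None if absent."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{tag}>" in line and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None


def group_by_target(sdf_text, tag="TARGET"):
    """Group SDF records by the value of an (assumed) protein target tag."""
    groups = defaultdict(list)
    for record in split_sdf_records(sdf_text):
        groups[get_tag(record, tag)].append(record)
    return dict(groups)


def make_toy_record(name, target):
    """Build a toy record with an empty molblock, for demonstration only."""
    return "\n".join([
        name,
        "  toy",
        "",
        "  0  0  0  0  0  0  0  0  0  0999 V2000",
        "M  END",
        "> <TARGET>",
        target,
        "",
        "$$$$",
    ]) + "\n"


example_sdf = (
    make_toy_record("lig1", "protA")
    + make_toy_record("lig2", "protB")
    + make_toy_record("lig3", "protA")
)
groups = group_by_target(example_sdf)
```

Each group can then be written to its own file, giving one ligand set per protein for the feature calculation.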
Scoring is again quite specific to our use case, but an example is shown in scripts/run_models_mpro.py. The script:
- Loads the feature sets
- Calculates the scores for each set of ligands
- Writes the scores to the input SDF file
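The final step, writing scores back into the SDF, can be sketched as adding one data item per record just before its $$$$ terminator. This is a minimal text-based illustration, not the code in scripts/run_models_mpro.py: the tag name predicted_affinity, the helper names, the toy records and the score values are all made up for the example.

```python
def add_sdf_tag(record, tag, value):
    """Insert a data item into one SDF record, just before its $$$$ line."""
    lines = record.rstrip("\n").split("\n")
    if lines[-1].strip() != "$$$$":
        raise ValueError("record does not end with $$$$")
    # An SDF data item is a header line, the value, then a blank line.
    data_item = [f"> <{tag}>", str(value), ""]
    return "\n".join(lines[:-1] + data_item + ["$$$$"]) + "\n"


def annotate_records(records, scores, tag="predicted_affinity"):
    """Attach one score to each record and return the combined SDF text."""
    if len(records) != len(scores):
        raise ValueError("need exactly one score per record")
    return "".join(
        add_sdf_tag(record, tag, f"{score:.3f}")
        for record, score in zip(records, scores)
    )


def make_toy_record(name):
    """Build a toy record with an empty molblock, for demonstration only."""
    return "\n".join([
        name,
        "  toy",
        "",
        "  0  0  0  0  0  0  0  0  0  0999 V2000",
        "M  END",
        "$$$$",
    ]) + "\n"


records = [make_toy_record("lig1"), make_toy_record("lig2")]
scores = [6.25, 7.5]  # placeholder values, not real model output
annotated = annotate_records(records, scores)
```

With several models, the same helper can be called once per model with a different tag name, so each record carries one score per model.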