"Exploring the Potential of Adaptive, Local Machine Learning (ML) in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database"
Repository archiving the main code chunks used for the local and global machine learning models in the publication "Exploring the Potential of Adaptive, Local Machine Learning (ML) in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database", published in 2024 in the ACS Journal of Chemical Information and Modeling (JCIM) as the first publication of my doctoral studies at Bayer.
The five included files contain the main code chunks for:
- Data preparation (SMILES/molecule object standardization; PaDEL descriptor calculation)
- Global models (including other descriptor calculations, recursive feature elimination with cross-validation, and external TDC benchmarking)
- Local model (training data selection via fixed Tanimoto similarity criteria)
- Local model (training data selection via fixed numbers of most similar structures)
- Local model (training data selection via kNN, as a control to demonstrate the superiority of the chosen Tanimoto similarity approach)
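
To illustrate the two similarity-based selection strategies listed above, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`tanimoto`, `rank_by_similarity`, `select_by_threshold`, `select_top_n`), the toy compound names, and the threshold value are hypothetical, and fingerprints are assumed to be given as sets of on-bit indices rather than as cheminformatics toolkit objects.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: Set[int], pool: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score every pool compound against the query, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in pool.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_by_threshold(query: Set[int], pool: Dict[str, Set[int]], threshold: float) -> List[str]:
    """Keep all compounds whose similarity to the query is at least `threshold`."""
    return [name for name, sim in rank_by_similarity(query, pool) if sim >= threshold]


def select_top_n(query: Set[int], pool: Dict[str, Set[int]], n: int) -> List[str]:
    """Keep the `n` compounds most similar to the query, regardless of their score."""
    return [name for name, _ in rank_by_similarity(query, pool)[:n]]


# Toy fingerprints (hypothetical data, for illustration only).
pool = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 9},
    "cpd_C": {1, 8, 9},
    "cpd_D": {7, 8, 9},
}
query = {1, 2, 3, 4, 5}

local_set_threshold = select_by_threshold(query, pool, threshold=0.5)
local_set_top_n = select_top_n(query, pool, n=3)
print(local_set_threshold)
print(local_set_top_n)
```

The practical difference between the two strategies is visible even in this toy case: a fixed similarity threshold can return very few (or no) training compounds for an unusual query, whereas a fixed number of neighbours always returns a training set of the same size but may include dissimilar structures.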
If you have further questions or need additional parts of the utilized code for your own studies, feel free to contact [email protected].