Site-of-metabolism prediction with active learning

Active learning for site-of-metabolism (SoM) prediction, demonstrated on the Zaretzki dataset. The dataset was modified from the original so that the molecular structures (preprocessed with RDKit and the ChEMBL Structure Pipeline) and all annotated sites of metabolism are stored in a single .sdf file (data/zaretzki_preprocessed.sdf).
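The SD file keeps the SoM annotations next to each structure as SD data fields. The standard-library parser below is a minimal sketch of how those fields can be read without extra dependencies. The field name `soms` and its space-separated atom-index format are hypothetical, so check data/zaretzki_preprocessed.sdf for the real field name and encoding. In practice, RDKit's `Chem.SDMolSupplier` is the more robust choice.

```python
"""Minimal sketch: read per-molecule data fields from an SD file.

Assumption: SoM annotations are stored as an SD data field. The field name
"soms" used in the usage note is hypothetical; check the actual .sdf file.
"""
from typing import Dict, Iterator, List


def iter_sd_fields(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {field_name: value} dict per record in an SD file."""
    for record in text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty chunk after the final $$$$
        fields: Dict[str, str] = {}
        lines = record.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            # Data headers look like: >  <FIELD_NAME>
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                value_lines: List[str] = []
                i += 1
                # A field value runs until the next blank line
                while i < len(lines) and lines[i].strip():
                    value_lines.append(lines[i])
                    i += 1
                fields[name] = "\n".join(value_lines)
            i += 1
        yield fields
```

If the annotations were encoded as space-separated atom indices, `[int(i) for i in rec["soms"].split()]` would turn a record's field into a list of SoM atom indices.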

Dependencies

Files

Calculating CDPKit FAME descriptors

Script to calculate the CDPKit FAME descriptors:
- src/features/cdpkit_calculate_fame_descriptors.py
- Example:

python3 src/features/cdpkit_calculate_fame_descriptors.py -i data/zaretzki_preprocessed.sdf -o output/ -r 5 -m
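Broadly, FAME-type descriptors characterise each atom by its environment, meaning the atom types found at increasing topological (bond-count) distances from it. This sketch assumes `-r 5` sets that maximum distance. The file name zaretzki_r5_5folds_random_split.csv used below hints at a radius of 5, but that reading is an assumption. The plain-Python code shows only the shell-counting idea and is not CDPKit's implementation. The graph representation and the use of element symbols as atom types are simplifications chosen for illustration.

```python
from collections import Counter, deque
from typing import Dict, List


def shell_counts(adjacency: Dict[int, List[int]], atom_types: Dict[int, str],
                 center: int, radius: int) -> List[Counter]:
    """Count atom types at each topological distance 0..radius from `center`.

    Illustrative sketch only: real FAME/CDPKit descriptors use richer atom
    typing and properties; here an atom's "type" is just its element symbol.
    """
    dist = {center: 0}
    queue = deque([center])
    # Breadth-first search yields shortest-path (bond-count) distances
    while queue:
        atom = queue.popleft()
        if dist[atom] == radius:
            continue  # do not expand beyond the requested radius
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    # One Counter per shell; shells with no atoms stay empty
    shells = [Counter() for _ in range(radius + 1)]
    for atom, d in dist.items():
        shells[d][atom_types[atom]] += 1
    return shells
```

For ethanol's heavy atoms (C0-C1-O2), centring on atom 0 with radius 2 gives one carbon in shell 0, one carbon in shell 1 and one oxygen in shell 2. Concatenating the per-shell counts over a fixed atom-type vocabulary produces one fixed-length feature vector per atom.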

Active learning for site of metabolism prediction

Script to run active learning with a random forest classifier:
- src/models/AL_for_SoM_pred.py
- Examples:

# split the data into 5 folds
python3 src/models/splits_for_AL.py

# active learning in 5-fold cross-validation
python3 src/models/AL_for_SoM_pred.py -i output/zaretzki_r5_5folds_random_split.csv -o output/active_learning/01.random_sampling_vs_AL/ -ct 0.3 -af -n 5

# random selection in 5-fold cross-validation
python3 src/models/AL_for_SoM_pred.py -i output/zaretzki_r5_5folds_random_split.csv -o output/active_learning/01.random_sampling_vs_AL/ -ct 0.3 -af -n 5 -rs

# repeat 5 times on a single validation set
python3 src/models/AL_for_SoM_pred.py -i output/zaretzki_r5_5folds_random_split.csv -o output/active_learning/01.random_sampling_vs_AL/ -ct 0.3 -tf 1 -n 5 -rfs
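The commands above compare active learning against random selection (`-rs`). With random forests, a common active-learning query strategy is uncertainty sampling. Under that strategy, the unlabeled atoms whose predicted SoM probability (for example, the fraction of trees voting "SoM") lies closest to 0.5 are labeled next, while the baseline draws atoms at random. The sketch below covers only that selection step. It is not confirmed that AL_for_SoM_pred.py uses this exact criterion, and the meanings of `-ct`, `-af`, `-tf` and `-rfs` are not described here, so treat `select_batch` and its logic as hypothetical.

```python
import random
from typing import List, Optional, Sequence


def select_batch(pool_probs: Sequence[float], batch_size: int,
                 strategy: str = "uncertainty",
                 rng: Optional[random.Random] = None) -> List[int]:
    """Return indices of unlabeled pool atoms to label next (hypothetical helper).

    pool_probs: predicted SoM probability per unlabeled atom, e.g. the
        fraction of random-forest trees voting "site of metabolism".
    strategy: "uncertainty" picks atoms with probability closest to 0.5;
        "random" is the random-sampling baseline.
    """
    n = min(batch_size, len(pool_probs))
    if strategy == "uncertainty":
        # Smallest |p - 0.5| means the ensemble is most undecided
        order = sorted(range(len(pool_probs)),
                       key=lambda i: abs(pool_probs[i] - 0.5))
        return order[:n]
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(range(len(pool_probs)), n)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

A typical pool-based loop moves the selected atoms from the pool into the training set, refits the forest, predicts the remaining pool again and repeats. Under this scheme, the random baseline differs from active learning only in the `strategy` argument.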

Visualize results
- notebooks/zaretzki_active_learning_result.ipynb

Repository: molinfo-vienna/FAME.AL

Folders and files

Latest commit

History

Repository files navigation

Site-of-metabolism prediction with active learning

Dependencies

Files

Calculating CDPKit FAME descriptor

Active learning for site of metabolism prediction

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages