molinfo-vienna/FAME.AL

Site-of-metabolism prediction with active learning

Active learning for site-of-metabolism prediction, demonstrated with the Zaretzki data set. The data set was modified from the original so that all molecular structures (preprocessed using RDKit and the ChEMBL Structure Pipeline), together with all annotated sites of metabolism, are in a single .sdf file (data/zaretzki_preprocessed.sdf).
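Before running the pipeline it can be useful to check what the .sdf file contains. The following is a minimal sketch using only the Python standard library; it relies on the general SDF conventions that each record ends with a `$$$$` line and that data fields are introduced by `> <NAME>` header lines. The function name `sdf_field_counts` is hypothetical and not part of this repository, and the sketch makes no assumption about which field holds the site-of-metabolism annotations.

```python
from collections import Counter


def sdf_field_counts(sdf_text):
    """Count molecule records and data-field occurrences in SDF text.

    Hypothetical helper for illustration; not part of the repository.
    """
    n_records = 0
    fields = Counter()
    for line in sdf_text.splitlines():
        stripped = line.strip()
        if stripped == "$$$$":
            # "$$$$" terminates one molecule record
            n_records += 1
        elif stripped.startswith(">") and "<" in stripped and ">" in stripped[1:]:
            # data header lines look like "> <FIELD_NAME>"
            name = stripped[stripped.index("<") + 1 : stripped.rindex(">")]
            fields[name] += 1
    return n_records, fields


# Possible usage (path taken from the description above):
# with open("data/zaretzki_preprocessed.sdf") as handle:
#     n_records, fields = sdf_field_counts(handle.read())
# print(n_records, dict(fields))
```

The field names reported this way show which properties are attached to each molecule, without having to load the structures with a cheminformatics toolkit.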

Dependencies

Files

Calculating CDPKit FAME descriptor

  • Code to calculate CDPKit FAME descriptor.
    • src/features/cdpkit_calculate_fame_descriptors.py
    • Example:
python3 src/features/cdpkit_calculate_fame_descriptors.py -i data/zaretzki_preprocessed.sdf -o output/ -r 5 -m

Active learning for site of metabolism prediction

  • Code to run active learning with random forest classifier.
    • src/models/AL_for_SoM_pred.py
    • Examples:
# split the data into 5 folds
python3 src/models/splits_for_AL.py
# active learning in 5-fold cross validation
python3 src/models/AL_for_SoM_pred.py -i output/zaretzki_r5_5folds_random_split.csv -o output/active_learning/01.random_sampling_vs_AL/ -ct 0.3 -af -n 5 
# random selection in 5-fold cross validation
python3 src/models/AL_for_SoM_pred.py -i output/zaretzki_r5_5folds_random_split.csv -o output/active_learning/01.random_sampling_vs_AL/ -ct 0.3 -af -n 5 -rs
# repeat 5 times on one validation set
python3 src/models/AL_for_SoM_pred.py -i output/zaretzki_r5_5folds_random_split.csv -o output/active_learning/01.random_sampling_vs_AL/ -ct 0.3 -tf 1 -n 5 -rfs
  • Visualize results
    • notebooks/zaretzki_active_learning_result.ipynb
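
The examples above compare active learning against random selection. As a rough illustration of the difference, here is a minimal sketch of the query-selection step only. It assumes uncertainty sampling (picking the atoms whose predicted probability is closest to 0.5), which is a common active learning strategy but is not confirmed here to be the one implemented in src/models/AL_for_SoM_pred.py. The function `select_queries` and its parameters are hypothetical; the probabilities would come from any classifier, such as a random forest.

```python
import random


def select_queries(probabilities, labeled, batch_size,
                   random_selection=False, seed=0):
    """Pick indices of unlabeled atoms to add to the training pool next.

    probabilities: predicted site-of-metabolism probability per atom
    labeled: set of indices already in the training pool
    Hypothetical helper for illustration; not the repository's code.
    """
    candidates = [i for i in range(len(probabilities)) if i not in labeled]
    if random_selection:
        # Baseline: draw the next batch uniformly at random.
        rng = random.Random(seed)
        return sorted(rng.sample(candidates, min(batch_size, len(candidates))))
    # Uncertainty sampling: predictions closest to 0.5 are least certain.
    candidates.sort(key=lambda i: abs(probabilities[i] - 0.5))
    return sorted(candidates[:batch_size])


probs = [0.05, 0.48, 0.90, 0.55, 0.30, 0.51]
already_labeled = {5}
print(select_queries(probs, already_labeled, batch_size=2))
print(select_queries(probs, already_labeled, batch_size=2, random_selection=True))
```

In a full loop, the selected atoms would be moved into the training pool, the classifier retrained, and the process repeated until a stopping criterion is met.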
