Commit
Esben Jannik Bjerrum committed Apr 12, 2024
1 parent 9cd2d33, commit a10998f
Showing 2 changed files with 56 additions and 50 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,13 @@ | ||
# scikit-mol | ||
|
||
![Fancy logo](./ressources/logo/ScikitMol_Logo_DarkBG_300px.png#gh-dark-mode-only) | ||
![Fancy logo](./ressources/logo/ScikitMol_Logo_LightBG_300px.png#gh-light-mode-only) | ||
|
||
## Scikit-Learn classes for molecular vectorization using RDKit | ||
|
||
The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings. | ||
|
||
As an example, with the needed scikit-learn and -mol imports and RDKit mol objects in the mol_list_train and _test lists: | ||
As an example, with the needed scikit-learn and -mol imports and RDKit mol objects in the mol_list_train and \_test lists: | ||
|
||
pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())]) | ||
pipe.fit(mol_list_train, y_train) | ||
|
@@ -18,39 +20,40 @@ The scikit-learn compatibility should also make it easier to include the fingerp | |
|
||
The first draft for the project was created at the [RDKIT UGM 2022 hackathon](https://github.com/rdkit/UGM_2022) 2022-October-14 | ||
|
||
|
||
## Implemented | ||
* Descriptors | ||
* MolecularDescriptorTransformer | ||
|
||
- Descriptors | ||
- MolecularDescriptorTransformer | ||
|
||
<br> | ||
|
||
* Fingerprints | ||
* MorganFingerprintTransformer | ||
* MACCSKeysFingerprintTransformer | ||
* RDKitFingerprintTransformer | ||
* AtomPairFingerprintTransformer | ||
* TopologicalTorsionFingerprintTransformer | ||
* MHFingerprintTransformer | ||
* SECFingerprintTransformer | ||
* AvalonFingerprintTransformer | ||
- Fingerprints | ||
- MorganFingerprintTransformer | ||
- MACCSKeysFingerprintTransformer | ||
- RDKitFingerprintTransformer | ||
- AtomPairFingerprintTransformer | ||
- TopologicalTorsionFingerprintTransformer | ||
- MHFingerprintTransformer | ||
- SECFingerprintTransformer | ||
- AvalonFingerprintTransformer | ||
|
||
<br> | ||
|
||
* Conversions | ||
* SmilesToMol | ||
- Conversions | ||
- SmilesToMol | ||
|
||
<br> | ||
|
||
* Standardizer | ||
* Standardizer | ||
- Standardizer | ||
- Standardizer | ||
|
||
<br> | ||
|
||
* Utilities | ||
* CheckSmilesSanitazion | ||
- Utilities | ||
- CheckSmilesSanitazion | ||
|
||
## Installation | ||
|
||
Users can install the latest tagged release with pip | ||
|
||
pip install scikit-mol | ||
|
@@ -63,35 +66,37 @@ Bleeding edge | |
|
||
There is a collection of notebooks in the notebooks directory that demonstrates different aspects and use cases | ||
|
||
* [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/01_basic_usage.ipynb) | ||
* [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/02_descriptor_transformer.ipynb) | ||
* [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/03_example_pipeline.ipynb) | ||
* [Molecular standardization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/04_standardizer.ipynb) | ||
* [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/05_smiles_sanitaztion.ipynb) | ||
* [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/06_hyperparameter_tuning.ipynb) | ||
* [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/07_parallel_transforms.ipynb) | ||
* [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb) | ||
* [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb) | ||
|
||
- [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/01_basic_usage.ipynb) | ||
- [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/02_descriptor_transformer.ipynb) | ||
- [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/03_example_pipeline.ipynb) | ||
- [Molecular standardization](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/04_standardizer.ipynb) | ||
- [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/05_smiles_sanitaztion.ipynb) | ||
- [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/06_hyperparameter_tuning.ipynb) | ||
- [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/07_parallel_transforms.ipynb) | ||
- [Using skopt for hyperparameter tuning](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/08_external_library_skopt.ipynb) | ||
- [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb) | ||
- [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb) | ||
|
||
We also put a software note on ChemRxiv. [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd) | ||
We also put a software note on ChemRxiv. [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd) | ||
|
||
## Contributing | ||
|
||
There is more information about how to contribute to the project in [CONTRIBUTION.md](https://github.com/EBjerrum/scikit-mol/CONTRIBUTION.md) | ||
|
||
## BUGS | ||
|
||
There are probably still bugs; please check the issues on GitHub and report new ones there | ||
|
||
## Contributors: | ||
* Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), [email protected] | ||
* Carmen Esposito [@cespos](https://github.com/cespos) | ||
* Son Ha, [email protected] | ||
* Oh-hyeon Choung, [email protected] | ||
* Andreas Poehlmann, [@ap--](https://github.com/ap--) | ||
* Ya Chen, [@anya-chen](https://github.com/anya-chen) | ||
* Rafał Bachorz [@rafalbachorz](https://github.com/rafalbachorz) | ||
* Adrien Chaton [@adrienchaton](https://github.com/adrienchaton) | ||
* [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz) | ||
* [@RiesBen](https://github.com/RiesBen) | ||
* [@enricogandini](https://github.com/enricogandini) | ||
|
||
- Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), [email protected] | ||
- Carmen Esposito [@cespos](https://github.com/cespos) | ||
- Son Ha, [email protected] | ||
- Oh-hyeon Choung, [email protected] | ||
- Andreas Poehlmann, [@ap--](https://github.com/ap--) | ||
- Ya Chen, [@anya-chen](https://github.com/anya-chen) | ||
- Rafał Bachorz [@rafalbachorz](https://github.com/rafalbachorz) | ||
- Adrien Chaton [@adrienchaton](https://github.com/adrienchaton) | ||
- [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz) | ||
- [@RiesBen](https://github.com/RiesBen) | ||
- [@enricogandini](https://github.com/enricogandini) |