Updated readmes with notebooks
Esben Jannik Bjerrum committed Apr 12, 2024
1 parent 9cd2d33 commit a10998f
Showing 2 changed files with 56 additions and 50 deletions.
87 changes: 46 additions & 41 deletions README.md
@@ -1,11 +1,13 @@
# scikit-mol

![Fancy logo](./ressources/logo/ScikitMol_Logo_DarkBG_300px.png#gh-dark-mode-only)
![Fancy logo](./ressources/logo/ScikitMol_Logo_LightBG_300px.png#gh-light-mode-only)

## Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports and RDKit mol objects in the mol_list_train and _test lists:
As an example, with the needed scikit-learn and scikit-mol imports and RDKit mol objects in the mol_list_train and \_test lists:

pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
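
A fuller, self-contained sketch of the pipeline above might look as follows. The toy SMILES strings, the target values and the helper name `fit_and_predict` are made up for illustration, and the import path `scikit_mol.fingerprints` is an assumption about the package layout. RDKit, scikit-learn and scikit-mol need to be installed for the function to run.

```python
# Toy data: SMILES strings with made-up target values (illustrative only).
smiles_train = ["CCO", "CCN", "CCC", "c1ccccc1"]
y_train = [0.1, 0.4, 0.3, 1.2]
smiles_test = ["CCCO", "Cc1ccccc1"]


def fit_and_predict(smiles_fit, y_fit, smiles_new):
    """Fit a Morgan fingerprint + Ridge pipeline and predict for new molecules."""
    # Third-party imports are local to the function, so they are only
    # required when the function is actually called.
    from rdkit import Chem
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline
    from scikit_mol.fingerprints import MorganFingerprintTransformer  # assumed import path

    # Convert the SMILES strings into RDKit mol objects.
    mol_list_train = [Chem.MolFromSmiles(s) for s in smiles_fit]
    mol_list_test = [Chem.MolFromSmiles(s) for s in smiles_new]

    # Same two-step pipeline as in the README: fingerprints, then a regressor.
    pipe = Pipeline([
        ("mol_transformer", MorganFingerprintTransformer()),
        ("Regressor", Ridge()),
    ])
    pipe.fit(mol_list_train, y_fit)
    return pipe.predict(mol_list_test)
```

Calling `fit_and_predict(smiles_train, y_train, smiles_test)` should return one predicted value per test molecule. With only four training molecules the numbers themselves are meaningless; the point is that the pipeline takes mol objects as input.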
@@ -18,39 +20,40 @@ The scikit-learn compatibility should also make it easier to include the fingerp

The first draft of the project was created at the [RDKit UGM 2022 hackathon](https://github.com/rdkit/UGM_2022) on 14 October 2022.


## Implemented
* Descriptors
  * MolecularDescriptorTransformer

- Descriptors
  - MolecularDescriptorTransformer

<br>

* Fingerprints
  * MorganFingerprintTransformer
  * MACCSKeysFingerprintTransformer
  * RDKitFingerprintTransformer
  * AtomPairFingerprintTransformer
  * TopologicalTorsionFingerprintTransformer
  * MHFingerprintTransformer
  * SECFingerprintTransformer
  * AvalonFingerprintTransformer
- Fingerprints
  - MorganFingerprintTransformer
  - MACCSKeysFingerprintTransformer
  - RDKitFingerprintTransformer
  - AtomPairFingerprintTransformer
  - TopologicalTorsionFingerprintTransformer
  - MHFingerprintTransformer
  - SECFingerprintTransformer
  - AvalonFingerprintTransformer

<br>

* Conversions
  * SmilesToMol
- Conversions
  - SmilesToMol

<br>

* Standardizer
  * Standardizer
- Standardizer
  - Standardizer

<br>

* Utilities
  * CheckSmilesSanitazion
- Utilities
  - CheckSmilesSanitazion

## Installation

Users can install the latest tagged release with pip:

pip install scikit-mol
@@ -63,35 +66,37 @@ Bleeding edge

The notebooks directory contains a collection of notebooks that demonstrate different aspects and use cases:

* [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/01_basic_usage.ipynb)
* [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/02_descriptor_transformer.ipynb)
* [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/03_example_pipeline.ipynb)
* [Molecular standardization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/04_standardizer.ipynb)
* [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/05_smiles_sanitaztion.ipynb)
* [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/06_hyperparameter_tuning.ipynb)
* [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/07_parallel_transforms.ipynb)
* [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
* [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)

- [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/01_basic_usage.ipynb)
- [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/02_descriptor_transformer.ipynb)
- [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/03_example_pipeline.ipynb)
- [Molecular standardization](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/04_standardizer.ipynb)
- [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/05_smiles_sanitaztion.ipynb)
- [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/06_hyperparameter_tuning.ipynb)
- [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/07_parallel_transforms.ipynb)
- [Using skopt for hyperparameter tuning](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/08_external_library_skopt.ipynb)
- [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
- [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)

We have also published a software note on ChemRxiv: [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd)

## Contributing

More information about how to contribute to the project is available in [CONTRIBUTION.md](https://github.com/EBjerrum/scikit-mol/CONTRIBUTION.md)

## BUGS

There are probably still bugs; please check the issues on GitHub and report any new ones there.

## Contributors:
* Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), [email protected]
* Carmen Esposito [@cespos](https://github.com/cespos)
* Son Ha, [email protected]
* Oh-hyeon Choung, [email protected]
* Andreas Poehlmann, [@ap--](https://github.com/ap--)
* Ya Chen, [@anya-chen](https://github.com/anya-chen)
* Rafał Bachorz [@rafalbachorz](https://github.com/rafalbachorz)
* Adrien Chaton [@adrienchaton](https://github.com/adrienchaton)
* [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz)
* [@RiesBen](https://github.com/RiesBen)
* [@enricogandini](https://github.com/enricogandini)

- Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), [email protected]
- Carmen Esposito [@cespos](https://github.com/cespos)
- Son Ha, [email protected]
- Oh-hyeon Choung, [email protected]
- Andreas Poehlmann, [@ap--](https://github.com/ap--)
- Ya Chen, [@anya-chen](https://github.com/anya-chen)
- Rafał Bachorz [@rafalbachorz](https://github.com/rafalbachorz)
- Adrien Chaton [@adrienchaton](https://github.com/adrienchaton)
- [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz)
- [@RiesBen](https://github.com/RiesBen)
- [@enricogandini](https://github.com/enricogandini)
19 changes: 10 additions & 9 deletions notebooks/README.md
@@ -4,12 +4,13 @@

This is a collection of notebooks that demonstrate different aspects and use cases:

* [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/01_basic_usage.ipynb)
* [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/02_descriptor_transformer.ipynb)
* [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/03_example_pipeline.ipynb)
* [Molecular standardization](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/04_standardizer.ipynb)
* [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/05_smiles_sanitaztion.ipynb)
* [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/06_hyperparameter_tuning.ipynb)
* [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/07_parallel_transforms.ipynb)
* [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
* [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)
- [Basic Usage and fingerprint transformers](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/01_basic_usage.ipynb)
- [Descriptor transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/02_descriptor_transformer.ipynb)
- [Pipelining with Scikit-Learn classes](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/03_example_pipeline.ipynb)
- [Molecular standardization](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/04_standardizer.ipynb)
- [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/05_smiles_sanitaztion.ipynb)
- [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/06_hyperparameter_tuning.ipynb)
- [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/07_parallel_transforms.ipynb)
- [Using skopt for hyperparameter tuning](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/08_external_library_skopt.ipynb)
- [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
- [Using pandas output for easy feature importance analysis and combining pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)
