Commit
Updated README and cleaned up some files
EBjerrum committed Oct 16, 2022
1 parent 0187a27, commit ebb19e2
Showing 4 changed files with 60 additions and 112 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,42 +1,74 @@ | ||
# scikit-mol | ||
scikit-learn classes for molecular vectorization using RDKit | ||
|
||
Scikit-Learn classes for molecular vectorization using RDKit | ||
|
||
TODO: | ||
Expand number of fingerprint classes and featurizers | ||
AtomPairs | ||
TopologicalTorsions | ||
RDKit | ||
Descriptors | ||
LINGOS | ||
... | ||
The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings | ||
|
||
Make dictionary based FP class | ||
No Hashing, .fit() learns the keys of the dataset | ||
As an example, with the needed scikit-learn and scikit-mol imports and RDKit mol objects in the mol_list_train and mol_list_test lists: | ||
|
||
Make a basic standardizer transformer class | ||
pipe = Pipeline([('mol_transformer', MorganTransformer()), ('Regressor', Ridge())]) | ||
pipe.fit(mol_list_train, y_train) | ||
pipe.score(mol_list_test, y_test) | ||
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')]) | ||
|
||
Make a SMILES to Mol transformer class | ||
>>> array([4.93858815]) | ||
|
||
Make Notebook with examples | ||
Standalone usage | ||
Inclusion in pipeline | ||
Can transformers be used in parallel (e.g. to use both FP features and Descriptors at the same time?) | ||
Hyperparameter optimization via native Scikit-Classes | ||
Hyperparameter optimization via external optimizer e.g. https://scikit-optimize.github.io/stable/ | ||
The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities | ||
|
||
The first draft of the project was created at the [RDKit UGM 2022 hackathon](https://github.com/rdkit/UGM_2022) on 14 October 2022 | ||
|
||
Make basic unit-tests | ||
|
||
## Implemented | ||
* Transformer Classes | ||
* SmilesToMol | ||
* Desc2DTransformer | ||
* MACCSTransformer | ||
* RDKitFPTransformer | ||
* AtomPairFingerprintTransformer | ||
* TopologicalTorsionFingerprintTransformer | ||
* MorganTransformer | ||
<br> | ||
<br> | ||
* Utilities | ||
* CheckSmilesSanitazion | ||
|
||
## Installation | ||
Users can install the latest tagged release with pip | ||
|
||
pip install scikit-mol | ||
|
||
Bleeding edge | ||
|
||
pip install git+https://github.com/EBjerrum/scikit-mol.git | ||
|
||
Developers | ||
|
||
Installation | ||
git clone git@github.com:EBjerrum/scikit-mol.git | ||
pip install -e . | ||
|
||
## Documentation | ||
None yet, but the notebooks directory contains some `# %%`-delimited examples with demonstrations | ||
|
||
## BUGS | ||
Probably still | ||
|
||
|
||
## TODO | ||
* Make standardizer less 'chatty' | ||
* Unit test coverage of classes | ||
* Make further example notebooks | ||
* Standalone usage (not in pipeline) | ||
* Advanced pipelining | ||
* Hyperparameter optimization via external optimizer e.g. https://scikit-optimize.github.io/stable/ | ||
|
||
## Ideas | ||
* LINGOS transformer | ||
|
||
|
||
Contributors: | ||
Esben Bjerrum, [email protected] | ||
Son Ha, [email protected] | ||
Oh-hyeon Choung, [email protected] | ||
Please add yourself here, we'll properly markdown it later | ||
## Contributors: | ||
* Esben Bjerrum, [email protected] | ||
* Carmen Esposito https://github.com/cespos | ||
* Son Ha, [email protected] | ||
* Oh-hyeon Choung, [email protected] | ||
* Andreas Poehlmann, https://github.com/ap-- | ||
* Ya Chen, https://github.com/anya-chen |
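
Two items in the README text above describe a featurizer that is not implemented in this commit: the old TODO entry for a dictionary-based fingerprint class ("No Hashing, .fit() learns the keys of the dataset") and the "LINGOS transformer" idea. As a rough illustration only, the sketch below shows what such a class might look like in plain Python. The class name `LingoDictFeaturizer`, its `q` parameter, and the `keys_` attribute are hypothetical and not part of scikit-mol; the sketch also skips any SMILES preprocessing that LINGO implementations may apply and does not inherit from scikit-learn's base classes.

```python
from collections import Counter


class LingoDictFeaturizer:
    """Hypothetical sketch: unhashed, dictionary-based LINGO-style counts."""

    def __init__(self, q=4):
        self.q = q  # substring length (assumed default of 4)

    def _lingos(self, smiles):
        # All overlapping substrings of length q in the SMILES string.
        return [smiles[i:i + self.q] for i in range(len(smiles) - self.q + 1)]

    def fit(self, smiles_list, y=None):
        # Learn the vocabulary (the dictionary keys) from the training set.
        keys = sorted({s for smi in smiles_list for s in self._lingos(smi)})
        self.keys_ = {key: idx for idx, key in enumerate(keys)}
        return self

    def transform(self, smiles_list):
        # One count vector per SMILES, with one column per learned key.
        rows = []
        for smi in smiles_list:
            row = [0] * len(self.keys_)
            for key, n in Counter(self._lingos(smi)).items():
                idx = self.keys_.get(key)
                if idx is not None:  # substrings unseen during fit are dropped
                    row[idx] = n
            rows.append(row)
        return rows


featurizer = LingoDictFeaturizer(q=4).fit(["CCCCO", "CCCN"])
features = featurizer.transform(["CCCCO", "CCCl"])
print(featurizer.keys_)
print(features)
```

Because the keys are learned in `fit()`, the feature columns stay interpretable (each column is one substring), at the cost of silently ignoring substrings that only appear at prediction time. A real transformer for this project would presumably also subclass scikit-learn's `BaseEstimator` and `TransformerMixin` so it can be used inside a `Pipeline`.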
This file was deleted.
This file was deleted.