Commit

Updated README and cleaned up some files
EBjerrum committed Oct 16, 2022
1 parent 0187a27 commit ebb19e2
Showing 4 changed files with 60 additions and 112 deletions.
84 changes: 58 additions & 26 deletions README.md
@@ -1,42 +1,74 @@
# scikit-mol
scikit-learn classes for molecular vectorization using RDKit

Scikit-Learn classes for molecular vectorization using RDKit

TODO:
Expand number of fingerprint classes and featurizers
AtomPairs
TopologicalTorsions
RDKit
Descriptors
LINGOS
...
The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

Make dictionary based FP class
No Hashing, .fit() learns the keys of the dataset
As an example, with the needed scikit-learn and scikit-mol imports and RDKit mol objects in the mol_list_train and mol_list_test lists:

Make a basic standardizer transformer class
pipe = Pipeline([('mol_transformer', MorganTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

Make a SMILES to Mol transformer class
>>> array([4.93858815])

Make Notebook with examples
Standalone usage
Inclusion in pipeline
Can transformers be used in parallel (e.g. to use both FP features and Descriptors at the same time?)
Hyperparameter optimization via native Scikit-Classes
Hyperparameter optimization via external optimizer e.g. https://scikit-optimize.github.io/stable/
The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft for the project was created at the [RDKIT UGM 2022 hackathon](https://github.com/rdkit/UGM_2022) 2022-October-14

Make basic unit-tests

## Implemented
* Transformer Classes
    * SmilesToMol
    * Desc2DTransformer
    * MACCSTransformer
    * RDKitFPTransformer
    * AtomPairFingerprintTransformer
    * TopologicalTorsionFingerprintTransformer
    * MorganTransformer
<br>
<br>
* Utilities
    * CheckSmilesSanitazion

## Installation
Users can install the latest tagged release from pip:

pip install scikit-mol

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Developers

Installation
git clone git@github.com:EBjerrum/scikit-mol.git
pip install -e .

## Documentation
None yet, but the notebooks directory contains some `# %%`-delimited example scripts with demonstrations.

## BUGS
Probably still some.


## TODO
* Make standardizer less 'chatty'
* Unit test coverage of classes
* Make further example notebooks
* Standalone usage (not in pipeline)
* Advanced pipelining
* Hyperparameter optimization via external optimizer e.g. https://scikit-optimize.github.io/stable/

## Ideas
* LINGOS transformer


Contributors:
Esben Bjerrum, [email protected]
Son Ha, [email protected]
Oh-hyeon Choung, [email protected]
Please add yourself here, we'll properly markdown it later
## Contributors:
* Esben Bjerrum, [email protected]
* Carmen Esposito https://github.com/cespos
* Son Ha, [email protected]
* Oh-hyeon Choung, [email protected]
* Andreas Poehlmann, https://github.com/ap--
* Ya Chen, https://github.com/anya-chen
29 changes: 2 additions & 27 deletions notebooks/sandbox.py
@@ -6,15 +6,15 @@


#%%
from scikit_mol.smilestomol import SmilesToMol
from scikit_mol.transformers import SmilesToMol
smiles_list = ['c1ccccc1'] * 10
smilestomol = SmilesToMol()
mols = smilestomol.fit_transform(smiles_list)
mols[0]


#%%
from scikit_mol.smilestomol import SmilesToMol
from scikit_mol.transformers import SmilesToMol
smiles_list = ['c1ccccc1'] * 10
y = list(range(10))
y.append(1000)
@@ -42,31 +42,6 @@
mols[0]



#%%
y_out = []
X_out = []
y_error = []
X_error = []

for smiles, y_value in zip(smiles_list, y):
    mol = Chem.MolFromSmiles(smiles)
    if mol:
        X_out.append(mol)
        y_out.append(y_value)
    else:
        print(f'Logging: Error in parsing {smiles}')
        X_error.append(smiles)
        y_error.append(y_value)

print(X_out)
print(y_out)
print(X_error)
print(y_error)




#%%
X= [Chem.MolFromSmiles('c1ccccc1')]*10
t = MorganTransformer(useCounts=True)
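
The loop removed from notebooks/sandbox.py in this commit pairs each SMILES string with its target value and sets aside unparsable entries, so that X and y never drift out of alignment. A minimal sketch of the same idea as a reusable function follows. `split_parsable` and `toy_parse` are hypothetical names that do not appear in the repository, and `toy_parse` is only a stand-in so the example runs without RDKit. In real use one would presumably pass `Chem.MolFromSmiles`, which returns `None` for SMILES it cannot parse.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def split_parsable(
    smiles_list: Sequence[str],
    y: Sequence[Any],
    parse: Callable[[str], Optional[Any]],
) -> Tuple[List[Any], List[Any], List[str], List[Any]]:
    """Split (SMILES, target) pairs into parsed and failed groups.

    Hypothetical helper, not part of scikit-mol. Returns
    (X_out, y_out, X_error, y_error) with inputs and targets kept aligned.
    """
    X_out, y_out, X_error, y_error = [], [], [], []
    for smiles, y_value in zip(smiles_list, y):
        mol = parse(smiles)
        if mol is not None:
            # Parsed successfully: keep the object and its target together.
            X_out.append(mol)
            y_out.append(y_value)
        else:
            # Parsing failed: record the raw string and its target.
            X_error.append(smiles)
            y_error.append(y_value)
    return X_out, y_out, X_error, y_error


def toy_parse(smiles: str) -> Optional[str]:
    """Stand-in parser (assumption, NOT a real SMILES parser).

    Rejects empty strings and strings with unequal numbers of '(' and ')'.
    """
    if smiles and smiles.count("(") == smiles.count(")"):
        return smiles
    return None


smiles_list = ["c1ccccc1", "CC(=O)O", "CC(C"]
y = [1.0, 2.0, 3.0]
X_out, y_out, X_error, y_error = split_parsable(smiles_list, y, toy_parse)
print(X_out, y_out, X_error, y_error)
```

Passing the parser in as an argument keeps the alignment logic separate from RDKit, so it can be exercised without a chemistry toolkit installed.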
27 changes: 0 additions & 27 deletions scikit_mol/smilestomol.py

This file was deleted.

32 changes: 0 additions & 32 deletions standardizer.py

This file was deleted.
