adjusting to rdkit fp generator changes #54

Merged
merged 26 commits into from
Nov 24, 2024
Commits
341171e
adapting to new rdkit fingerprint generators.
RiesBen Sep 13, 2024
8be105a
Deprecations warnings in transformers: raise->prints
RiesBen Sep 14, 2024
681a493
minor class property fixes, most test wun now. Need to look into clon…
RiesBen Sep 14, 2024
c812214
fixes for bugs
RiesBen Sep 16, 2024
523b190
updated readmes
Oct 26, 2024
895924f
Merge branch 'main' into 41_adjusting_to_rdkit_fp_generator_changes
RiesBen Nov 14, 2024
fdd8624
Remodelling transformers:
RiesBen Nov 14, 2024
d420cbd
Remodelling transformers:
RiesBen Nov 14, 2024
f7d2958
Remodelling transformers:
RiesBen Nov 15, 2024
5ae6a2b
Remodelling transformers:
RiesBen Nov 15, 2024
bfadd31
Merge pull request #43 from RiesBen/41_adjusting_to_rdkit_fp_generato…
EBjerrum Nov 22, 2024
5f91e0c
Preparing file split
Nov 22, 2024
f7b20f1
Split fingerprint file into smaller for better overview
Nov 22, 2024
f092feb
Refactored the baseclasses for more logical inheritance for the two a…
Nov 22, 2024
ff8cf2e
Updated child classes to honor the safe_inference_mode
Nov 22, 2024
7d9941a
working prototype for the soft transition to generator classes with p…
Nov 22, 2024
d425b72
Minor fixes in baseclasses
Nov 24, 2024
f73558f
Fixed exotic types
Nov 24, 2024
96b5b42
Fixed test to use new baseclass
Nov 24, 2024
b8346f6
Fixed the tests to move from classic to generator for morgan. Not the…
Nov 24, 2024
d9b8eb7
Minor changes to tests
Nov 24, 2024
646808f
Updated test to reflect change to generator subclasses
Nov 24, 2024
0b0f0fc
updated gitignore
Nov 24, 2024
5701192
Fixed some issues that created deprecation warnings. All tests passes…
Nov 24, 2024
35cfdc1
Deleted the generator and classic temporary classes and updated tests
Nov 24, 2024
f1bbc30
Updated and reran notebooks
Nov 24, 2024
2 changes: 1 addition & 1 deletion .gitignore
@@ -138,6 +138,6 @@ tests/data/

# setuptools_scm version
scikit_mol/_version.py
notebooks/sandbox.py
.vscode
notebooks/SLC6A4_active_excape_export.csv
sandbox/
8 changes: 5 additions & 3 deletions CONTRIBUTION.md
@@ -5,7 +5,7 @@ Thanks for your interest in contributing to the project. Please read on in the s
## Slack channel

We have a slack channel for communication, ask for an invite: [email protected]
It's not really active and Slack wants to be paid now. Maybe we can use Discord instead.
It's not really active and Slack wants to be paid now. Maybe we can use Discord instead, as Slack is now deleting old threads.

## Installation

@@ -22,12 +22,13 @@ The projects transformers subclasses the BaseEstimator and Transformer mixin cla

- The arguments accepted by **init** should all be keyword arguments with a default value.
- Every keyword argument accepted by **init** should correspond to an attribute on the instance.
- - There should be no logic, not even input validation, and the parameters should not be changed.
- - There should be no logic, not even input validation, and the parameters should not be changed inside the **init** function.
  Scikit-learn classes depend on this for e.g. .get_params(), .set_params(), cloning and representation rendering to work.
- With the new error handling, falsy objects need to return masked arrays or arrays with np.nan (for float dtype)
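
A minimal sketch of the rules above, assuming a hypothetical `ToyLengthTransformer` that is not part of scikit-mol. The constructor only stores its keyword arguments, which is what lets `get_params()` and `clone()` work, and falsy inputs come back as `np.nan` in a float array. The class name, parameters and the length-based "feature" are illustrative assumptions only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ToyLengthTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer for illustration; not scikit-mol code."""

    def __init__(self, scale=1.0, offset=0.0):
        # Keyword arguments with defaults, stored unchanged and unvalidated.
        self.scale = scale
        self.offset = offset

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Falsy inputs (None, empty string) yield np.nan instead of raising.
        values = [len(x) * self.scale + self.offset if x else np.nan for x in X]
        return np.array(values, dtype=float).reshape(-1, 1)


original = ToyLengthTransformer(scale=2.0)
copy = clone(original)        # relies on the "no logic in the constructor" rule
params = copy.get_params()    # one entry per constructor keyword
features = copy.transform(["CCO", None, "C"])
```

Because the constructor does nothing beyond assignment, the clone carries the same parameters as the original, and the `None` entry shows up as a `np.nan` row rather than an exception.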

### Tips

- We have observed that some external tools use "exotic" types such as np.int64 when doing hyperparameter tuning. It is thus necessary to cast to standard types before making calls to rdkit functions. This behaviour is tested in the test_parameter_types test
- We have observed that some external tools use "exotic" types such as np.int64 when doing hyperparameter tuning. It is thus necessary to program defensively and cast parameters to standard types before making calls to rdkit functions. This behaviour is tested in the test_parameter_types test

- @property getters and setters can be used if additional logic is needed when setting the attributes from the keywords while at the same time adhering to the sklearn requisites.
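
The two tips can be sketched together as follows. This is an illustration under assumptions, not the scikit-mol implementation: `ToyFingerprintSettings` and `rdkit_kwargs` are hypothetical names, and no rdkit call is actually made. One parameter is cast at the point of use, the other through a `@property` setter.

```python
import numpy as np


class ToyFingerprintSettings:
    """Hypothetical class illustrating the casting pattern; not scikit-mol code."""

    def __init__(self, radius=2, fpSize=2048):
        self.radius = radius
        self.fpSize = fpSize  # routed through the property setter below

    @property
    def fpSize(self):
        return self._fpSize

    @fpSize.setter
    def fpSize(self, value):
        # Extra logic on assignment: cast exotic integer types to a plain int.
        self._fpSize = int(value)

    def rdkit_kwargs(self):
        # Defensive cast right before the values would be handed to rdkit.
        return {"radius": int(self.radius), "fpSize": self.fpSize}


# A tuning tool might pass NumPy scalars instead of Python ints.
settings = ToyFingerprintSettings(radius=np.int64(3), fpSize=np.int64(256))
kwargs = settings.rdkit_kwargs()
```

Here `radius` is stored exactly as given and only cast when used, while `fpSize` is normalised as soon as it is set. Either approach keeps plain Python types at the boundary to rdkit.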

@@ -48,6 +49,7 @@ parameters and output of methods should preferably be using typehints
## Testing

New transformer classes should be added to the pytest tests in the tests directory. Many of the tests are general and test aspects of the transformers that are needed for sklearn compliance or other features. The transformer is then added to a fixture and can be added to the lists of transformer objects that are run by these tests. Specific tests may also be necessary to set up. As an example, the assert_transformer_set_params needs a list of non-default parameters in order to test the set_params functionality of the object.
Scikit-Learn has a check_estimator that we should strive to get to work; some classes of scikit-mol currently do not pass all tests.
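
As a rough illustration of why such a test needs non-default parameters, here is a sketch of a general set_params check. The helper `check_set_params_roundtrip` and the `ToyTransformer` class are hypothetical and only mirror the idea; they are not the project's `assert_transformer_set_params` or any of its fixtures.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ToyTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer standing in for a fixture object."""

    def __init__(self, radius=2, fpSize=2048):
        self.radius = radius
        self.fpSize = fpSize

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def check_set_params_roundtrip(transformer, non_default_params):
    """Hypothetical helper: set non-default values and read them back."""
    default_params = transformer.get_params()
    for name, value in non_default_params.items():
        # A default value would not prove that set_params changed anything.
        assert default_params[name] != value, f"{name} must differ from its default"
    modified = clone(transformer).set_params(**non_default_params)
    for name, value in non_default_params.items():
        assert modified.get_params()[name] == value
    # The object handed in must be left untouched.
    assert transformer.get_params() == default_params
    return modified


modified = check_set_params_roundtrip(ToyTransformer(), {"radius": 3, "fpSize": 256})
```

In a pytest suite the final call would sit inside a test function that receives the transformer from a fixture.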

## Notebooks

7 changes: 6 additions & 1 deletion README.md
@@ -91,7 +91,12 @@ There are a collection of notebooks in the notebooks directory which demonstrate

We also put a software note on ChemRxiv. [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd)

## Contributing
## Roadmap and Contributing

_Help wanted!_ Are you a PhD student who wants a "side-quest" to procrastinate on your thesis writing, or are you simply interested in computational chemistry, cheminformatics, QSAR modelling, Python programming or open-source software? Do you want to learn more about machine learning with Scikit-Learn? Or do you use scikit-mol for your current work and would like to pay a little back to the project and see it improved as well?
With a little bit of help, this project can be improved much faster! Reach out to me (Esben) for a discussion about how we can proceed.

Currently we are working on fixing some deprecation warnings. It's not the most exciting work, but a little maintenance is important. Later on we need to go over the scikit-learn compatibility and update to some of the newer features of their estimator classes. We're also planning some feature enhancements and tests, such as new fingerprints and a more versatile standardizer.

There is more information about how to contribute to the project in [CONTRIBUTION.md](https://github.com/EBjerrum/scikit-mol/CONTRIBUTION.md)

564 changes: 486 additions & 78 deletions notebooks/01_basic_usage.ipynb

Large diffs are not rendered by default.

19 changes: 11 additions & 8 deletions notebooks/01_basic_usage.py
@@ -29,12 +29,14 @@
# %%
from rdkit import Chem

smiles_strings = ["C12C([C@@H](OC(C=3C=CC(=CC3)F)C=4C=CC(=CC4)F)CC(N1CCCCCC5=CC=CC=C5)CC2)C(=O)OC",
"O(C1=NC=C2C(CN(CC2=C1)C)C3=CC=C(OC)C=C3)CCCN(CC)CC",
"O=S(=O)(N(CC=1C=CC2=CC=CC=C2C1)[C@@H]3CCNC3)C",
"C1(=C2C(CCCC2O)=NC=3C1=CC=CC3)NCC=4C=CC(=CC4)Cl",
"C1NC[C@@H](C1)[C@H](OC=2C=CC(=NC2C)OC)CC(C)C",
"FC(F)(F)C=1C(CN(C2CCNCC2)CC(CC)CC)=CC=CC1"]
smiles_strings = [
"C12C([C@@H](OC(C=3C=CC(=CC3)F)C=4C=CC(=CC4)F)CC(N1CCCCCC5=CC=CC=C5)CC2)C(=O)OC",
"O(C1=NC=C2C(CN(CC2=C1)C)C3=CC=C(OC)C=C3)CCCN(CC)CC",
"O=S(=O)(N(CC=1C=CC2=CC=CC=C2C1)[C@@H]3CCNC3)C",
"C1(=C2C(CCCC2O)=NC=3C1=CC=CC3)NCC=4C=CC(=CC4)Cl",
"C1NC[C@@H](C1)[C@H](OC=2C=CC(=NC2C)OC)CC(C)C",
"FC(F)(F)C=1C(CN(C2CCNCC2)CC(CC)CC)=CC=CC1",
]

mols = [Chem.MolFromSmiles(smiles) for smiles in smiles_strings]

@@ -48,7 +50,7 @@
print(transformer)

# %% [markdown]
# It actually renders as a cute little interactive block in the Jupyter notebook and lists the options that are not the default values. If we print it, it also gives the information on the settings.
# It actually renders as a cute little interactive block in the Jupyter notebook and lists the options that are not the default values. If we print it, it also gives the information on the settings.
#
# ![An image of the interactive transformer widget](images/Transformer_Widget.jpg "Transformer object rendering in Jupyter")
#
@@ -69,7 +71,7 @@

# %%
parameters["radius"] = 2
parameters["nBits"] = 256
parameters["fpSize"] = 256
transformer.set_params(**parameters)
print(transformer)

@@ -92,6 +94,7 @@

# %%
from scikit_mol.conversions import SmilesToMolTransformer

smi2mol = SmilesToMolTransformer()
print(smi2mol)

78 changes: 45 additions & 33 deletions notebooks/02_descriptor_transformer.ipynb

Large diffs are not rendered by default.
