Skip to content

Commit

Permalink
Merge branch 'main' into 41_adjusting_to_rdkit_fp_generator_changes
Browse files Browse the repository at this point in the history
  • Loading branch information
RiesBen authored Nov 14, 2024
2 parents c812214 + f556652 commit 895924f
Show file tree
Hide file tree
Showing 22 changed files with 3,003 additions and 529 deletions.
16 changes: 7 additions & 9 deletions .github/workflows/run_pytests.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -25,38 +25,36 @@ jobs:
python-version: 3.9
- os: ubuntu-latest
python-version: 3.8
- os: ubuntu-latest
python-version: 3.7
steps:
- name: Checkout scikit_mol
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install scikit_mol
run: python -m pip install -e .[dev]
- name: Cache tests/data
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: tests/data
key: ${{ runner.os }}-${{ hashFiles('tests/conftest.py') }}
- name: Run Tests
run: pytest --cov=./scikit_mol .

# deploy scikit_mol to pypi for tagged commits if tests pass
# deploy scikit_mol to pypi for tagged commits if tests pass
deploy:
needs: [ tests ]
needs: [tests]
name: deploy to pypi
runs-on: ubuntu-latest
# this checks that the commit is a tagged commit
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: 3.8
- name: Install dependencies
Expand Down
38 changes: 27 additions & 11 deletions CONTRIBUTION.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,13 @@

Thanks for your interest in contributing to the project. Please read on in the sections that apply.


## Slack channel
We have a slack channel for communication, ask for an invite: [email protected]

We have a slack channel for communication, ask for an invite: [email protected]
It's not really active and Slack wan't to be paid now. Maybe we can use Discord instead.

## Installation

Clone and install in editable more like this

git clone [email protected]:EBjerrum/scikit-mol.git
Expand All @@ -16,33 +17,40 @@ Clone and install in editable more like this
If you get issues that the editable mode install needs a setup.py, you should update your pip

## Adding transformers

The projects transformers subclasses the BaseEstimator and Transformer mixin classes from sklearn. Their documentation page contains information on what requisites are necessary [https://scikit-learn.org/stable/developers/develop.html](https://scikit-learn.org/stable/developers/develop.html). Most notably:
* The arguments accepted by __init__ should all be keyword arguments with a default value.
* Every keyword argument accepted by __init__ should correspond to an attribute on the instance.
* * There should be no logic, not even input validation, and the parameters should not be changed.
Scikit-learn classes depends on this in order to for e.g. the .get_params(), .set_params(), cloning abilities and representation rendering to work.

- The arguments accepted by **init** should all be keyword arguments with a default value.
- Every keyword argument accepted by **init** should correspond to an attribute on the instance.
- - There should be no logic, not even input validation, and the parameters should not be changed.
Scikit-learn classes depends on this in order to for e.g. the .get_params(), .set_params(), cloning abilities and representation rendering to work.

### Tips
* We have observed that some external tools used "exotic" types such at np.int64 when doing hyperparameter tuning. It is thus necessary to cast to standard types before making calls to rdkit functions. This behaviour is tested in the test_parameter_types test

* @property getters and setters can be used if additional logic are needed when setting the attributes from the keywords while at the same time adhering to the sklearn requisites.
- We have observed that some external tools used "exotic" types such at np.int64 when doing hyperparameter tuning. It is thus necessary to cast to standard types before making calls to rdkit functions. This behaviour is tested in the test_parameter_types test

* Some RDKit features uses objects as generators which may not be picklable. If instantiated and added to the object as an attribute rather than instantiated at each function call for individual molecules, these should thus be removed and recreated via overloading the _get_state() and _set_state() methods. See MHFingerprintTransformer for an example.
- @property getters and setters can be used if additional logic are needed when setting the attributes from the keywords while at the same time adhering to the sklearn requisites.

- Some RDKit features uses objects as generators which may not be picklable. If instantiated and added to the object as an attribute rather than instantiated at each function call for individual molecules, these should thus be removed and recreated via overloading the \_get_state() and \_set_state() methods. See MHFingerprintTransformer for an example.

## Module organisation

Currently we have multiple classes in the same file, if they are the same type. This may change in the future.

## Docstrings

We should ultimately consolidate on the NumPy docstring format [https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) which is also used by SciPy and other scikits.

## Typehints

parameters and output of methods should preferably be using typehints

## Testing

New transformer classes should be added to the pytest tests in the tests directory. A lot of tests are made general, and tests aspects of the transformers that are needed for sklearn compliance or other features. The transformer is then added to a fixture and can be added to the lists of transformer objects that are run by these test. Specific tests may also be necessary to set up. As exampe the assert_transformer_set_params needs a list of non-default parameters in order to set the set_params functionality of the object.

## Notebooks

Another way of contributing is by providing notebooks with examples on how to use the project to build models together with Scikit-Learn and other tools. There is both .ipynb and #%% delimited .py files in the notebook directory as the first are useful for online rendering on github, whereas the later is useful for sub version control.

There are three scripts for handling the notebooks and their associated python:percent scripts (with much nicer diff for git).
Expand All @@ -51,9 +59,17 @@ There are three scripts for handling the notebooks and their associated python:p

`sync_notebooks.sh` uses jupytext to sync .py and ipynb. Jupytext is available via conda-forge or pip

`update_notebooks.sh` will sync, run and save the notebooks, expects a ipython kernel with scikit-mol installed called Python3.

`update_notebooks.sh` will sync, run and save the notebooks, expects a ipython kernel with scikit-mol installed called Python3.

## Release

_PyPi_
adding a new tag will trigger a release on pypi. As example:

```bash
git tag v0.4.1
git push origin tag v0.4.0
```

_Conda_
When you make a release on pypi the conda-forge bot will automatically make a PR that updates the Conda feedstock to the new version. If new dependencies or pins are changed on dependencies, those changes will need to be added to the PR. If there is just a pure code change then all we have do to is merge in the PR and that will update the package on conda-forge. See https://conda-forge.org/docs/maintainer/updating_pkgs/ for more information
14 changes: 14 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,11 @@ The first draft for the project was created at the [RDKIT UGM 2022 hackathon](ht
- Standardizer
- Standardizer

<br>
- safeinference
- SafeInferenceWrapper
- set_safe_inference_mode

<br>

- Utilities
Expand All @@ -58,6 +63,12 @@ Users can install latest tagged release from pip

pip install scikit-mol

or from conda-forge

conda install -c conda-forge scikit-mol

The conda forge package should get updated shortly after a new tagged release on pypi.

Bleeding edge

pip install git+https://github.com:EBjerrum/scikit-mol.git
Expand All @@ -76,6 +87,7 @@ There are a collection of notebooks in the notebooks directory which demonstrate
- [Using skopt for hyperparameter tuning](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/08_external_library_skopt.ipynb)
- [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
- [Using pandas output for easy feature importance analysis and combine pre-exisitng values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)
- [Working with pipelines and estimators in safe inference mode for handling prediction on batches with invalid smiles or molecules](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/11_safe_inference.ipynb)

We also put a software note on ChemRxiv. [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd)

Expand All @@ -100,3 +112,5 @@ Probably still, please check issues at GitHub and report there
- [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz)
- [@RiesBen](https://github.com/RiesBen)
- [@enricogandini](https://github.com/enricogandini)
- [@mikemhenry](https://github.com/mikemhenry)
- [@c-feldmann](https://github.com/c-feldmann)
70 changes: 59 additions & 11 deletions notebooks/04_standardizer.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@
{
"data": {
"text/plain": [
"array([<rdkit.Chem.rdchem.Mol object at 0x7da9b28bb060>], dtype=object)"
"array([<rdkit.Chem.rdchem.Mol object at 0x76d82c8f9fc0>], dtype=object)"
]
},
"metadata": {},
Expand All @@ -64,7 +64,7 @@
{
"data": {
"text/plain": [
"array([<rdkit.Chem.rdchem.Mol object at 0x7da9b28bb0d0>], dtype=object)"
"array([<rdkit.Chem.rdchem.Mol object at 0x76d82c8fa030>], dtype=object)"
]
},
"metadata": {},
Expand Down Expand Up @@ -121,7 +121,7 @@
],
"source": [
"# You can just run straight up like this. Note that neutralising is optional\n",
"standardizer = Standardizer(neutralize=True)\n",
"standardizer = Standardizer()\n",
"standard_mols = standardizer.transform(mols)\n",
"standard_smiles = smi2mol.inverse_transform(standard_mols)\n",
"standard_smiles"
Expand Down Expand Up @@ -155,8 +155,32 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Predictions with no standardization: [0.51983795 0.51983795 2.06562022 3.01206795 3.95446692 4.92816899]\n",
"Predictions with standardization: [0.51983795 0.61543701 2.31738354 3.01206795 3.44085399 4.37516731]\n"
"Predictions with no standardization: [0.51983795 0.61543701 2.31738354 3.01206795 3.44085399 4.37516731]\n",
"Predictions with standardization: [0.51983795 0.51983795 2.06562022 3.01206795 3.95446692 4.92816899]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n"
]
}
],
Expand All @@ -172,8 +196,8 @@
"std_pipe.fit(smiles_strings, fake_y)\n",
"\n",
"\n",
"print(f'Predictions with no standardization: {std_pipe.predict(smiles_strings)}')\n",
"print(f'Predictions with standardization: {nonstd_pipe.predict(smiles_strings)}')"
"print(f'Predictions with no standardization: {nonstd_pipe.predict(smiles_strings)}')\n",
"print(f'Predictions with standardization: {std_pipe.predict(smiles_strings)}')"
]
},
{
Expand Down Expand Up @@ -205,15 +229,39 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Predictions with no standardization: [0.07445775 0.07445775 2.32132164 3.00857908 2.68502208 4.30275549]\n",
"Predictions with standardization: [0.07445775 0.96053374 2.05993278 3.00857908 3.96365443 4.93284221]\n"
"Predictions with no standardization: [0.07445775 0.96053374 2.05993278 3.00857908 3.96365443 4.93284221]\n",
"Predictions with standardization: [0.07445775 0.07445775 2.32132164 3.00857908 2.68502208 4.30275549]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n",
"[19:07:06] DEPRECATION WARNING: please use MorganGenerator\n"
]
}
],
"source": [
"nonstd_pipe.fit(smiles_strings, fake_y)\n",
"print(f'Predictions with no standardization: {std_pipe.predict(smiles_strings)}')\n",
"print(f'Predictions with standardization: {nonstd_pipe.predict(smiles_strings)}')"
"print(f'Predictions with no standardization: {nonstd_pipe.predict(smiles_strings)}')\n",
"print(f'Predictions with standardization: {std_pipe.predict(smiles_strings)}')"
]
}
],
Expand Down
10 changes: 5 additions & 5 deletions notebooks/04_standardizer.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@

# %%
# You can just run straight up like this. Note that neutralising is optional
standardizer = Standardizer(neutralize=True)
standardizer = Standardizer()
standard_mols = standardizer.transform(mols)
standard_smiles = smi2mol.inverse_transform(standard_mols)
standard_smiles
Expand All @@ -66,8 +66,8 @@
std_pipe.fit(smiles_strings, fake_y)


print(f'Predictions with no standardization: {std_pipe.predict(smiles_strings)}')
print(f'Predictions with standardization: {nonstd_pipe.predict(smiles_strings)}')
print(f'Predictions with no standardization: {nonstd_pipe.predict(smiles_strings)}')
print(f'Predictions with standardization: {std_pipe.predict(smiles_strings)}')


# %% [markdown]
Expand All @@ -79,5 +79,5 @@

# %%
nonstd_pipe.fit(smiles_strings, fake_y)
print(f'Predictions with no standardization: {std_pipe.predict(smiles_strings)}')
print(f'Predictions with standardization: {nonstd_pipe.predict(smiles_strings)}')
print(f'Predictions with no standardization: {nonstd_pipe.predict(smiles_strings)}')
print(f'Predictions with standardization: {std_pipe.predict(smiles_strings)}')
Loading

0 comments on commit 895924f

Please sign in to comment.