Skip to content

Commit

Permalink
Added a contribution file
Browse files Browse the repository at this point in the history
  • Loading branch information
EBjerrum committed Sep 22, 2023
1 parent 21229a8 commit 0848f6d
Show file tree
Hide file tree
Showing 3 changed files with 41 additions and 10 deletions.
33 changes: 33 additions & 0 deletions CONTRIBUTION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Contribution

Thanks for your interest in contributing to the project. Please read on in the sections that apply.


## Installation
Clone and install in editable more like this

git clone [email protected]:EBjerrum/scikit-mol.git
pip install -e .

## Adding transformers
The projects transformers subclasses the BaseEstimator and Transformer mixin classes from sklearn. Their documentation page contains information on what requisites are necessary [https://scikit-learn.org/stable/developers/develop.html](https://scikit-learn.org/stable/developers/develop.html). Most notably:
* The arguments accepted by __init__ should all be keyword arguments with a default value.
* Every keyword argument accepted by __init__ should correspond to an attribute on the instance.
* * There should be no logic, not even input validation, and the parameters should not be changed.
Scikit-learn classes depends on this in order to for e.g. the .get_params(), .set_params(), cloning abilities and representation rendering to work.

### Docstrings
We should ultimately consolidate on the numpy docstring format [https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) which is also used by SciPy and other scikits.

### Testing
New transformer classes should be added to the pytest tests in the tests directory. There may be a need for specific tests for the specific transformer, but it should also be added to the general tests that test sklearn necessary aspects of the transformer such as clonability.

### Tips
* We have observed that some external tools used "exotic" types such at np.int64 when doing hyperparameter tuning. It is thus necessary to cast to standard types before making calls to rdkit functions. This behaviour is tested in the test_parameter_types test

* @property getters and setters can be used if additional logic are needed when setting the attributes from the keywords while at the same time adhering to the sklearn requisites.

* Some RDKit features uses objects as generators which may not be picklable. If instantiated and added to the object rather than instantiated at each function call for individual molecules, these should thus be removed and recreated via overloading the _get_state() and _set_state() methods. See MHFingerprintTransformer for an example.



8 changes: 3 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,11 +59,6 @@ Bleeding edge

pip install git+https://github.com:EBjerrum/scikit-mol.git

Developers

git clone [email protected]:EBjerrum/scikit-mol.git
pip install -e .

## Documentation

There are a collection of notebooks in the notebooks directory which demonstrates some different aspects and use cases
Expand All @@ -76,6 +71,9 @@ There are a collection of notebooks in the notebooks directory which demonstrate
* [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/06_hyperparameter_tuning.ipynb)
* [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/07_parallel_transforms.ipynb)

## Contributing

There are more information about how to contribute to the project in [CONTRIBUTION.md](https://github.com/EBjerrum/scikit-mol/CONTRIBUTION.md)

## BUGS
Probably still, please check issues at GitHub and report there
Expand Down
10 changes: 5 additions & 5 deletions notebooks/06_hyperparameter_tuning.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@

# %%

mol_list_train, mol_list_test, y_train, y_test = train_test_split(data.ROMol, data.pXC50, random_state=0)
mol_list_train, mol_list_test, y_train, y_test = train_test_split(data.ROMol, data.pXC50, random_state=42)


# %% [markdown]
Expand Down Expand Up @@ -111,10 +111,10 @@
# %%

param_dist = {'ridge__alpha': loguniform(1e-2, 1e3),
"morgantransformer__nBits": [256,512,1024,2048,4096],
'morgantransformer__radius':[1,2,3,4],
'morgantransformer__useCounts': [True,False],
'morgantransformer__useFeatures':[True,False]}
"morganfingerprinttransformer__nBits": [256,512,1024,2048,4096],
'morganfingerprinttransformer__radius':[1,2,3,4],
'morganfingerprinttransformer__useCounts': [True,False],
'morganfingerprinttransformer__useFeatures':[True,False]}

# %% [markdown]
# The report function was taken from [this example](https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html#sphx-glr-auto-examples-model-selection-plot-randomized-search-py) from the scikit learn documentation.
Expand Down

0 comments on commit 0848f6d

Please sign in to comment.