From 72252fb8cffe9cd2003f84f50f70052d566bd004 Mon Sep 17 00:00:00 2001 From: EBjerrum Date: Fri, 22 Sep 2023 19:04:18 +0200 Subject: [PATCH] Changes to contribution.md, some minor changes to readme's and added jupytext as and extra_requires dep --- CONTRIBUTION.md | 42 ++++++++++++++++++++++++++++++++++-------- README.md | 6 +++++- notebooks/README.md | 11 +---------- setup.cfg | 1 + 4 files changed, 41 insertions(+), 19 deletions(-) diff --git a/CONTRIBUTION.md b/CONTRIBUTION.md index da7a012..2a89c31 100644 --- a/CONTRIBUTION.md +++ b/CONTRIBUTION.md @@ -3,11 +3,17 @@ Thanks for your interest in contributing to the project. Please read on in the sections that apply. +## Slack channel +We have a slack channel for communication, ask for an invite: esbenbjerrum+scikit_mol@gmail.com + + ## Installation Clone and install in editable more like this git clone git@github.com:EBjerrum/scikit-mol.git - pip install -e . + pip install -e .[dev] + +If you get issues that the editable mode install needs a setup.py, you should update your pip ## Adding transformers The projects transformers subclasses the BaseEstimator and Transformer mixin classes from sklearn. Their documentation page contains information on what requisites are necessary [https://scikit-learn.org/stable/developers/develop.html](https://scikit-learn.org/stable/developers/develop.html). Most notably: @@ -16,18 +22,38 @@ The projects transformers subclasses the BaseEstimator and Transformer mixin cla * * There should be no logic, not even input validation, and the parameters should not be changed. Scikit-learn classes depends on this in order to for e.g. the .get_params(), .set_params(), cloning abilities and representation rendering to work. -### Docstrings -We should ultimately consolidate on the numpy docstring format [https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) which is also used by SciPy and other scikits. - -### Testing -New transformer classes should be added to the pytest tests in the tests directory. There may be a need for specific tests for the specific transformer, but it should also be added to the general tests that test sklearn necessary aspects of the transformer such as clonability. - ### Tips * We have observed that some external tools used "exotic" types such at np.int64 when doing hyperparameter tuning. It is thus necessary to cast to standard types before making calls to rdkit functions. This behaviour is tested in the test_parameter_types test * @property getters and setters can be used if additional logic are needed when setting the attributes from the keywords while at the same time adhering to the sklearn requisites. -* Some RDKit features uses objects as generators which may not be picklable. If instantiated and added to the object rather than instantiated at each function call for individual molecules, these should thus be removed and recreated via overloading the _get_state() and _set_state() methods. See MHFingerprintTransformer for an example. +* Some RDKit features uses objects as generators which may not be picklable. If instantiated and added to the object as an attribute rather than instantiated at each function call for individual molecules, these should thus be removed and recreated via overloading the _get_state() and _set_state() methods. See MHFingerprintTransformer for an example. + + +## Module organisation +Currently we have multiple classes in the same file, if they are the same type. This may change in the future. + +## Docstrings +We should ultimately consolidate on the NumPy docstring format [https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) which is also used by SciPy and other scikits. + +## Typehints +parameters and output of methods should preferably be using typehints + +## Testing +New transformer classes should be added to the pytest tests in the tests directory. A lot of tests are made general, and tests aspects of the transformers that are needed for sklearn compliance or other features. The transformer is then added to a fixture and can be added to the lists of transformer objects that are run by these test. Specific tests may also be necessary to set up. As exampe the assert_transformer_set_params needs a list of non-default parameters in order to set the set_params functionality of the object. + +## Notebooks +Another way of contributing is by providing notebooks with examples on how to use the project to build models together with Scikit-Learn and other tools. There is both .ipynb and #%% delimited .py files in the notebook directory as the first are useful for online rendering on github, whereas the later is useful for sub version control. + +There are three scripts for handling the notebooks and their associated python:percent scripts (with much nicer diff for git). + +`pair_notebook.sh` create a pair of a notebook or percent:py script + +`sync_notebooks.sh` uses jupytext to sync .py and ipynb. Jupytext is available via conda-forge or pip + +`update_notebooks.sh` will sync, run and save the notebooks, expects a ipython kernel with scikit-mol installed called Python3. + + diff --git a/README.md b/README.md index 1a24e9b..4beb9e9 100644 --- a/README.md +++ b/README.md @@ -70,6 +70,7 @@ There are a collection of notebooks in the notebooks directory which demonstrate * [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/05_smiles_sanitaztion.ipynb) * [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/06_hyperparameter_tuning.ipynb) * [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/07_parallel_transforms.ipynb) +* [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb) ## Contributing @@ -79,7 +80,7 @@ There are more information about how to contribute to the project in [CONTRIBUTI Probably still, please check issues at GitHub and report there ## Contributers: -* Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), esbenbjerrum+cheminformania@gmail.com +* Esben Jannik Bjerrum [@ebjerrum](https://github.com/ebjerrum), esbenbjerrum+scikit_mol@gmail.com * Carmen Esposito [@cespos](https://github.com/cespos) * Son Ha, sonha@uni-mainz.de * Oh-hyeon Choung, ohhyeon.choung@gmail.com @@ -87,3 +88,6 @@ Probably still, please check issues at GitHub and report there * Ya Chen, [@anya-chen](https://github.com/anya-chen) * RafaƂ Bachorz [@rafalbachorz](https://github.com/rafalbachorz) * Adrien Chaton [@adrienchaton](https://github.com/adrienchaton) +* @VincentAlexanderScholz +* @RiesBen + diff --git a/notebooks/README.md b/notebooks/README.md index 2a983b4..2e3c285 100644 --- a/notebooks/README.md +++ b/notebooks/README.md @@ -11,13 +11,4 @@ This is a collection of notebooks in the notebooks directory which demonstrates * [Sanitizing SMILES input](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/05_smiles_sanitaztion.ipynb) * [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/06_hyperparameter_tuning.ipynb) * [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/tree/main/notebooks/07_parallel_transforms.ipynb) - - -## Developers -There are three scripts for handling the notebooks and their associated python:percent scripts (with much nicer diff for git) - -`pair_notebook.sh` create a pair of a notebook or percent:py script - -`sync_notebooks.sh` uses jupytext to sync .py and ipynb. Jupytext is available via conda-forge or pip - -`update_notebooks.sh` will sync, run and save the notebooks, expects a ipython kernel with scikit-mol installed called Python3. +* [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb) \ No newline at end of file diff --git a/setup.cfg b/setup.cfg index e56f9db..2bf9f11 100644 --- a/setup.cfg +++ b/setup.cfg @@ -43,3 +43,4 @@ exclude = dev = pytest>=6 pytest-cov + jupytext