Merge branch 'pandas_output' into pandas_output
enricogandini authored Mar 19, 2024
2 parents cc17037 + 2973f70 commit 42673ff
Showing 7 changed files with 535 additions and 252 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -71,6 +71,7 @@ There are a collection of notebooks in the notebooks directory which demonstrate
* [Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/06_hyperparameter_tuning.ipynb)
* [Using parallel execution to speed up descriptor and fingerprint calculations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/07_parallel_transforms.ipynb)
* [Testing different fingerprints as part of the hyperparameter optimization](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/09_Combinatorial_Method_Usage_with_FingerPrint_Transformers.ipynb)
* [Using pandas output for easy feature importance analysis and to combine pre-existing values with new computations](https://github.com/EBjerrum/scikit-mol/blob/main/notebooks/10_pipeline_pandas_output.ipynb)


We also put a software note on ChemRxiv. [https://doi.org/10.26434/chemrxiv-2023-fzqwd](https://doi.org/10.26434/chemrxiv-2023-fzqwd)
@@ -93,4 +94,4 @@ Probably still, please check issues at GitHub and report there
* Adrien Chaton [@adrienchaton](https://github.com/adrienchaton)
* [@VincentAlexanderScholz](https://github.com/VincentAlexanderScholz)
* [@RiesBen](https://github.com/RiesBen)

* [@enricogandini](https://github.com/enricogandini)
157 changes: 82 additions & 75 deletions notebooks/06_hyperparameter_tuning.ipynb
@@ -16,10 +16,10 @@
"id": "7df4793c",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:35.701525Z",
"iopub.status.busy": "2023-03-19T08:54:35.700822Z",
"iopub.status.idle": "2023-03-19T08:54:37.014139Z",
"shell.execute_reply": "2023-03-19T08:54:37.013259Z"
"iopub.execute_input": "2024-03-18T13:17:06.835155Z",
"iopub.status.busy": "2024-03-18T13:17:06.834550Z",
"iopub.status.idle": "2024-03-18T13:17:07.938779Z",
"shell.execute_reply": "2024-03-18T13:17:07.938102Z"
}
},
"outputs": [],
@@ -53,10 +53,10 @@
"id": "45a8ebf1",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.017824Z",
"iopub.status.busy": "2023-03-19T08:54:37.017350Z",
"iopub.status.idle": "2023-03-19T08:54:37.022818Z",
"shell.execute_reply": "2023-03-19T08:54:37.022080Z"
"iopub.execute_input": "2024-03-18T13:17:07.941632Z",
"iopub.status.busy": "2024-03-18T13:17:07.941344Z",
"iopub.status.idle": "2024-03-18T13:17:07.944976Z",
"shell.execute_reply": "2024-03-18T13:17:07.944438Z"
}
},
"outputs": [],
@@ -87,10 +87,10 @@
"id": "08c233a7",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.025520Z",
"iopub.status.busy": "2023-03-19T08:54:37.025310Z",
"iopub.status.idle": "2023-03-19T08:54:37.077231Z",
"shell.execute_reply": "2023-03-19T08:54:37.076469Z"
"iopub.execute_input": "2024-03-18T13:17:07.947279Z",
"iopub.status.busy": "2024-03-18T13:17:07.947092Z",
"iopub.status.idle": "2024-03-18T13:17:07.986572Z",
"shell.execute_reply": "2024-03-18T13:17:07.985938Z"
}
},
"outputs": [
@@ -124,17 +124,17 @@
"id": "5363d05a",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.080021Z",
"iopub.status.busy": "2023-03-19T08:54:37.079756Z",
"iopub.status.idle": "2023-03-19T08:54:37.084574Z",
"shell.execute_reply": "2023-03-19T08:54:37.084021Z"
"iopub.execute_input": "2024-03-18T13:17:07.989052Z",
"iopub.status.busy": "2024-03-18T13:17:07.988851Z",
"iopub.status.idle": "2024-03-18T13:17:07.993705Z",
"shell.execute_reply": "2024-03-18T13:17:07.993138Z"
},
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"\n",
"mol_list_train, mol_list_test, y_train, y_test = train_test_split(data.ROMol, data.pXC50, random_state=0)"
"mol_list_train, mol_list_test, y_train, y_test = train_test_split(data.ROMol, data.pXC50, random_state=42)"
]
},
{
@@ -151,10 +151,10 @@
"id": "885daf12",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.087530Z",
"iopub.status.busy": "2023-03-19T08:54:37.087275Z",
"iopub.status.idle": "2023-03-19T08:54:37.454935Z",
"shell.execute_reply": "2023-03-19T08:54:37.454153Z"
"iopub.execute_input": "2024-03-18T13:17:07.996298Z",
"iopub.status.busy": "2024-03-18T13:17:07.995898Z",
"iopub.status.idle": "2024-03-18T13:17:08.361248Z",
"shell.execute_reply": "2024-03-18T13:17:08.360550Z"
}
},
"outputs": [],
@@ -182,10 +182,10 @@
"id": "8fd14250",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.458340Z",
"iopub.status.busy": "2023-03-19T08:54:37.458092Z",
"iopub.status.idle": "2023-03-19T08:54:37.461663Z",
"shell.execute_reply": "2023-03-19T08:54:37.461052Z"
"iopub.execute_input": "2024-03-18T13:17:08.364585Z",
"iopub.status.busy": "2024-03-18T13:17:08.364029Z",
"iopub.status.idle": "2024-03-18T13:17:08.367535Z",
"shell.execute_reply": "2024-03-18T13:17:08.366933Z"
}
},
"outputs": [],
@@ -212,17 +212,18 @@
"id": "fa082078",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.464188Z",
"iopub.status.busy": "2023-03-19T08:54:37.463905Z",
"iopub.status.idle": "2023-03-19T08:54:37.467187Z",
"shell.execute_reply": "2023-03-19T08:54:37.466548Z"
"iopub.execute_input": "2024-03-18T13:17:08.369937Z",
"iopub.status.busy": "2024-03-18T13:17:08.369731Z",
"iopub.status.idle": "2024-03-18T13:17:08.372598Z",
"shell.execute_reply": "2024-03-18T13:17:08.372103Z"
},
"title": "Now hyperparameter tuning"
},
"outputs": [],
"source": [
"from sklearn.model_selection import RandomizedSearchCV\n",
"from sklearn.utils.fixes import loguniform"
"#from sklearn.utils.fixes import loguniform\n",
"from scipy.stats import loguniform"
]
},
{
@@ -239,18 +240,18 @@
"id": "046e24d3",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.470133Z",
"iopub.status.busy": "2023-03-19T08:54:37.469879Z",
"iopub.status.idle": "2023-03-19T08:54:37.476370Z",
"shell.execute_reply": "2023-03-19T08:54:37.475763Z"
"iopub.execute_input": "2024-03-18T13:17:08.374931Z",
"iopub.status.busy": "2024-03-18T13:17:08.374724Z",
"iopub.status.idle": "2024-03-18T13:17:08.380663Z",
"shell.execute_reply": "2024-03-18T13:17:08.380185Z"
},
"title": "Which keys do we have?"
},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['memory', 'steps', 'verbose', 'morgantransformer', 'ridge', 'morgantransformer__nBits', 'morgantransformer__parallel', 'morgantransformer__radius', 'morgantransformer__useBondTypes', 'morgantransformer__useChirality', 'morgantransformer__useCounts', 'morgantransformer__useFeatures', 'ridge__alpha', 'ridge__copy_X', 'ridge__fit_intercept', 'ridge__max_iter', 'ridge__positive', 'ridge__random_state', 'ridge__solver', 'ridge__tol'])"
"dict_keys(['memory', 'steps', 'verbose', 'morganfingerprinttransformer', 'ridge', 'morganfingerprinttransformer__nBits', 'morganfingerprinttransformer__parallel', 'morganfingerprinttransformer__radius', 'morganfingerprinttransformer__useBondTypes', 'morganfingerprinttransformer__useChirality', 'morganfingerprinttransformer__useCounts', 'morganfingerprinttransformer__useFeatures', 'ridge__alpha', 'ridge__copy_X', 'ridge__fit_intercept', 'ridge__max_iter', 'ridge__positive', 'ridge__random_state', 'ridge__solver', 'ridge__tol'])"
]
},
"execution_count": 8,
@@ -277,21 +278,21 @@
"id": "cf2c45d7",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.478959Z",
"iopub.status.busy": "2023-03-19T08:54:37.478683Z",
"iopub.status.idle": "2023-03-19T08:54:37.483305Z",
"shell.execute_reply": "2023-03-19T08:54:37.482688Z"
"iopub.execute_input": "2024-03-18T13:17:08.383218Z",
"iopub.status.busy": "2024-03-18T13:17:08.382844Z",
"iopub.status.idle": "2024-03-18T13:17:08.387097Z",
"shell.execute_reply": "2024-03-18T13:17:08.386532Z"
},
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"\n",
"param_dist = {'ridge__alpha': loguniform(1e-2, 1e3),\n",
" \"morgantransformer__nBits\": [256,512,1024,2048,4096],\n",
" 'morgantransformer__radius':[1,2,3,4],\n",
" 'morgantransformer__useCounts': [True,False],\n",
" 'morgantransformer__useFeatures':[True,False]}"
" \"morganfingerprinttransformer__nBits\": [256,512,1024,2048,4096],\n",
" 'morganfingerprinttransformer__radius':[1,2,3,4],\n",
" 'morganfingerprinttransformer__useCounts': [True,False],\n",
" 'morganfingerprinttransformer__useFeatures':[True,False]}"
]
},
{
@@ -308,10 +309,10 @@
"id": "fbb2cacd",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.485906Z",
"iopub.status.busy": "2023-03-19T08:54:37.485645Z",
"iopub.status.idle": "2023-03-19T08:54:37.489973Z",
"shell.execute_reply": "2023-03-19T08:54:37.489382Z"
"iopub.execute_input": "2024-03-18T13:17:08.389557Z",
"iopub.status.busy": "2024-03-18T13:17:08.389341Z",
"iopub.status.idle": "2024-03-18T13:17:08.393214Z",
"shell.execute_reply": "2024-03-18T13:17:08.392741Z"
},
"title": "From https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html#sphx-glr-auto-examples-model-selection-plot-randomized-search-py"
},
@@ -347,18 +348,18 @@
"id": "bc66efa3",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:37.492677Z",
"iopub.status.busy": "2023-03-19T08:54:37.492327Z",
"iopub.status.idle": "2023-03-19T08:54:44.019837Z",
"shell.execute_reply": "2023-03-19T08:54:44.018820Z"
"iopub.execute_input": "2024-03-18T13:17:08.395641Z",
"iopub.status.busy": "2024-03-18T13:17:08.395386Z",
"iopub.status.idle": "2024-03-18T13:17:12.986572Z",
"shell.execute_reply": "2024-03-18T13:17:12.986010Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Runtime: 6.52 for 25 iterations)\n"
"Runtime: 4.59 for 25 iterations)\n"
]
}
],
@@ -380,10 +381,10 @@
"id": "b2b3d623",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:44.026632Z",
"iopub.status.busy": "2023-03-19T08:54:44.024799Z",
"iopub.status.idle": "2023-03-19T08:54:44.034480Z",
"shell.execute_reply": "2023-03-19T08:54:44.033518Z"
"iopub.execute_input": "2024-03-18T13:17:12.989282Z",
"iopub.status.busy": "2024-03-18T13:17:12.988845Z",
"iopub.status.idle": "2024-03-18T13:17:12.992724Z",
"shell.execute_reply": "2024-03-18T13:17:12.992202Z"
},
"lines_to_next_cell": 0
},
@@ -393,16 +394,16 @@
"output_type": "stream",
"text": [
"Model with rank: 1\n",
"Mean validation score: 0.492 (std: 0.048)\n",
"Parameters: {'morgantransformer__nBits': 2048, 'morgantransformer__radius': 4, 'morgantransformer__useCounts': False, 'morgantransformer__useFeatures': False, 'ridge__alpha': 18.49687467370123}\n",
"Mean validation score: 0.526 (std: 0.183)\n",
"Parameters: {'morganfingerprinttransformer__nBits': 4096, 'morganfingerprinttransformer__radius': 2, 'morganfingerprinttransformer__useCounts': False, 'morganfingerprinttransformer__useFeatures': False, 'ridge__alpha': 0.03506956723396942}\n",
"\n",
"Model with rank: 2\n",
"Mean validation score: 0.446 (std: 0.188)\n",
"Parameters: {'morgantransformer__nBits': 256, 'morgantransformer__radius': 2, 'morgantransformer__useCounts': False, 'morgantransformer__useFeatures': False, 'ridge__alpha': 0.22311472368521185}\n",
"Mean validation score: 0.523 (std: 0.127)\n",
"Parameters: {'morganfingerprinttransformer__nBits': 4096, 'morganfingerprinttransformer__radius': 4, 'morganfingerprinttransformer__useCounts': False, 'morganfingerprinttransformer__useFeatures': False, 'ridge__alpha': 7.715592595566691}\n",
"\n",
"Model with rank: 3\n",
"Mean validation score: 0.440 (std: 0.025)\n",
"Parameters: {'morgantransformer__nBits': 4096, 'morgantransformer__radius': 3, 'morgantransformer__useCounts': False, 'morgantransformer__useFeatures': True, 'ridge__alpha': 6.023414585108017}\n",
"Mean validation score: 0.495 (std: 0.096)\n",
"Parameters: {'morganfingerprinttransformer__nBits': 2048, 'morganfingerprinttransformer__radius': 3, 'morganfingerprinttransformer__useCounts': False, 'morganfingerprinttransformer__useFeatures': False, 'ridge__alpha': 28.390132035436164}\n",
"\n"
]
}
@@ -433,19 +434,19 @@
"id": "cb369a0e",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:44.041562Z",
"iopub.status.busy": "2023-03-19T08:54:44.039729Z",
"iopub.status.idle": "2023-03-19T08:54:44.258048Z",
"shell.execute_reply": "2023-03-19T08:54:44.256978Z"
"iopub.execute_input": "2024-03-18T13:17:12.995596Z",
"iopub.status.busy": "2024-03-18T13:17:12.995274Z",
"iopub.status.idle": "2024-03-18T13:17:13.177121Z",
"shell.execute_reply": "2024-03-18T13:17:13.176195Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No Standardization 0.5222\n",
"With Standardization 0.5222\n"
"No Standardization 0.6791\n",
"With Standardization 0.6791\n"
]
}
],
@@ -477,19 +478,25 @@
"id": "f6426b23",
"metadata": {
"execution": {
"iopub.execute_input": "2023-03-19T08:54:44.264895Z",
"iopub.status.busy": "2023-03-19T08:54:44.264460Z",
"iopub.status.idle": "2023-03-19T08:54:44.295917Z",
"shell.execute_reply": "2023-03-19T08:54:44.294841Z"
"iopub.execute_input": "2024-03-18T13:17:13.180039Z",
"iopub.status.busy": "2024-03-18T13:17:13.179823Z",
"iopub.status.idle": "2024-03-18T13:17:13.196436Z",
"shell.execute_reply": "2024-03-18T13:17:13.195835Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predictions with no standardization: [6.29056823 6.31806374 6.34706809 6.44119607 6.41499794]\n",
"Predictions with standardization: [6.29056823 6.29056823 6.29056823 6.29056823 6.29056823]\n"
"Predictions with no standardization: [5.77626555 5.94364787 5.94364787 6.13649679 6.04966392]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predictions with standardization: [5.77626555 5.77626555 5.77626555 5.77626555 5.77626555]\n"
]
}
],
@@ -539,7 +546,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.10.12"
}
},
"nbformat": 4,
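Apart from re-executed timestamps and outputs, the main change in this notebook is that every tuned parameter key moves from the `morgantransformer__` prefix to `morganfingerprinttransformer__`, presumably because the Morgan transformer class was renamed. `make_pipeline` names each step after the lowercased class name of its estimator, so renaming a class renames every `<step>__<param>` key, and old keys stop being accepted. The sketch below shows this with scikit-learn alone, using a hypothetical stand-in class called `MorganFingerprintTransformer` (not scikit-mol's implementation; only its name and `__init__` arguments matter).

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class MorganFingerprintTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: only the class name and __init__ args matter here."""

    def __init__(self, nBits=2048, radius=2):
        self.nBits = nBits
        self.radius = radius

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Pass-through; a real fingerprint transformer would featurize molecules.
        return X


# make_pipeline derives step names from type(estimator).__name__.lower().
pipe = make_pipeline(MorganFingerprintTransformer(), Ridge())
print([name for name, _ in pipe.steps])

params = pipe.get_params()
print("morganfingerprinttransformer__nBits" in params)
print("morgantransformer__nBits" in params)

# A key built from the old step name is rejected by set_params.
old_key_rejected = False
try:
    pipe.set_params(morgantransformer__nBits=1024)
except ValueError:
    old_key_rejected = True
print("Old key rejected:", old_key_rejected)
```

The same check applies to `param_distributions` in `RandomizedSearchCV`, which is why the `param_dist` dictionary had to be updated together with the step name.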
5 changes: 3 additions & 2 deletions notebooks/06_hyperparameter_tuning.py
@@ -7,7 +7,7 @@
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.14.5
# jupytext_version: 1.16.1
# kernelspec:
# display_name: Python 3.9.4 ('rdkit')
# language: python
@@ -96,7 +96,8 @@

# %% Now hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils.fixes import loguniform
#from sklearn.utils.fixes import loguniform
from scipy.stats import loguniform

# %% [markdown]
# With the pipelines, getting the names of the parameters to tune is a bit more tricky, as they are concatenations of the name of the step and the parameter with double underscores in between. We can get the available parameters from the pipeline with the get_params() method, and select the parameters we want to change from there.
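This hunk replaces `from sklearn.utils.fixes import loguniform` with `from scipy.stats import loguniform`. SciPy's distribution provides the `rvs` sampling method that `RandomizedSearchCV` draws from, so it can be passed directly in `param_distributions`, and importing it from SciPy avoids relying on scikit-learn's internal `utils.fixes` compatibility module. The sketch below runs the same kind of search with plain scikit-learn pieces: `StandardScaler`, `Ridge` and `make_regression` are stand-ins (assumptions, not the notebook's fingerprint transformer or dataset), and the keys follow the `<step name>__<parameter>` pattern described in the markdown cell above.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the fingerprint features.
X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

pipe = make_pipeline(StandardScaler(), Ridge())

# Keys are "<step name>__<parameter>", as listed by pipe.get_params().keys().
param_dist = {
    "ridge__alpha": loguniform(1e-2, 1e3),  # sampled uniformly in log space
    "standardscaler__with_mean": [True, False],  # lists are sampled uniformly
}

search = RandomizedSearchCV(
    pipe, param_distributions=param_dist, n_iter=10, cv=3, random_state=0
)
search.fit(X, y)
print(search.best_params_)
```

Sampling `alpha` log-uniformly spreads the trials evenly across orders of magnitude, which suits regularization strengths spanning 1e-2 to 1e3.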