Expanded example in Model Evaluation + updated scripts and rendered notebooks (#2)

* updated all deps in repo with min versions

* removed all leftover EuroSciPy references in favor of the new repo + notes on the conda env on Apple M1

* Tweaks and fixes to all notebooks, checking all deps.

- Data and model loading made OS-independent
- environment files now reflect all deps in the notebooks
- tweaked and fixed a couple of typos
- removed a warning in the last notebook
- fixed a mistake in testing that used the wrong dataframe in pandera's schema validation

* validated content by syncing via jupytext

* updated all links from master to main

* renamed the jupytext config file in line with new jupytext features

* changed all remaining `master` references to `main`

* Model evaluation with metrics and test-set evaluation

* rendered notebooks
leriomaggio authored Dec 2, 2022
1 parent f9c68df commit dd16ec2
Showing 13 changed files with 2,780 additions and 1,090 deletions.
18 changes: 10 additions & 8 deletions README.md
@@ -59,6 +59,8 @@ $ pip install -r requirements.txt
```
### Using Conda

⚠️ If you're using an Apple machine with the M1 chip, please follow these [instructions](#note-for-conda-on-apple-m1-chip)

You can create a `pydata-global-2022-ml-repro` conda environment by executing:

```
@@ -77,7 +79,7 @@ You might also only update your current environment using:
$ conda env update --prefix ./env --file environment.yml --prune
```

#### Note for Conda nn Apple M1 Chip
#### Note for Conda on Apple M1 Chip

If you're using a Mac with the latest M1 chip, it is highly recommended to install the packages in
your conda environment specifically tailored for your hardware architecture (i.e. `arm64`).
@@ -124,7 +126,7 @@ So how do we actually go about obtaining these goals?

## Data

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb)

This tutorial uses the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/).

@@ -137,7 +139,7 @@ Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.e

## Model Evaluation

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/1%20-%20Model%20Evaluation.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/1%20-%20Model%20Evaluation.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/1%20-%20Model%20Evaluation.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/1%20-%20Model%20Evaluation.ipynb)

Applying machine learning in an applied science context is often method work. We build a prototype model and want to show that this method can be applied to our specific problem. This means that we have to guarantee that the insights we glean from this application generalize to new data from the same problem set.
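
As a rough illustration (a minimal sketch, not the notebook's exact code, assuming seaborn's bundled copy of the penguins data and an arbitrary choice of classifier):

```
# A minimal sketch: hold out a test set once, cross-validate on the
# training data only, and touch the test set a single time at the end.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

penguins = sns.load_dataset("penguins").dropna()
X = penguins[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]]
y = penguins["species"]

# Hold out a test set before any modelling decisions are made.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)

# Cross-validate on the training data only; the test set stays untouched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# A single final evaluation on the held-out test set.
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```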

@@ -153,7 +155,7 @@ So we’ll go into some methods to properly evaluate machine learning models eve

## Benchmarking

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/2%20-%20Benchmarking.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/2%20-%20Benchmarking.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/2%20-%20Benchmarking.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/2%20-%20Benchmarking.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/2%20-%20Benchmarking.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/2%20-%20Benchmarking.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/2%20-%20Benchmarking.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/2%20-%20Benchmarking.ipynb)

Another common reason for rejections of machine learning papers in applied science is the lack of proper benchmarks. This section will be fairly short, as it differs from discipline to discipline.
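
As a rough sketch of what a minimal benchmark can look like (assuming seaborn's bundled penguins data; the dummy baseline is simply the weakest sensible comparison, not a discipline-specific one):

```
# A minimal sketch of a sanity-check baseline: any proposed model should
# clearly beat a dummy classifier that ignores the input features.
import seaborn as sns
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

penguins = sns.load_dataset("penguins").dropna()
X = penguins[["bill_length_mm", "flipper_length_mm"]]
y = penguins["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# Predicts the most frequent class regardless of input.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"Dummy baseline: {baseline.score(X_test, y_test):.3f}")
print(f"Model:          {model.score(X_test, y_test):.3f}")
```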

@@ -165,7 +167,7 @@ However, any time we apply a superfancy deep neural network, we need to supply a

## Model Sharing

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/3%20-%20Model%20Sharing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/3%20-%20Model%20Sharing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/3%20-%20Model%20Sharing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/3%20-%20Model%20Sharing.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/3%20-%20Model%20Sharing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/3%20-%20Model%20Sharing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/3%20-%20Model%20Sharing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/3%20-%20Model%20Sharing.ipynb)

Some journals will require the sharing of code or models, but even if they don’t, we might benefit from it.
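
For example, a fitted scikit-learn pipeline can be exported and reloaded with `joblib` (a minimal sketch; the file name and model choice are illustrative, not the notebook's):

```
# A minimal sketch of exporting a fitted pipeline with joblib so that
# others can reload exactly the model described in a paper.
import joblib
import seaborn as sns
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

penguins = sns.load_dataset("penguins").dropna()
X = penguins[["bill_length_mm", "flipper_length_mm"]]
y = penguins["species"]

pipeline = make_pipeline(StandardScaler(), SVC()).fit(X, y)

# Persist the whole pipeline (preprocessing included), then reload it.
joblib.dump(pipeline, "penguin_model.joblib")
restored = joblib.load("penguin_model.joblib")
assert (restored.predict(X) == pipeline.predict(X)).all()
```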

@@ -184,7 +186,7 @@ In this section, we explore how we can export models and make our training codes

## Testing

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/4%20-%20Testing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/4%20-%20Testing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/4%20-%20Testing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/4%20-%20Testing.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/4%20-%20Testing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/4%20-%20Testing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/4%20-%20Testing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/4%20-%20Testing.ipynb)

Machine learning is very hard to test. Due to the nature of our models, we often have soft failures in the model that are difficult to test against.
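
One workable pattern is to test properties rather than exact outputs, e.g. determinism and a minimum score (a minimal pytest-style sketch; the model choice and the 0.8 threshold are assumptions, not the notebook's code):

```
# A minimal sketch of property-based checks for a model with "soft"
# failure modes: assert determinism and a floor on held-out accuracy.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def _fit(seed=42):
    penguins = sns.load_dataset("penguins").dropna()
    X = penguins[["bill_length_mm", "flipper_length_mm"]]
    y = penguins["species"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    model = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    return model, X_test, y_test

def test_model_is_deterministic():
    # Two runs with the same seed must produce identical predictions.
    model_a, X_test, _ = _fit()
    model_b, _, _ = _fit()
    assert (model_a.predict(X_test) == model_b.predict(X_test)).all()

def test_model_beats_minimum_accuracy():
    # The threshold is an assumption for illustration only.
    model, X_test, y_test = _fit()
    assert model.score(X_test, y_test) > 0.8
```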

@@ -200,7 +202,7 @@ Writing software tests in science is already incredibly hard, so in this sectio

## Interpretability

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/5%20-%20Interpretability.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/5%20-%20Interpretability.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/5%20-%20Interpretability.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/5%20-%20Interpretability.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/5%20-%20Interpretability.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/5%20-%20Interpretability.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/5%20-%20Interpretability.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/5%20-%20Interpretability.ipynb)

One way to probe the models we build is to test them against the established knowledge of domain experts. In this final section, we’ll explore how to build intuitions about our machine learning model and avoid pitfalls like spurious correlations. These methods for model interpretability increase our trust in models, but they can also serve as an additional level of reproducibility in our research and a valuable research artefact that can be discussed in a publication.
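
As a small taste of what such probing can look like, permutation feature importance needs nothing beyond scikit-learn (a minimal sketch; the notebooks also cover `shap`, which this example does not use):

```
# A minimal sketch of probing a fitted model with permutation feature
# importance: shuffle one feature at a time and measure the score drop.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X_train, X_test, y_train, y_test = train_test_split(
    penguins[features], penguins["species"], random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Large drops indicate features the model genuinely relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, mean in sorted(zip(features, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {mean:.3f}")
```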

@@ -214,7 +216,7 @@ This section will introduce tools like `shap`, discuss feature importance, and m

## Ablation Studies

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/6%20-%20Ablation%20Study.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/6%20-%20Ablation%20Study.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/6%20-%20Ablation%20Study.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/6%20-%20Ablation%20Study.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/6%20-%20Ablation%20Study.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/6%20-%20Ablation%20Study.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/6%20-%20Ablation%20Study.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/6%20-%20Ablation%20Study.ipynb)

Finally, the gold standard in building complex machine learning models is proving that each constituent part of the model contributes something to the proposed solution.
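
A feature-level ablation can be as simple as retraining with each feature removed in turn and comparing cross-validated scores (a minimal sketch with an arbitrary model choice, assuming seaborn's bundled penguins data):

```
# A minimal sketch of a feature-ablation study: drop one feature at a
# time, retrain, and compare against the full model's score.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
y = penguins["species"]

full_score = cross_val_score(
    RandomForestClassifier(random_state=42), penguins[features], y, cv=5
).mean()
print(f"All features: {full_score:.3f}")

for dropped in features:
    kept = [f for f in features if f != dropped]
    score = cross_val_score(
        RandomForestClassifier(random_state=42), penguins[kept], y, cv=5
    ).mean()
    print(f"Without {dropped}: {score:.3f} (delta {score - full_score:+.3f})")
```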

File renamed without changes.
