Expanded example in Model Evaluation + updated scripts and rendered notebooks (#2)

* updated all deps in repo with min versions

* removed all leftover EuroSciPy references in favor of the new repo + notes on the conda env on Apple M1

* Tweaks and fixes to all notebooks, checking all deps.

- Data and model loading made OS-independent
- environment files now reflect all deps in the notebooks
- tweaked and fixed a couple of typos
- removed a warning in the last notebook
- fixed a mistake in testing that used the wrong dataframe in pandera's schema validation

* validated content by syncing via jupytext

* updated all links from master to main

* renamed the jupytext config file in line with new jupytext features

* changed all remaining `master` references to `main`

* Model evaluation with metrics and test-set evaluation

* rendered notebooks
leriomaggio authored Dec 2, 2022
1 parent f9c68df commit dd16ec2
Showing 13 changed files with 2,780 additions and 1,090 deletions.
18 changes: 10 additions & 8 deletions README.md
@@ -59,6 +59,8 @@ $ pip install -r requirements.txt
```
### Using Conda

⚠️ If you're using an Apple machine with the M1 chip, please follow these [instructions](#note-for-conda-on-apple-m1-chip)

You can create a `pydata-global-2022-ml-repro` conda environment by executing:

```
@@ -77,7 +79,7 @@ You might also only update your current environment using:
$ conda env update --prefix ./env --file environment.yml --prune
```

#### Note for Conda nn Apple M1 Chip
#### Note for Conda on Apple M1 Chip

If you're using a Mac with the latest M1 chip, it is highly recommended to install the packages in
your conda environment specifically tailored for your hardware architecture (i.e. `arm64`).
@@ -124,7 +126,7 @@ So how do we actually go about obtaining these goals?

## Data

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/0%20-%20Basic%20Data%20Prep%20and%20Model.ipynb)

This tutorial uses the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/).

@@ -137,7 +139,7 @@ Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.e

## Model Evaluation

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/1%20-%20Model%20Evaluation.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/1%20-%20Model%20Evaluation.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/1%20-%20Model%20Evaluation.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/1%20-%20Model%20Evaluation.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/1%20-%20Model%20Evaluation.ipynb)

Applying machine learning in an applied science context is often method work. We build a prototype model and want to show that this method can be applied to our specific problem. This means that we have to guarantee that the insights we glean from this application generalize to new data from the same problem set.
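
As a rough illustration (a minimal sketch, not the notebook's exact code, assuming seaborn's bundled copy of the penguins data and an arbitrary choice of classifier):

```
# A minimal sketch: hold out a test set once, cross-validate on the
# training data only, and touch the test set a single time at the end.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

penguins = sns.load_dataset("penguins").dropna()
X = penguins[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]]
y = penguins["species"]

# Hold out a test set before any modelling decisions are made.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)

# Cross-validate on the training data only; the test set stays untouched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# A single final evaluation on the held-out test set.
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```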

@@ -153,7 +155,7 @@ So we’ll go into some methods to properly evaluate machine learning models eve

## Benchmarking

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/2%20-%20Benchmarking.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/2%20-%20Benchmarking.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/2%20-%20Benchmarking.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/2%20-%20Benchmarking.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/2%20-%20Benchmarking.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/2%20-%20Benchmarking.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/2%20-%20Benchmarking.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/2%20-%20Benchmarking.ipynb)

Another common reason for rejections of machine learning papers in applied science is the lack of proper benchmarks. This section will be fairly short, as it differs from discipline to discipline.
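
As a rough sketch of what a minimal benchmark can look like (assuming seaborn's bundled penguins data; the dummy baseline is simply the weakest sensible comparison, not a discipline-specific one):

```
# A minimal sketch of a sanity-check baseline: any proposed model should
# clearly beat a dummy classifier that ignores the input features.
import seaborn as sns
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

penguins = sns.load_dataset("penguins").dropna()
X = penguins[["bill_length_mm", "flipper_length_mm"]]
y = penguins["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# Predicts the most frequent class regardless of input.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"Dummy baseline: {baseline.score(X_test, y_test):.3f}")
print(f"Model:          {model.score(X_test, y_test):.3f}")
```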

@@ -165,7 +167,7 @@ However, any time we apply a superfancy deep neural network, we need to supply a

## Model Sharing

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/3%20-%20Model%20Sharing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/3%20-%20Model%20Sharing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/3%20-%20Model%20Sharing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/3%20-%20Model%20Sharing.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/3%20-%20Model%20Sharing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/3%20-%20Model%20Sharing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/3%20-%20Model%20Sharing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/3%20-%20Model%20Sharing.ipynb)

Some journals will require the sharing of code or models, but even if they don’t, we might benefit from it.
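
For example, a fitted scikit-learn pipeline can be exported and reloaded with `joblib` (a minimal sketch; the file name and model choice are illustrative, not the notebook's):

```
# A minimal sketch of exporting a fitted pipeline with joblib so that
# others can reload exactly the model described in a paper.
import joblib
import seaborn as sns
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

penguins = sns.load_dataset("penguins").dropna()
X = penguins[["bill_length_mm", "flipper_length_mm"]]
y = penguins["species"]

pipeline = make_pipeline(StandardScaler(), SVC()).fit(X, y)

# Persist the whole pipeline (preprocessing included), then reload it.
joblib.dump(pipeline, "penguin_model.joblib")
restored = joblib.load("penguin_model.joblib")
assert (restored.predict(X) == pipeline.predict(X)).all()
```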

@@ -184,7 +186,7 @@ In this section, we explore how we can export models and make our training codes

## Testing

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/4%20-%20Testing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/4%20-%20Testing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/4%20-%20Testing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/4%20-%20Testing.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/4%20-%20Testing.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/4%20-%20Testing.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/4%20-%20Testing.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/4%20-%20Testing.ipynb)

Machine learning is very hard to test. Due to the nature of our models, we often have soft failures in the model that are difficult to test against.
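
One workable pattern is to test properties rather than exact outputs, e.g. determinism and a minimum score (a minimal pytest-style sketch; the model choice and the 0.8 threshold are assumptions, not the notebook's code):

```
# A minimal sketch of property-based checks for a model with "soft"
# failure modes: assert determinism and a floor on held-out accuracy.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def _fit(seed=42):
    penguins = sns.load_dataset("penguins").dropna()
    X = penguins[["bill_length_mm", "flipper_length_mm"]]
    y = penguins["species"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    model = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    return model, X_test, y_test

def test_model_is_deterministic():
    # Two runs with the same seed must produce identical predictions.
    model_a, X_test, _ = _fit()
    model_b, _, _ = _fit()
    assert (model_a.predict(X_test) == model_b.predict(X_test)).all()

def test_model_beats_minimum_accuracy():
    # The threshold is an assumption for illustration only.
    model, X_test, y_test = _fit()
    assert model.score(X_test, y_test) > 0.8
```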

@@ -200,7 +202,7 @@ Writing software tests in science is already incredibly hard, so in this sectio

## Interpretability

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/5%20-%20Interpretability.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/5%20-%20Interpretability.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/5%20-%20Interpretability.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/5%20-%20Interpretability.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/5%20-%20Interpretability.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/5%20-%20Interpretability.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/5%20-%20Interpretability.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/5%20-%20Interpretability.ipynb)

One way to probe the models we build is to test them against the established knowledge of domain experts. In this final section, we’ll explore how to build intuitions about our machine learning model and avoid pitfalls like spurious correlations. These methods for model interpretability increase our trust in models, but they can also serve as an additional level of reproducibility in our research and a valuable research artefact that can be discussed in a publication.
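
As a small taste of what such probing can look like, permutation feature importance needs nothing beyond scikit-learn (a minimal sketch; the notebooks also cover `shap`, which this example does not use):

```
# A minimal sketch of probing a fitted model with permutation feature
# importance: shuffle one feature at a time and measure the score drop.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X_train, X_test, y_train, y_test = train_test_split(
    penguins[features], penguins["species"], random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Large drops indicate features the model genuinely relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, mean in sorted(zip(features, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {mean:.3f}")
```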

@@ -214,7 +216,7 @@ This section will introduce tools like `shap`, discuss feature importance, and m

## Ablation Studies

[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/6%20-%20Ablation%20Study.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/6%20-%20Ablation%20Study.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/6%20-%20Ablation%20Study.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/master/notebooks/6%20-%20Ablation%20Study.ipynb)
[![](https://img.shields.io/badge/view-notebook-orange)](notebooks/6%20-%20Ablation%20Study.ipynb) [![](https://img.shields.io/badge/open-colab-yellow)](https://colab.research.google.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/6%20-%20Ablation%20Study.ipynb) [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/6%20-%20Ablation%20Study.ipynb) [![Open%20In%20SageMaker%20Studio%20Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/jesperdramsch/ml-for-science-reproducibility-tutorial/blob/main/notebooks/6%20-%20Ablation%20Study.ipynb)

Finally, the gold standard in building complex machine learning models is proving that each constituent part of the model contributes something to the proposed solution.
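
A feature-level ablation can be as simple as retraining with each feature removed in turn and comparing cross-validated scores (a minimal sketch with an arbitrary model choice, assuming seaborn's bundled penguins data):

```
# A minimal sketch of a feature-ablation study: drop one feature at a
# time, retrain, and compare against the full model's score.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
y = penguins["species"]

full_score = cross_val_score(
    RandomForestClassifier(random_state=42), penguins[features], y, cv=5
).mean()
print(f"All features: {full_score:.3f}")

for dropped in features:
    kept = [f for f in features if f != dropped]
    score = cross_val_score(
        RandomForestClassifier(random_state=42), penguins[kept], y, cv=5
    ).mean()
    print(f"Without {dropped}: {score:.3f} (delta {score - full_score:+.3f})")
```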

File renamed without changes.
