Commit

Merge pull request #444 from scikit-learn-contrib/415-documentation-for-winkler-interval-score

415 documentation for winkler interval score
LacombeLouis authored May 27, 2024
2 parents be78d7a + 4d823ab commit aeb7894
Showing 18 changed files with 421 additions and 212 deletions.
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -9,6 +9,7 @@ History
* Reduce precision for test in `MapieCalibrator`.
* Fix invalid certificate when downloading data.
* Add citations utility to the documentation.
* Add documentation for metrics.
* Add explanation and example for symmetry argument in CQR.

0.8.3 (2024-03-01)
15 changes: 10 additions & 5 deletions README.rst
@@ -172,23 +172,28 @@ and with the financial support from Région Ile de France and Confiance.ai.
|Quantmetry| |Michelin| |ENS| |Confiance.ai| |IledeFrance|

.. |Quantmetry| image:: https://www.quantmetry.com/wp-content/uploads/2020/08/08-Logo-quant-Texte-noir.svg
:height: 35
:height: 35px
:width: 140px
:target: https://www.quantmetry.com/

.. |Michelin| image:: https://agngnconpm.cloudimg.io/v7/https://dgaddcosprod.blob.core.windows.net/corporate-production/attachments/cls05tqdd9e0o0tkdghwi9m7n-clooe1x0c3k3x0tlu4cxi6dpn-bibendum-salut.full.png
:height: 35
:height: 50px
:width: 45px
:target: https://www.michelin.com/en/

.. |ENS| image:: https://file.diplomeo-static.com/file/00/00/01/34/13434.svg
:height: 35
:height: 35px
:width: 140px
:target: https://ens-paris-saclay.fr/en

.. |Confiance.ai| image:: https://pbs.twimg.com/profile_images/1443838558549258264/EvWlv1Vq_400x400.jpg
:height: 35
:height: 45px
:width: 45px
:target: https://www.confiance.ai/

.. |IledeFrance| image:: https://www.iledefrance.fr/sites/default/files/logo/2024-02/logoGagnerok.svg
:height: 35
:height: 35px
:width: 140px
:target: https://www.iledefrance.fr/


7 changes: 7 additions & 0 deletions doc/index.rst
@@ -58,6 +58,13 @@
examples_calibration/index
notebooks_calibration

.. toctree::
:maxdepth: 2
:hidden:
:caption: METRICS

theoretical_description_metrics

.. toctree::
:maxdepth: 2
:hidden:
8 changes: 4 additions & 4 deletions doc/notebooks_classification.rst
@@ -6,8 +6,8 @@ problems for computer vision settings that are too heavy to be included in the e
galleries.


1. Estimating prediction sets on the Cifar10 dataset : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. Estimating prediction sets on the Cifar10 dataset : `cifar_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. Top-label calibration for outputs of ML models : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Top-label calibration for outputs of ML models : `top_label_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions doc/notebooks_multilabel_classification.rst
@@ -5,8 +5,8 @@ The following examples present advanced analyses
on multi-label classification problems with different
methods proposed in MAPIE.

1. Overview of Recall Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. Overview of Recall Control for Multi-Label Classification : `recall_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. Overview of Precision Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Overview of Precision Control for Multi-Label Classification : `precision_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions doc/notebooks_regression.rst
@@ -8,11 +8,11 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


2. Estimating the uncertainties in the exoplanet masses : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Estimating the uncertainties in the exoplanet masses : `exoplanet_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `ts_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


10 changes: 4 additions & 6 deletions doc/quick_start.rst
@@ -7,11 +7,9 @@ In regression settings, **MAPIE** provides prediction intervals on single-output
In classification settings, **MAPIE** provides prediction sets on multi-class data.
In any case, **MAPIE** is compatible with any scikit-learn-compatible estimator.

Estimate your prediction intervals
==================================

1. Download and install the module
----------------------------------
==================================

Install via ``pip``:

@@ -33,7 +31,7 @@ To install directly from the github repository :
2. Run MapieRegressor
---------------------
=====================

Let us start with a basic regression problem.
Here, we generate one-dimensional noisy data that we fit with a linear model.
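
A minimal sketch of this kind of workflow (illustrative data and model, not the exact code from the guide; ``alpha=[0.05, 0.32]`` matches the target coverages of 0.95 and 0.68 discussed below):

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from mapie.regression import MapieRegressor

    # One-dimensional noisy data fitted with a linear model.
    rng = np.random.default_rng(42)
    X = np.linspace(0, 10, 500).reshape(-1, 1)
    y = 2 * X.ravel() + rng.normal(scale=1.0, size=500)

    mapie = MapieRegressor(estimator=LinearRegression())
    mapie.fit(X, y)

    # y_pis has shape (n_samples, 2, n_alpha): lower and upper bounds per alpha.
    y_pred, y_pis = mapie.predict(X, alpha=[0.05, 0.32])
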
@@ -114,8 +112,8 @@ It is given by the alpha parameter defined in ``MapieRegressor``, here equal to
thus giving target coverages of ``0.95`` and ``0.68``.
The effective coverage is the actual fraction of true labels lying in the prediction intervals.

2. Run MapieClassifier
----------------------
3. Run MapieClassifier
=======================

Similarly, it's possible to do the same for a basic classification problem.
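
A minimal sketch of the classification workflow (dataset and model choices are illustrative):

.. code-block:: python

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression
    from mapie.classification import MapieClassifier

    X, y = make_blobs(n_samples=500, centers=3, random_state=42)

    mapie_clf = MapieClassifier(estimator=LogisticRegression(), method="lac")
    mapie_clf.fit(X, y)

    # y_ps is a boolean array of shape (n_samples, n_classes, n_alpha)
    # indicating which classes belong to each prediction set.
    y_pred, y_ps = mapie_clf.predict(X, alpha=0.05)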

10 changes: 5 additions & 5 deletions doc/theoretical_description_binary_classification.rst
@@ -1,10 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Binary Classification : contents

.. _theoretical_description_binay_classification:

=======================
#######################
Theoretical Description
=======================
#######################

There are mainly three different ways to handle uncertainty quantification in binary classification:
calibration (see :doc:`theoretical_description_calibration`), confidence interval (CI) for the probability
@@ -83,8 +83,8 @@ for the labels of test objects which are guaranteed to be well-calibrated under
that the observations are generated independently from the same distribution [2].


4. References
-------------
References
----------

[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas.
"Distribution-free binary classification: prediction sets, confidence intervals, and calibration."
117 changes: 7 additions & 110 deletions doc/theoretical_description_calibration.rst
@@ -2,10 +2,9 @@

.. _theoretical_description_calibration:

=======================
#######################
Theoretical Description
=======================

#######################

One method for multi-class calibration has been implemented in MAPIE so far :
Top-Label Calibration [1].
@@ -34,8 +33,8 @@ To apply calibration directly to a multi-class context, Gupta et al. propose a f
a multi-class calibration to multiple binary calibrations (M2B).


1. Top-Label
------------
Top-Label
---------

Top-Label calibration is a calibration technique introduced by Gupta et al. to calibrate the model according to the highest score and
the corresponding class (see [1] Section 2). This framework makes it possible to apply binary calibration techniques to multi-class calibration.
@@ -50,109 +49,8 @@ according to Top-Label calibration if:
    Pr(Y = c(X) \mid h(X), c(X)) = h(X)
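
As an illustration, a minimal sketch with ``MapieCalibrator`` (assuming it accepts ``method="top_label"`` and a ``calibrator`` argument; exact names may vary between versions):

.. code-block:: python

    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier
    from mapie.calibration import MapieCalibrator

    X, y = make_blobs(n_samples=1000, centers=3, random_state=0)

    # Calibrate the score of the predicted (top) class only.
    calib = MapieCalibrator(
        estimator=RandomForestClassifier(random_state=0),
        method="top_label",
        calibrator="sigmoid",
    )
    calib.fit(X, y)

    # Calibrated score h(X) for the predicted class c(X) of each sample.
    proba_calibrated = calib.predict_proba(X)
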
2. Metrics for calibration
--------------------------

**Expected calibration error**

The main metric to check if the calibration is correct is the Expected Calibration Error (ECE). It is based on two
components, accuracy and confidence per bin. The number of bins is a hyperparameter :math:`M`, and we refer to a specific bin by
:math:`B_m`.

.. math::
    \text{acc}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} {y}_i \\
    \text{conf}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} \hat{f}(x)_i

The ECE combines these two quantities:

.. math::
    \text{ECE} = \sum_{m=1}^M \frac{\left| B_m \right|}{n} \left| acc(B_m) - conf(B_m) \right|

In simple terms, once the confidence scores have been grouped into bins, we compute the mean accuracy and the mean confidence of each bin.
The weighted average of the absolute differences between the two is the ECE. Hence, the lower the ECE, the better the calibration.
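
A minimal sketch of this computation (assuming equal-width confidence bins; the function below is illustrative, not MAPIE's implementation):

.. code-block:: python

    import numpy as np

    def expected_calibration_error(y_correct, y_confidence, n_bins=10):
        """ECE with equal-width bins, following the formulas above.

        y_correct    : 1 if the prediction is correct, 0 otherwise.
        y_confidence : confidence score of each prediction.
        """
        y_correct = np.asarray(y_correct, dtype=float)
        y_confidence = np.asarray(y_confidence, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        bin_ids = np.digitize(y_confidence, edges[1:-1])
        n = len(y_confidence)
        ece = 0.0
        for m in range(n_bins):
            mask = bin_ids == m
            if mask.any():
                acc = y_correct[mask].mean()       # acc(B_m)
                conf = y_confidence[mask].mean()   # conf(B_m)
                ece += (mask.sum() / n) * abs(acc - conf)
        return ece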

**Top-Label ECE**

In Top-Label calibration, the ECE is computed per top-label class: for each top-label class, the accuracy and confidence
are conditioned on that label, and the final score is the average of the resulting per-class ECE values.

3. Statistical tests for calibration
------------------------------------

**Kolmogorov-Smirnov test**

Kolmogorov-Smirnov test was derived in [2, 3, 4]. The idea is to consider the cumulative differences between sorted scores :math:`s_i`
and their corresponding labels :math:`y_i` and to compare their properties to those of a standard Brownian motion. Let us consider the
cumulative differences on sorted scores:

.. math::
    C_k = \frac{1}{N}\sum_{i=1}^k (s_i - y_i)

We also introduce a typical normalization scale :math:`\sigma`:

.. math::
    \sigma = \frac{1}{N}\sqrt{\sum_{i=1}^N s_i(1 - s_i)}

The Kolmogorov-Smirnov statistic is then defined as:

.. math::
    G = \max|C_k|/\sigma

It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
converges to the maximum absolute value of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
formulas for the cumulative distribution function (CDF) of the maximum absolute value of such a standard Brownian motion.
So we state the p-value associated with the statistical test of well calibration as:

.. math::
    p = 1 - CDF(G)
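
A minimal sketch of the statistic itself (illustrative, not MAPIE's implementation); the p-value additionally requires the Brownian-motion CDF from [3, 4]:

.. code-block:: python

    import numpy as np

    def kolmogorov_smirnov_statistic(y_true, y_scores):
        """G = max |C_k| / sigma, with scores sorted in increasing order."""
        y_true, y_scores = np.asarray(y_true), np.asarray(y_scores)
        order = np.argsort(y_scores)
        s, y = y_scores[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n                  # C_k
        sigma = np.sqrt(np.sum(s * (1 - s))) / n  # normalization scale
        return np.max(np.abs(c)) / sigma
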
**Kuiper test**

Kuiper test was derived in [2, 3, 4] and is very similar to Kolmogorov-Smirnov. This time, the statistic is defined as:

.. math::
    H = (\max_k|C_k| - \min_k|C_k|)/\sigma

It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
converges to the range of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
formulas for the cumulative distribution function (CDF) of the range of such a standard Brownian motion.
So we state the p-value associated with the statistical test of well calibration as:

.. math::
    p = 1 - CDF(H)
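
The corresponding sketch, with the same :math:`C_k` and :math:`\sigma` as above (illustrative, not MAPIE's implementation):

.. code-block:: python

    import numpy as np

    def kuiper_statistic(y_true, y_scores):
        """H = (max |C_k| - min |C_k|) / sigma."""
        y_true, y_scores = np.asarray(y_true), np.asarray(y_scores)
        order = np.argsort(y_scores)
        s, y = y_scores[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n
        sigma = np.sqrt(np.sum(s * (1 - s))) / n
        return (np.max(np.abs(c)) - np.min(np.abs(c))) / sigma
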
**Spiegelhalter test**

Spiegelhalter test was derived in [6]. It is based on a decomposition of the Brier score:

.. math::
    B = \frac{1}{N}\sum_{i=1}^N(y_i - s_i)^2

where scores are denoted :math:`s_i` and their corresponding labels :math:`y_i`. This can be decomposed into two terms:

.. math::
    B = \frac{1}{N}\sum_{i=1}^N(y_i - s_i)(1 - 2s_i) + \frac{1}{N}\sum_{i=1}^N s_i(1 - s_i)

It can be shown that the first term has an expected value of zero under the null hypothesis of well-calibrated scores. So we interpret
the second term as the Brier score expected value :math:`E(B)` under the null hypothesis. As for the variance of the Brier score, it can be
computed as:

.. math::
    Var(B) = \frac{1}{N^2}\sum_{i=1}^N(1 - 2s_i)^2 s_i(1 - s_i)

So we can build a Z-score as follows:

.. math::
    Z = \frac{B - E(B)}{\sqrt{Var(B)}} = \frac{\sum_{i=1}^N(y_i - s_i)(1 - 2s_i)}{\sqrt{\sum_{i=1}^N(1 - 2s_i)^2 s_i(1 - s_i)}}

This statistic follows a standard normal distribution with cumulative distribution function CDF, so the associated p-value is:

.. math::
    p = 1 - CDF(Z)
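
A minimal sketch of this test (illustrative, not MAPIE's implementation):

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    def spiegelhalter_p_value(y_true, y_scores):
        """Z-score and p-value of the Spiegelhalter test."""
        y_true, y_scores = np.asarray(y_true), np.asarray(y_scores)
        num = np.sum((y_true - y_scores) * (1 - 2 * y_scores))
        den = np.sqrt(np.sum((1 - 2 * y_scores) ** 2 * y_scores * (1 - y_scores)))
        z = num / den
        return 1 - norm.cdf(z)
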
3. References
-------------
References
----------

[1] Gupta, Chirag, and Aaditya K. Ramdas.
"Top-label calibration and multiclass-to-binary reductions."
@@ -171,8 +69,7 @@ arXiv preprint arXiv:2202.00100.

[4] D. A. Darling. A. J. F. Siegert.
The First Passage Problem for a Continuous Markov Process.
Ann. Math. Statist. 24 (4) 624 - 639, December,
1953.
Ann. Math. Statist. 24 (4) 624 - 639, December, 1953.

[5] William Feller.
The Asymptotic Distribution of the Range of Sums of
13 changes: 7 additions & 6 deletions doc/theoretical_description_classification.rst
@@ -1,11 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Classification : contents

.. _theoretical_description_classification:

=======================
#######################
Theoretical Description
=======================

#######################

Three methods for multi-class uncertainty quantification have been implemented in MAPIE so far :
LAC (that stands for Least Ambiguous set-valued Classifier) [1], Adaptive Prediction Sets [2, 3] and Top-K [3].
@@ -141,8 +140,10 @@ Despite the RAPS method having a relatively small set size, its coverage tends t
of the last label in the prediction set. This randomization is done as follows:

- First : define the :math:`V` parameter:

.. math::
    V_i = (s_i(X_i, Y_i) - \hat{q}_{1-\alpha}) / \left(\hat{\mu}(X_i)_{\pi_k} + \lambda \mathbb{1} (k > k_{reg})\right)

- Compare each :math:`V_i` to :math:`U \sim` Unif(0, 1)
- If :math:`V_i \leq U`, the last included label is removed, else we keep the prediction set as it is.
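
As an illustration of this randomization step, a sketch with hypothetical variable names (not MAPIE's implementation):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # For a single test point:
    #   score_last : conformity score obtained when the last label is included,
    #   q_hat      : (1 - alpha) quantile of the calibration scores,
    #   proba_last : softmax score of that last label, k its rank,
    #   lam, k_reg : RAPS regularization parameters.
    def drop_last_label(score_last, q_hat, proba_last, k, lam, k_reg):
        v = (score_last - q_hat) / (proba_last + lam * float(k > k_reg))
        u = rng.uniform()  # U ~ Unif(0, 1)
        return v <= u      # True: remove the last label from the prediction set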

@@ -227,8 +228,8 @@ where :
.. TO BE CONTINUED
5. References
-------------
References
----------

[1] Mauricio Sadinle, Jing Lei, & Larry Wasserman.
"Least Ambiguous Set-Valued Classifiers With Bounded Error Levels."
16 changes: 8 additions & 8 deletions doc/theoretical_description_conformity_scores.rst
@@ -1,10 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Conformity Scores : contents

.. _theoretical_description_conformity_scores:

=============================================
#############################################
Theoretical Description for Conformity Scores
=============================================
#############################################

The :class:`mapie.conformity_scores.ConformityScore` class implements various
methods to compute conformity scores for regression.
@@ -25,7 +25,7 @@ quantiles will be computed : one on the right side of the distribution
and the other on the left side.

1. The absolute residual score
==============================
------------------------------

The absolute residual score (:class:`mapie.conformity_scores.AbsoluteConformityScore`)
is the simplest and most commonly used conformity score; it translates the error
@@ -44,7 +44,7 @@ With this score, the intervals of predictions will be constant over the whole da
This score is by default symmetric (*see above for definition*).

2. The gamma score
==================
------------------

The gamma score [2] (:class:`mapie.conformity_scores.GammaConformityScore`) adds a
notion of adaptivity with the normalization of the residuals by the predictions.
@@ -69,7 +69,7 @@ the order of magnitude of the predictions, implying that this score should be us
in use cases where we want greater uncertainty when the prediction is high.

3. The residual normalized score
=======================================
--------------------------------

The residual normalized score [1] (:class:`mapie.conformity_scores.ResidualNormalisedScore`)
is slightly more complex than the previous scores.
@@ -97,7 +97,7 @@ it is not proportional to the uncertainty.


Key takeaways
=============
-------------

- The absolute residual score is the basic conformity score and gives constant intervals. It is the one used by default by :class:`mapie.regression.MapieRegressor`.
- The gamma conformity score adds a notion of adaptivity by giving intervals of different sizes
@@ -107,7 +107,7 @@ Key takeaways
without specific assumptions on the data.
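
As an illustration, a minimal sketch that swaps the default score for the gamma score (assuming the ``conformity_score`` argument of ``MapieRegressor``):

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from mapie.regression import MapieRegressor
    from mapie.conformity_scores import GammaConformityScore

    # Strictly positive targets, as required by the gamma score.
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 10, size=(500, 1))
    y = X.ravel() * (1 + 0.1 * rng.normal(size=500))

    mapie = MapieRegressor(
        estimator=GradientBoostingRegressor(),
        conformity_score=GammaConformityScore(),
    )
    mapie.fit(X, y)

    # Intervals grow with the magnitude of the predictions.
    y_pred, y_pis = mapie.predict(X, alpha=0.1)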

References
==========
----------

[1] Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-Free
Predictive Inference for Regression. Journal of the American Statistical Association, 113(523), 1094–1111.