Commit

Merge pull request #444 from scikit-learn-contrib/415-documentation-for-winkler-interval-score

415 documentation for winkler interval score
LacombeLouis authored May 27, 2024
2 parents be78d7a + 4d823ab commit aeb7894
Showing 18 changed files with 421 additions and 212 deletions.
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -9,6 +9,7 @@ History
* Reduce precision for test in `MapieCalibrator`.
* Fix invalid certificate when downloading data.
* Add citations utility to the documentation.
* Add documentation for metrics.
* Add explanation and example for symmetry argument in CQR.

0.8.3 (2024-03-01)
15 changes: 10 additions & 5 deletions README.rst
@@ -172,23 +172,28 @@ and with the financial support from Région Ile de France and Confiance.ai.
|Quantmetry| |Michelin| |ENS| |Confiance.ai| |IledeFrance|

.. |Quantmetry| image:: https://www.quantmetry.com/wp-content/uploads/2020/08/08-Logo-quant-Texte-noir.svg
:height: 35
:height: 35px
:width: 140px
:target: https://www.quantmetry.com/

.. |Michelin| image:: https://agngnconpm.cloudimg.io/v7/https://dgaddcosprod.blob.core.windows.net/corporate-production/attachments/cls05tqdd9e0o0tkdghwi9m7n-clooe1x0c3k3x0tlu4cxi6dpn-bibendum-salut.full.png
:height: 35
:height: 50px
:width: 45px
:target: https://www.michelin.com/en/

.. |ENS| image:: https://file.diplomeo-static.com/file/00/00/01/34/13434.svg
:height: 35
:height: 35px
:width: 140px
:target: https://ens-paris-saclay.fr/en

.. |Confiance.ai| image:: https://pbs.twimg.com/profile_images/1443838558549258264/EvWlv1Vq_400x400.jpg
:height: 35
:height: 45px
:width: 45px
:target: https://www.confiance.ai/

.. |IledeFrance| image:: https://www.iledefrance.fr/sites/default/files/logo/2024-02/logoGagnerok.svg
:height: 35
:height: 35px
:width: 140px
:target: https://www.iledefrance.fr/


7 changes: 7 additions & 0 deletions doc/index.rst
@@ -58,6 +58,13 @@
examples_calibration/index
notebooks_calibration

.. toctree::
:maxdepth: 2
:hidden:
:caption: METRICS

theoretical_description_metrics

.. toctree::
:maxdepth: 2
:hidden:
8 changes: 4 additions & 4 deletions doc/notebooks_classification.rst
@@ -6,8 +6,8 @@ problems for computer vision settings that are too heavy to be included in the e
galleries.


1. Estimating prediction sets on the Cifar10 dataset : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. Estimating prediction sets on the Cifar10 dataset : `cifar_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. Top-label calibration for outputs of ML models : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Top-label calibration for outputs of ML models : `top_label_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions doc/notebooks_multilabel_classification.rst
@@ -5,8 +5,8 @@ The following examples present advanced analyses
on multi-label classification problems with different
methods proposed in MAPIE.

1. Overview of Recall Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. Overview of Recall Control for Multi-Label Classification : `recall_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. Overview of Precision Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Overview of Precision Control for Multi-Label Classification : `precision_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions doc/notebooks_regression.rst
@@ -8,11 +8,11 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


2. Estimating the uncertainties in the exoplanet masses : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Estimating the uncertainties in the exoplanet masses : `exoplanet_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `ts_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


10 changes: 4 additions & 6 deletions doc/quick_start.rst
@@ -7,11 +7,9 @@ In regression settings, **MAPIE** provides prediction intervals on single-output
In classification settings, **MAPIE** provides prediction sets on multi-class data.
In any case, **MAPIE** is compatible with any scikit-learn-compatible estimator.

Estimate your prediction intervals
==================================

1. Download and install the module
----------------------------------
==================================

Install via ``pip``:

@@ -33,7 +31,7 @@ To install directly from the github repository :
2. Run MapieRegressor
---------------------
=====================

Let us start with a basic regression problem.
Here, we generate one-dimensional noisy data that we fit with a linear model.
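
A minimal sketch of this kind of workflow (illustrative data and model, not the exact code from the guide; ``alpha=[0.05, 0.32]`` matches the target coverages of 0.95 and 0.68 discussed below):

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from mapie.regression import MapieRegressor

    # One-dimensional noisy data fitted with a linear model.
    rng = np.random.default_rng(42)
    X = np.linspace(0, 10, 500).reshape(-1, 1)
    y = 2 * X.ravel() + rng.normal(scale=1.0, size=500)

    mapie = MapieRegressor(estimator=LinearRegression())
    mapie.fit(X, y)

    # y_pis has shape (n_samples, 2, n_alpha): lower and upper bounds per alpha.
    y_pred, y_pis = mapie.predict(X, alpha=[0.05, 0.32])
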
@@ -114,8 +112,8 @@ It is given by the alpha parameter defined in ``MapieRegressor``, here equal to
thus giving target coverages of ``0.95`` and ``0.68``.
The effective coverage is the actual fraction of true labels lying in the prediction intervals.

2. Run MapieClassifier
----------------------
3. Run MapieClassifier
=======================

Similarly, it's possible to do the same for a basic classification problem.
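
A minimal sketch of the classification workflow (dataset and model choices are illustrative):

.. code-block:: python

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression
    from mapie.classification import MapieClassifier

    X, y = make_blobs(n_samples=500, centers=3, random_state=42)

    mapie_clf = MapieClassifier(estimator=LogisticRegression(), method="lac")
    mapie_clf.fit(X, y)

    # y_ps is a boolean array of shape (n_samples, n_classes, n_alpha)
    # indicating which classes belong to each prediction set.
    y_pred, y_ps = mapie_clf.predict(X, alpha=0.05)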

10 changes: 5 additions & 5 deletions doc/theoretical_description_binary_classification.rst
@@ -1,10 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Binary Classification : contents

.. _theoretical_description_binay_classification:

=======================
#######################
Theoretical Description
=======================
#######################

There are mainly three different ways to handle uncertainty quantification in binary classification:
calibration (see :doc:`theoretical_description_calibration`), confidence interval (CI) for the probability
@@ -83,8 +83,8 @@ for the labels of test objects which are guaranteed to be well-calibrated under
that the observations are generated independently from the same distribution [2].


4. References
-------------
References
----------

[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas.
"Distribution-free binary classification: prediction sets, confidence intervals, and calibration."
117 changes: 7 additions & 110 deletions doc/theoretical_description_calibration.rst
@@ -2,10 +2,9 @@

.. _theoretical_description_calibration:

=======================
#######################
Theoretical Description
=======================

#######################

One method for multi-class calibration has been implemented in MAPIE so far :
Top-Label Calibration [1].
@@ -34,8 +33,8 @@ To apply calibration directly to a multi-class context, Gupta et al. propose a f
a multi-class calibration to multiple binary calibrations (M2B).


1. Top-Label
------------
Top-Label
---------

Top-Label calibration is a calibration technique introduced by Gupta et al. to calibrate the model according to the highest score and
the corresponding class (see [1] Section 2). This framework makes it possible to apply binary calibration techniques to multi-class calibration.
@@ -50,109 +49,8 @@ according to Top-Label calibration if:
    Pr(Y = c(X) \mid h(X), c(X)) = h(X)
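
As an illustration, a minimal sketch with ``MapieCalibrator`` (assuming it accepts ``method="top_label"`` and a ``calibrator`` argument; exact names may vary between versions):

.. code-block:: python

    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier
    from mapie.calibration import MapieCalibrator

    X, y = make_blobs(n_samples=1000, centers=3, random_state=0)

    # Calibrate the score of the predicted (top) class only.
    calib = MapieCalibrator(
        estimator=RandomForestClassifier(random_state=0),
        method="top_label",
        calibrator="sigmoid",
    )
    calib.fit(X, y)

    # Calibrated score h(X) for the predicted class c(X) of each sample.
    proba_calibrated = calib.predict_proba(X)
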
2. Metrics for calibration
--------------------------

**Expected calibration error**

The main metric to check if the calibration is correct is the Expected Calibration Error (ECE). It is based on two
components, accuracy and confidence per bin. The number of bins is a hyperparameter :math:`M`, and we refer to a specific bin by
:math:`B_m`.

.. math::
    \text{acc}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} {y}_i \\
    \text{conf}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} \hat{f}(x)_i

The ECE combines these two quantities:

.. math::
    \text{ECE} = \sum_{m=1}^M \frac{\left| B_m \right|}{n} \left| acc(B_m) - conf(B_m) \right|

In simple terms, once the confidence scores have been grouped into bins, we compute the mean accuracy and the mean confidence of each bin.
The weighted average of the absolute differences between the two is the ECE. Hence, the lower the ECE, the better the calibration.
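
A minimal sketch of this computation (assuming equal-width confidence bins; the function below is illustrative, not MAPIE's implementation):

.. code-block:: python

    import numpy as np

    def expected_calibration_error(y_correct, y_confidence, n_bins=10):
        """ECE with equal-width bins, following the formulas above.

        y_correct    : 1 if the prediction is correct, 0 otherwise.
        y_confidence : confidence score of each prediction.
        """
        y_correct = np.asarray(y_correct, dtype=float)
        y_confidence = np.asarray(y_confidence, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        bin_ids = np.digitize(y_confidence, edges[1:-1])
        n = len(y_confidence)
        ece = 0.0
        for m in range(n_bins):
            mask = bin_ids == m
            if mask.any():
                acc = y_correct[mask].mean()       # acc(B_m)
                conf = y_confidence[mask].mean()   # conf(B_m)
                ece += (mask.sum() / n) * abs(acc - conf)
        return ece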

**Top-Label ECE**

In Top-Label calibration, the ECE is computed per top-label class: for each top-label class, the accuracy and confidence
are conditioned on that label, and the final score is the average of the resulting per-class ECE values.

3. Statistical tests for calibration
------------------------------------

**Kolmogorov-Smirnov test**

Kolmogorov-Smirnov test was derived in [2, 3, 4]. The idea is to consider the cumulative differences between sorted scores :math:`s_i`
and their corresponding labels :math:`y_i` and to compare their properties to those of a standard Brownian motion. Let us consider the
cumulative differences on sorted scores:

.. math::
    C_k = \frac{1}{N}\sum_{i=1}^k (s_i - y_i)

We also introduce a typical normalization scale :math:`\sigma`:

.. math::
    \sigma = \frac{1}{N}\sqrt{\sum_{i=1}^N s_i(1 - s_i)}

The Kolmogorov-Smirnov statistic is then defined as:

.. math::
    G = \max|C_k|/\sigma

It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
converges to the maximum absolute value of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
formulas for the cumulative distribution function (CDF) of the maximum absolute value of such a standard Brownian motion.
So we state the p-value associated with the statistical test of well calibration as:

.. math::
    p = 1 - CDF(G)
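
A minimal sketch of the statistic itself (illustrative, not MAPIE's implementation); the p-value additionally requires the Brownian-motion CDF from [3, 4]:

.. code-block:: python

    import numpy as np

    def kolmogorov_smirnov_statistic(y_true, y_scores):
        """G = max |C_k| / sigma, with scores sorted in increasing order."""
        y_true, y_scores = np.asarray(y_true), np.asarray(y_scores)
        order = np.argsort(y_scores)
        s, y = y_scores[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n                  # C_k
        sigma = np.sqrt(np.sum(s * (1 - s))) / n  # normalization scale
        return np.max(np.abs(c)) / sigma
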
**Kuiper test**

Kuiper test was derived in [2, 3, 4] and is very similar to Kolmogorov-Smirnov. This time, the statistic is defined as:

.. math::
    H = (\max_k|C_k| - \min_k|C_k|)/\sigma

It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
converges to the range of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
formulas for the cumulative distribution function (CDF) of the range of such a standard Brownian motion.
So we state the p-value associated with the statistical test of well calibration as:

.. math::
    p = 1 - CDF(H)
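
The corresponding sketch, with the same :math:`C_k` and :math:`\sigma` as above (illustrative, not MAPIE's implementation):

.. code-block:: python

    import numpy as np

    def kuiper_statistic(y_true, y_scores):
        """H = (max |C_k| - min |C_k|) / sigma."""
        y_true, y_scores = np.asarray(y_true), np.asarray(y_scores)
        order = np.argsort(y_scores)
        s, y = y_scores[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n
        sigma = np.sqrt(np.sum(s * (1 - s))) / n
        return (np.max(np.abs(c)) - np.min(np.abs(c))) / sigma
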
**Spiegelhalter test**

Spiegelhalter test was derived in [6]. It is based on a decomposition of the Brier score:

.. math::
    B = \frac{1}{N}\sum_{i=1}^N(y_i - s_i)^2

where scores are denoted :math:`s_i` and their corresponding labels :math:`y_i`. This can be decomposed into two terms:

.. math::
    B = \frac{1}{N}\sum_{i=1}^N(y_i - s_i)(1 - 2s_i) + \frac{1}{N}\sum_{i=1}^N s_i(1 - s_i)

It can be shown that the first term has an expected value of zero under the null hypothesis of well-calibrated scores. So we interpret
the second term as the Brier score expected value :math:`E(B)` under the null hypothesis. As for the variance of the Brier score, it can be
computed as:

.. math::
    Var(B) = \frac{1}{N^2}\sum_{i=1}^N(1 - 2s_i)^2 s_i(1 - s_i)

So we can build a Z-score as follows:

.. math::
    Z = \frac{B - E(B)}{\sqrt{Var(B)}} = \frac{\sum_{i=1}^N(y_i - s_i)(1 - 2s_i)}{\sqrt{\sum_{i=1}^N(1 - 2s_i)^2 s_i(1 - s_i)}}

This statistic follows a standard normal distribution with cumulative distribution function CDF, so the associated p-value is:

.. math::
    p = 1 - CDF(Z)
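
A minimal sketch of this test (illustrative, not MAPIE's implementation):

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    def spiegelhalter_p_value(y_true, y_scores):
        """Z-score and p-value of the Spiegelhalter test."""
        y_true, y_scores = np.asarray(y_true), np.asarray(y_scores)
        num = np.sum((y_true - y_scores) * (1 - 2 * y_scores))
        den = np.sqrt(np.sum((1 - 2 * y_scores) ** 2 * y_scores * (1 - y_scores)))
        z = num / den
        return 1 - norm.cdf(z)
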
3. References
-------------
References
----------

[1] Gupta, Chirag, and Aaditya K. Ramdas.
"Top-label calibration and multiclass-to-binary reductions."
@@ -171,8 +69,7 @@ arXiv preprint arXiv:2202.00100.

[4] D. A. Darling. A. J. F. Siegert.
The First Passage Problem for a Continuous Markov Process.
Ann. Math. Statist. 24 (4) 624 - 639, December,
1953.
Ann. Math. Statist. 24 (4) 624 - 639, December, 1953.

[5] William Feller.
The Asymptotic Distribution of the Range of Sums of
13 changes: 7 additions & 6 deletions doc/theoretical_description_classification.rst
@@ -1,11 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Classification : contents

.. _theoretical_description_classification:

=======================
#######################
Theoretical Description
=======================

#######################

Three methods for multi-class uncertainty quantification have been implemented in MAPIE so far :
LAC (that stands for Least Ambiguous set-valued Classifier) [1], Adaptive Prediction Sets [2, 3] and Top-K [3].
@@ -141,8 +140,10 @@ Despite the RAPS method having a relatively small set size, its coverage tends t
of the last label in the prediction set. This randomization is done as follows:

- First : define the :math:`V` parameter:

.. math::
    V_i = (s_i(X_i, Y_i) - \hat{q}_{1-\alpha}) / \left(\hat{\mu}(X_i)_{\pi_k} + \lambda \mathbb{1} (k > k_{reg})\right)

- Compare each :math:`V_i` to :math:`U \sim` Unif(0, 1)
- If :math:`V_i \leq U`, the last included label is removed, else we keep the prediction set as it is.
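
As an illustration of this randomization step, a sketch with hypothetical variable names (not MAPIE's implementation):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # For a single test point:
    #   score_last : conformity score obtained when the last label is included,
    #   q_hat      : (1 - alpha) quantile of the calibration scores,
    #   proba_last : softmax score of that last label, k its rank,
    #   lam, k_reg : RAPS regularization parameters.
    def drop_last_label(score_last, q_hat, proba_last, k, lam, k_reg):
        v = (score_last - q_hat) / (proba_last + lam * float(k > k_reg))
        u = rng.uniform()  # U ~ Unif(0, 1)
        return v <= u      # True: remove the last label from the prediction set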

@@ -227,8 +228,8 @@ where :
.. TO BE CONTINUED
5. References
-------------
References
----------

[1] Mauricio Sadinle, Jing Lei, & Larry Wasserman.
"Least Ambiguous Set-Valued Classifiers With Bounded Error Levels."
16 changes: 8 additions & 8 deletions doc/theoretical_description_conformity_scores.rst
@@ -1,10 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Conformity Scores : contents

.. _theoretical_description_conformity_scores:

=============================================
#############################################
Theoretical Description for Conformity Scores
=============================================
#############################################

The :class:`mapie.conformity_scores.ConformityScore` class implements various
methods to compute conformity scores for regression.
@@ -25,7 +25,7 @@ quantiles will be computed : one on the right side of the distribution
and the other on the left side.

1. The absolute residual score
==============================
------------------------------

The absolute residual score (:class:`mapie.conformity_scores.AbsoluteConformityScore`)
is the simplest and most commonly used conformity score; it translates the error
@@ -44,7 +44,7 @@ With this score, the intervals of predictions will be constant over the whole da
This score is by default symmetric (*see above for definition*).

2. The gamma score
==================
------------------

The gamma score [2] (:class:`mapie.conformity_scores.GammaConformityScore`) adds a
notion of adaptivity with the normalization of the residuals by the predictions.
@@ -69,7 +69,7 @@ the order of magnitude of the predictions, implying that this score should be us
in use cases where we want greater uncertainty when the prediction is high.

3. The residual normalized score
=======================================
--------------------------------

The residual normalized score [1] (:class:`mapie.conformity_scores.ResidualNormalisedScore`)
is slightly more complex than the previous scores.
@@ -97,7 +97,7 @@ it is not proportional to the uncertainty.


Key takeaways
=============
-------------

- The absolute residual score is the basic conformity score and gives constant intervals. It is the one used by default by :class:`mapie.regression.MapieRegressor`.
- The gamma conformity score adds a notion of adaptivity by giving intervals of different sizes
@@ -107,7 +107,7 @@ Key takeaways
without specific assumptions on the data.
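
As an illustration, a minimal sketch that swaps the default score for the gamma score (assuming the ``conformity_score`` argument of ``MapieRegressor``):

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from mapie.regression import MapieRegressor
    from mapie.conformity_scores import GammaConformityScore

    # Strictly positive targets, as required by the gamma score.
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 10, size=(500, 1))
    y = X.ravel() * (1 + 0.1 * rng.normal(size=500))

    mapie = MapieRegressor(
        estimator=GradientBoostingRegressor(),
        conformity_score=GammaConformityScore(),
    )
    mapie.fit(X, y)

    # Intervals grow with the magnitude of the predictions.
    y_pred, y_pis = mapie.predict(X, alpha=0.1)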

References
==========
----------

[1] Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-Free
Predictive Inference for Regression. Journal of the American Statistical Association, 113(523), 1094–1111.