Extra estimator.fit() call and source of wrapper predictions
#536
-
Hi, thanks for this very useful project. I have a question which is most likely trivial, but I can't seem to find the answer in the docs. I was wondering why MAPIE does an extra estimator.fit() call, and where the point predictions returned by the wrapper come from, when comparing the plain workflow

model = SomeEstimator()
model.fit(X_train, y_train)
y_pred_plain = model.predict(X_test)

with the MAPIE workflow

model = SomeEstimator()
mapie_model = MapieRegressor(estimator=model, ...)
mapie_model.fit(X_train, y_train)
y_pred_mapie, y_pis = mapie_model.predict(X_test)

I would have assumed that y_pred_plain and y_pred_mapie are identical. However, the checks below show that the number of fit calls and the returned point predictions depend on the cv setting:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from mapie.regression import MapieRegressor
from icecream import ic
class DebugLinearRegression(LinearRegression):
    def fit(self, X, y, *args, **kwds):
        ic("fit called", X.shape, y.shape)
        return super().fit(X, y, *args, **kwds)

def ref_pred(X_train, y_train, X_test):
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model.predict(X_test)
rng = np.random.default_rng(123)
X = rng.random(size=(500, 1))
y = rng.random(size=500)
# cv + 1 fits
ic("cv+")
# ic| 'fit called', X.shape: (400, 1), y.shape: (400,)
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DebugLinearRegression()
mapie_model = MapieRegressor(estimator=model, cv=5, method="plus")
mapie_model.fit(X_train, y_train)
y_pred, y_pis = mapie_model.predict(X_test, alpha=[0.05])
assert (ref_pred(X_train, y_train, X_test) == y_pred).all()
# 2 fits, maybe variable PI width depending on
# MapieRegressor(conformity_score=...)
ic("split")
# ic| 'fit called', X.shape: (400, 1), y.shape: (400,)
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DebugLinearRegression()
mapie_model = MapieRegressor(estimator=model, cv="split", test_size=0.2)
mapie_model.fit(X_train, y_train)
y_pred, y_pis = mapie_model.predict(X_test, alpha=[0.05])
# hmm, very much not exact here, maybe b/c we see
# WARNING: at least one point of training set belongs to every resamplings.
##assert (ref_pred(X_train, y_train, X_test) == y_pred).all()
assert np.allclose(ref_pred(X_train, y_train, X_test), y_pred, atol=1e-1)
# 1 fit, constant PI width
ic("prefit")
# ic| 'fit called', X.shape: (320, 1), y.shape: (320,)
X_train_cal, X_test, y_train_cal, y_test = train_test_split(
    X, y, test_size=0.2
)
X_train, X_cal, y_train, y_cal = train_test_split(
    X_train_cal, y_train_cal, test_size=0.2
)
model = DebugLinearRegression()
model.fit(X_train, y_train)
mapie_model = MapieRegressor(estimator=model, cv="prefit")
mapie_model.fit(X_cal, y_cal)
y_pred, y_pis = mapie_model.predict(X_test, alpha=[0.05])
assert (ref_pred(X_train, y_train, X_test) == y_pred).all()
# 1 fit, constant PI width
ic("naive")
# ic| 'fit called', X.shape: (400, 1), y.shape: (400,)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DebugLinearRegression()
mapie_model = MapieRegressor(estimator=model, method="naive", test_size=0.2)
mapie_model.fit(X_train, y_train)
y_pred, y_pis = mapie_model.predict(X_test, alpha=[0.05])
assert (ref_pred(X_train, y_train, X_test) == y_pred).all()
-
Hello @elcorto, thank you for the warm feedback, and sorry for the delayed response; we tend to forget the GitHub Q&A section!

In a cross conformal setting, there is indeed an option to average predictions (using […]). Regarding the […] In a […]

Let me know if I answered your question.
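For context, here is a minimal standalone sketch of averaging the out-of-fold predictions. The agg_function constructor argument and the ensemble flag of predict() used here are assumptions based on the pre-1.0 MapieRegressor API:

# Sketch only: assumes the pre-1.0 MapieRegressor API, where the constructor
# takes agg_function and predict() takes an ensemble flag.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from mapie.regression import MapieRegressor

rng = np.random.default_rng(123)
X = rng.random(size=(500, 1))
y = rng.random(size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

mapie_model = MapieRegressor(
    estimator=LinearRegression(),
    method="plus",
    cv=5,
    agg_function="median",  # how the out-of-fold predictions are aggregated
)
mapie_model.fit(X_train, y_train)

# ensemble=False (default): point predictions come from the estimator
# refitted on the full training set.
y_pred_single, _ = mapie_model.predict(X_test, alpha=[0.05])
# ensemble=True: point predictions are the aggregated out-of-fold predictions.
y_pred_agg, _ = mapie_model.predict(X_test, alpha=[0.05], ensemble=True)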
Hello @elcorto,
the internals of MAPIE are not always optimised at the moment. In particular, the split and cross methods are handled by the same mechanism, and in the split setting there is indeed one extra, unnecessary fit that we should remove.
This will likely be done after we release MAPIE v1.0.0. The difference you see between 0.9.2 and 0.9.1 is due to some refactoring we started in preparation for the v1 release.
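Until that cleanup lands, the cv="prefit" pattern from the question above avoids the extra fit entirely: fit the estimator once yourself, then let MapieRegressor compute conformity scores on a separate calibration set. A minimal standalone version of that workaround, based on the question's own prefit block:

# cv="prefit": the estimator is fitted exactly once, and MapieRegressor.fit
# only computes conformity scores on the calibration data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from mapie.regression import MapieRegressor

rng = np.random.default_rng(123)
X = rng.random(size=(500, 1))
y = rng.random(size=500)
X_train_cal, X_test, y_train_cal, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_cal, y_train, y_cal = train_test_split(
    X_train_cal, y_train_cal, test_size=0.2
)

model = LinearRegression()
model.fit(X_train, y_train)  # the only fit call

mapie_model = MapieRegressor(estimator=model, cv="prefit")
mapie_model.fit(X_cal, y_cal)  # no refit, only conformity scores
y_pred, y_pis = mapie_model.predict(X_test, alpha=[0.05])

# Point predictions are exactly those of the prefitted estimator.
assert np.array_equal(model.predict(X_test), y_pred)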