REFACTO: in split setting, remove checking NaNs and irrelevant aggregation to avoid triggering unwanted warnings #586

Valentin-Laurent · 2025-01-05T11:40:01Z

Currently, during calibration, the same logic is used in the split setting and in the cross setting.

Specifically, at some point we call check_nan_in_aposteriori_prediction and aggregate_all in both settings.

This works in the split setting because check_nan_in_aposteriori_prediction does nothing except check for NaNs, and aggregate_all simply flattens the prediction matrix into a prediction array, from shape (n_samples, 1) to shape (n_samples,).
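To illustrate the shape change, here is a minimal NumPy sketch. It assumes a single-column prediction matrix and uses `np.nanmean` as a stand-in for a mean aggregation over estimators; it is not MAPIE's `aggregate_all` itself, and the values are made up.

```python
import numpy as np

# Hypothetical split-setting prediction matrix: one estimator, so one column.
pred_matrix = np.array([[1.5], [2.0], [3.25]])  # shape (n_samples, 1)

# Stand-in for a mean aggregation over estimators (assumption, not MAPIE code).
aggregated = np.nanmean(pred_matrix, axis=1)

# Plain flattening, with no aggregation at all.
flattened = pred_matrix.ravel()

print(aggregated.shape, flattened.shape)
```

With a single column, both paths should give the same values, which is why the aggregation step adds nothing here beyond the reshape.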

However, calling these two functions causes two issues:

- check_nan_in_aposteriori_prediction will always trigger a warning because, by definition, the train samples are not used for calibration.
- aggregate_all also triggers a warning in the split setting. Moreover, aggregation is not needed in the split setting during calibration anyway, and the dependency on agg_function could be removed entirely in a further refactoring.

This PR checks whether we are in the split setting and, if so, simply flattens the array instead of calling the two functions. It is not an ideal solution because it adds an extra condition to the existing logic, but it fixes the first issue in a pragmatic way and prepares the code for further refactoring.
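The idea can be sketched roughly as follows. This is an illustration under assumptions, not the code changed in this PR: `predictions_for_calibration`, `has_all_nan_row`, the `is_split` flag, and the warning message are hypothetical, and `np.nanmean` stands in for whatever agg_function would select.

```python
import warnings

import numpy as np


def has_all_nan_row(pred_matrix):
    """Return True if at least one sample has no prediction at all."""
    return bool(np.any(np.all(np.isnan(pred_matrix), axis=1)))


def predictions_for_calibration(pred_matrix, is_split):
    """Hypothetical helper: reduce (n_samples, n_estimators) to (n_samples,)."""
    if is_split:
        # Split setting: a single column, so flattening is all that is needed.
        # No NaN check and no aggregation, hence no warning from either.
        return pred_matrix.ravel()
    # Cross setting: keep the NaN check and the aggregation over estimators.
    if has_all_nan_row(pred_matrix):
        warnings.warn("Some samples have no out-of-fold prediction.")
    return np.nanmean(pred_matrix, axis=1)
```

The extra branch is the trade-off mentioned above: one more condition in the existing logic, in exchange for a split path that no longer depends on the aggregation function.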

codecov-commenter · 2025-01-05T21:57:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (614293e) to head (aebfe7d).
Report is 835 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##            master      #586     +/-   ##
===========================================
  Coverage   100.00%   100.00%             
===========================================
  Files           39        61     +22     
  Lines         4616      6003   +1387     
  Branches       487       352    -135     
===========================================
+ Hits          4616      6003   +1387

vincentblot28

All good to me; it improves the readability of the code!

Valentin-Laurent added 4 commits January 5, 2025 11:13

REFACTO: in split setting, remove checking NaNs to avoid inevitable w…

e57cc09

…arning, and remove useless aggregation to avoid dependency to agg_function

#2 REFACTO: in split setting, remove checking NaNs to avoid inevitabl…

f1c6099

…e warning, and remove useless aggregation to avoid dependency to agg_function

FIX: simplify condition, fix tests

3449a48

FIX linting

aebfe7d

vincentblot28 approved these changes Jan 6, 2025

Valentin-Laurent merged commit d8665e4 into master Jan 6, 2025
8 checks passed

Valentin-Laurent deleted the refacto-split-regressor branch January 6, 2025 14:43
