Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REFACTO: in split setting, remove checking NaNs and irrelevant aggregation to avoid triggering unwanted warnings #586

Merged
merged 4 commits into from
Jan 6, 2025

Conversation

Valentin-Laurent
Copy link
Collaborator

@Valentin-Laurent Valentin-Laurent commented Jan 5, 2025

Currently, during calibration, the same logic is used in the split setting and in the cross setting.

Specifically, at some point we call check_nan_in_aposteriori_prediction and aggregate_all in both settings.

It works in the split setting because check_nan_in_aposteriori_prediction does basically nothing except checking NaNs, and aggregate_all simply flattens the prediction matrix to a prediction array, from shape (n_samples, 1) to shape (n_samples,).

However, calling those 2 functions brings 2 issues:

  1. check_nan_in_aposteriori_prediction will always trigger a warning because by definition the train samples are not used for calibration.
  2. aggregate_all also triggers warning in the split setting. Moreover, aggregating is not needed anyways in the split setting during calibration, and the dependency on agg_function could be removed entirely in further refactoring

In this PR, we check if we are in a split setting, and if yes simplify the code by simply flattening the array. It is not an ideal solution because we add an extra condition to the existing logic, but it fixes the first issue in a pragmatic way, and prepares the code for further refactoring.

…arning, and remove useless aggregation to avoid dependency to agg_function
…e warning, and remove useless aggregation to avoid dependency to agg_function
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (614293e) to head (aebfe7d).
Report is 835 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##            master      #586     +/-   ##
===========================================
  Coverage   100.00%   100.00%             
===========================================
  Files           39        61     +22     
  Lines         4616      6003   +1387     
  Branches       487       352    -135     
===========================================
+ Hits          4616      6003   +1387     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@vincentblot28 vincentblot28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to me, it improves the readability of the code !

@Valentin-Laurent Valentin-Laurent merged commit d8665e4 into master Jan 6, 2025
8 checks passed
@Valentin-Laurent Valentin-Laurent changed the title REFACTO: in split setting, remove checking NaNs to avoid inevitable warning, and remove useless aggregation to avoid dependency to agg_function REFACTO: in split setting, remove checking NaNs and irrelevant aggregation to avoid triggering unwanted warnings Jan 6, 2025
@Valentin-Laurent Valentin-Laurent deleted the refacto-split-regressor branch January 6, 2025 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants