REFACTO: in split setting, remove checking NaNs and irrelevant aggregation to avoid triggering unwanted warnings #586
Currently, during calibration, the same logic is used in the split setting and in the cross setting. Specifically, at some point we call `check_nan_in_aposteriori_prediction` and `aggregate_all` in both settings.

It works in the split setting because `check_nan_in_aposteriori_prediction` does basically nothing except check for NaNs, and `aggregate_all` simply flattens the prediction matrix to a prediction array, from shape `(n_samples, 1)` to shape `(n_samples,)`.

However, calling those two functions brings two issues:
- `check_nan_in_aposteriori_prediction` will always trigger a warning because, by definition, the train samples are not used for calibration.
- `aggregate_all` also triggers a warning in the split setting. Moreover, aggregating is not needed anyway in the split setting during calibration, and the dependency on `agg_function` could be removed entirely in further refactoring.

In this PR, we check whether we are in a split setting and, if so, simplify the code by simply flattening the array. It is not an ideal solution because we add an extra condition to the existing logic, but it fixes the first issue in a pragmatic way and prepares the code for further refactoring.
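
For illustration only, the branching described above could look roughly like the sketch below. The function names, the `is_split` flag, and the use of `np.nanmean` as a stand-in for the cross-setting aggregation are all assumptions made for this example; they are not the actual code changed in this PR.

```python
import numpy as np


def flatten_split_predictions(y_pred_matrix: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reshape (n_samples, 1) predictions to (n_samples,)."""
    if y_pred_matrix.ndim != 2 or y_pred_matrix.shape[1] != 1:
        raise ValueError(
            f"Expected shape (n_samples, 1), got {y_pred_matrix.shape}."
        )
    return y_pred_matrix.ravel()


def calibration_predictions(
    y_pred_matrix: np.ndarray,
    is_split: bool,
    agg_function=np.nanmean,  # assumption: stand-in for the cross-setting aggregation
) -> np.ndarray:
    """Hypothetical sketch of the extra condition added for the split setting."""
    if is_split:
        # Split setting: a single column, so no NaN check and no aggregation.
        return flatten_split_predictions(y_pred_matrix)
    # Cross setting: aggregate the per-fold predictions of each sample.
    return agg_function(y_pred_matrix, axis=1)


# Split setting: one prediction per calibration sample.
y_pred_split = np.array([[0.5], [1.5], [2.5]])
flat = calibration_predictions(y_pred_split, is_split=True)

# Cross setting: NaN marks folds in which the sample was used for training.
y_pred_cross = np.array([[1.0, np.nan], [np.nan, 4.0]])
aggregated = calibration_predictions(y_pred_cross, is_split=False)
```

In this sketch the split branch never touches the aggregation function, which is the property that would make it possible to drop the `agg_function` dependency for the split setting later on.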