Expected Behavior
Full compatibility with sklearn pipelines.
Actual Behavior
We're only compatible with the pandas output mode of sklearn.
By default, a multi-step pipeline in which an encoder is not the first step will fail unless the user either calls set_output(transform="pandas") manually or enables pandas mode globally for sklearn via sklearn.set_config(transform_output="pandas").
This is inconvenient and error-prone.
Steps to Reproduce the Problem
Create a sklearn Pipeline whose first step is a preprocessor and whose second step is an encoder.
Call fit_transform on the pipeline. This raises an error because category encoders works with DataFrames internally, and after the first transform the encoder is given an array whose column names differ from what it expects.
Potential Solution
To fix this we would need to become independent of pandas internally. This is quite difficult and requires some refactoring in all encoders, mainly in how feature names are determined. The benefit is also rather small, since there is an easy workaround for the uncommon case in which the encoder is not the first step of a multi-step pipeline.
However, if a major refactoring is done for a potential version 3, we could include this as well.