dask_ml.compose.ColumnTransformer does not work with objects of types dask_expr._collection.DataFrame or dask.dataframe.core.DataFrame.
Minimal Complete Verifiable Example:
import numpy as np
import pandas as pd
from dask_ml.compose import ColumnTransformer
from dask_ml.preprocessing import StandardScaler
import dask.dataframe as dd
from dask.distributed import Client

client = Client()

# Create a sample dataframe
df = pd.DataFrame({"A": np.random.rand(1000)})
ddf = dd.from_pandas(df, npartitions=2)
ColumnTransformer, specifying the columns using strings:
scaler = ColumnTransformer(
    transformers=[("StandardScaler", StandardScaler(), ["A"])],
    remainder="passthrough",
)
scaler.fit_transform(ddf)  # or scaler.fit_transform(ddf.to_legacy_dataframe())
Out:
ValueError: Specifying the columns using strings is only supported for dataframes.
ColumnTransformer, specifying the columns using integers:
scaler = ColumnTransformer(
    transformers=[("StandardScaler", StandardScaler(), [0])],
    remainder="passthrough",
)
scaler.fit_transform(ddf)  # or scaler.fit_transform(ddf.to_legacy_dataframe())
Out:
AttributeError: 'DataFrame' object has no attribute 'take'
Anything else we need to know?:
Pandas data frames, i.e.
scaler.fit_transform(ddf.compute())
works as expected.
This could be related to #962 and #887. If it is indeed the same issue and there are no plans to fix it in the foreseeable future, would it be better to remove `dask_ml.compose.ColumnTransformer` from the Dask ML API?
Environment:
Dask version: 2024.4.1
Dask ML version: 2024.4.1
Scikit-learn version: 1.4.0
Python version: 3.10.13
Operating System: macOS
Install method (conda, pip, source): pip