➜ python label_encoder_repro.py
Traceback (most recent call last):
  File "/Users/paul/work/sources/dask-engineering/example-pipelines/criteo-HPO/label_encoder_repro.py", line 21, in <module>
    lenc = dask_le().fit(ddf["A"])
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask_ml/preprocessing/label.py", line 119, in fit
    self.classes_ = classes_.compute()
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/base.py", line 315, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/base.py", line 600, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/utils.py", line 71, in apply
    return func(*args, **kwargs)
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/dask/array/routines.py", line 1626, in _unique_internal
    u = np.unique(ar)
  File "<__array_function__ internals>", line 180, in unique
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/numpy/lib/arraysetops.py", line 274, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts,
  File "/Users/paul/mambaforge/envs/ml-example/lib/python3.10/site-packages/numpy/lib/arraysetops.py", line 336, in _unique1d
    ar.sort()
TypeError: '<' not supported between instances of 'str' and 'float'
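The last frames show Dask's `_unique_internal` handing a chunk to `np.unique`, which sorts it with `ar.sort()`. A minimal NumPy-only sketch, assuming the chunk is an object-dtype array of strings with a missing value stored as `np.nan` (the array contents here are made up), should hit the same kind of TypeError without involving Dask. The order of 'str' and 'float' in the message depends on which pair happens to be compared first.

```python
import numpy as np

# Hypothetical chunk contents: an object-dtype array of strings with a
# missing value stored as np.nan (which is a Python float).
chunk = np.array(["b", np.nan, "a"], dtype=object)

error_message = None
try:
    # np.unique sorts its input, and sorting an object array compares
    # elements pairwise with "<".
    np.unique(chunk)
except TypeError as exc:
    # str and float cannot be ordered against each other, so the sort fails.
    error_message = str(exc)

print(error_message)
```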
Environment:
Dask version: 2022.12.0
Python version: 3.10
Operating System: M1 Mac
Install method (conda, pip, source): conda
@DuanBoomer I'd be happy to review a PR. Thanks for volunteering. Note that I'll be largely away from my computer this week through the New Year. So if my response time is slow, I haven't forgotten about you.
Describe the issue:
When using a LabelEncoder on a Dask series with missing values (stored as np.nan), fit() raises a TypeError because "<" is not defined between strings and floats. scikit-learn's encoder seems to handle this case well for both pandas and Dask series, and we seem to handle it well with a pandas series, so the failure appears to be specific to Dask series.
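One possible direction, sketched under stated assumptions: remove float NaN values before calling `np.unique`, so "<" is only applied to the remaining strings, then append a single NaN to the result. `unique_with_nan` is a hypothetical helper, not part of dask-ml or scikit-learn, and keeping NaN as its own class placed last is a design assumption. Wiring something like this into dask-ml's `LabelEncoder.fit` (which, per the traceback, computes its classes through Dask's unique) is not shown.

```python
import math

import numpy as np


def _is_nan(value) -> bool:
    """Return True only for float NaN; strings and other objects never count."""
    return isinstance(value, float) and math.isnan(value)


def unique_with_nan(values) -> np.ndarray:
    """Hypothetical helper: sorted unique values of an object array, NaN last.

    NaN is filtered out before sorting, so "<" only ever sees the remaining
    values, and a single NaN is appended afterwards if any was present.
    """
    flat = np.asarray(values, dtype=object).ravel()
    nan_mask = np.array([_is_nan(v) for v in flat], dtype=bool)
    uniques = np.unique(flat[~nan_mask])
    if nan_mask.any():
        # Appending a float to an object array keeps the object dtype.
        uniques = np.append(uniques, np.nan)
    return uniques


values = np.array(["b", np.nan, "a", "b"], dtype=object)
print(unique_with_nan(values).tolist())  # ['a', 'b', nan]
```

Filtering with a Python-level `isinstance` check keeps the sketch simple but is slow for large chunks; it also ignores `None` as a missing marker, which a real fix would likely need to consider.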
Minimal Complete Verifiable Example:
Full Traceback: