-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: DataFrameTreeReduction is no longer part of dask #566
Conversation
@douglasdavis @martindurant now that the legacy dask dataframe is no longer there, this causes a bunch of tests to fail as well, and forces dependency on dask-expr. |
Is the following enough for those failures:
? |
It should be. This import was removed recently and works fine for me locally
However, please make sure that |
Yes, it has been this way a long time. |
On further reading, this is no longer the same DataFrame as before at all ("DataFrame-like Expr Collection."). There is still a function, dask.dataframe.from_graph that supports making a thing, that might be what we need. What a mess! (I guess akimbo.dask is broken too) |
@lgray , let me know when you are working on this. I suppose dask-histogram is broken too; but the copy of tree-reduction should live here, so I'll wait for this PR and then fix the other. |
@lgray , the following gets us some of the way there: itsuccessfully produces output, but it's totally wrong (you get a pd.Series with one row containing all the data as lists). I think it's materialising the graph to low-level in --- a/src/dask_awkward/lib/io/io.py
+++ b/src/dask_awkward/lib/io/io.py
@@ -42,7 +42,7 @@ from dask_awkward.utils import first, second
if TYPE_CHECKING:
from dask.array.core import Array as DaskArray
from dask.bag.core import Bag as DaskBag
- from dask.dataframe.core import DataFrame as DaskDataFrame
+ from dask.dataframe import DataFrame as DaskDataFrame
from dask.delayed import Delayed
from fsspec.spec import AbstractFileSystem
@@ -466,8 +466,8 @@ def to_dataframe(
"""
import dask
- from dask.dataframe.core import DataFrame as DaskDataFrame
- from dask.dataframe.core import new_dd_object
+ from dask.dataframe import DataFrame as DaskDataFrame
+ from dask.dataframe import from_graph as new_dd_object
if optimize_graph:
(array,) = dask.optimize(array)
@@ -479,13 +479,15 @@ def to_dataframe(
**kwargs,
)
meta = ak.to_dataframe(length_zero_if_typetracer(array._meta), **kwargs)
+ # need _parameters = ["layer", "_meta", "divisions", "keys", "name_prefix"]
return cast(
DaskDataFrame,
new_dd_object(
intermediate.dask,
- intermediate.name,
- meta,
- intermediate.divisions,
+ name_prefix=intermediate.name,
+ _meta=meta,
+ divisions=intermediate.divisions,
+ keys=list(intermediate.dask) # WHY MATERIALIZE???
),
) |
Waiting on this and then fixing dask-histogram makes sense. I put an version cap on dask in coffea to keep things working for people in the mean time. |
Ran into this with Hist. It's quite a mess, as limiting Dask forces limiting numpy<2, which then forces Python <3.13. Finally got something working with |
You should only need to go back one release of dask, which definitely did work with numpy 2. Perhaps |
See dask-contrib/dask-awkward#566. This needs to be fixed, limiting is temporary!
Ah, okay, clearing the lock file did get it to get the latest 2024, which works. |
In the dask-awkward meeting, we decided that we should get a fix out for the core functionality here and in dask-histogram, and not worry too much about to_dataframe. In that method, we can do a version check and run the old code if it's still available, or raise NotImplemented("This only works with dask version<2025"). With a TODO to actually fix this later. OK with you, @lgray ? |
That's fine by me - we don't do much dak < - > dd interop in the first place. |
Recommendation here to use from_map dask/dask#11678 (comment) , but I don't think that will work for us. I pushed the changes, should pass all except the one test that imports dask-histogram. Assuming all is well, I will merge and try to get around to fixing that too and releasing by Monday. |
mindeps and runs with old enough python are passing, the remaining have only the one expected failure. |
This class was removed from dask itself, and dask-awkward is presently broken with latest dask because of that.