Speed up dask graph creation #12

Open
artttt opened this issue Sep 2, 2023 · 1 comment
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@artttt
Collaborator

artttt commented Sep 2, 2023

When a dask array has many chunks, graph creation can take a few minutes.

In my mind it's not a very complex process, so it should be very fast, but I probably have too simple an understanding of what dask is doing.

I'm wondering if there are any tricks to speed this up. Maybe there is some hashing going on that could be avoided, or dask is trying hard to find graph optimisations that aren't there, or something similar.

Note that in the call to to_delayed I turn graph optimisation off. I do this because otherwise it ends up repeatedly reading in the same source chunks. Maybe there is a more nuanced way to do this and still get some of the benefits.
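For anyone trying to reproduce this, here is a minimal sketch of the kind of call described above. The array, its shape, and its chunk sizes are made-up stand-ins (not the project's real data), and the timing is only a rough way to see how long the to_delayed step takes on its own.

```python
import time

import dask.array as da

# Hypothetical stand-in for the real source data: 20 x 20 = 400 chunks.
x = da.zeros((2_000, 2_000), chunks=(100, 100))
y = x + 1  # one extra elementwise layer on top of the source chunks

start = time.perf_counter()
# optimize_graph=False hands the graph to the Delayed objects as-is,
# skipping dask's optimisation pass.
blocks = y.to_delayed(optimize_graph=False)
elapsed = time.perf_counter() - start

# Counting the tasks forces the graph to be materialised.
n_tasks = len(y.__dask_graph__())
print(f"{blocks.size} blocks, {n_tasks} tasks, to_delayed took {elapsed:.4f}s")
```

Increasing the shape or shrinking the chunks in this sketch is a simple way to watch how the time and the task count grow with the number of chunks.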

@artttt added the enhancement and help wanted labels Sep 2, 2023
@artttt
Collaborator Author

artttt commented Sep 6, 2023

Also, memory use balloons when the graph gets large and slow to build.

Would love a fix where the answer isn't "make the chunks larger and use workers with more memory".
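To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch using only the standard library. The shape, chunk sizes, and float64 item size are illustrative assumptions, and count_chunks and chunk_nbytes are hypothetical helpers, not dask functions. It relies only on the fact that a graph needs at least one task per chunk for each operation applied to the array.

```python
import math


def count_chunks(shape, chunks):
    """Number of chunks when `shape` is split into blocks of size `chunks`."""
    # -(-s // c) is integer ceiling division, so a ragged last chunk counts.
    return math.prod(-(-s // c) for s, c in zip(shape, chunks))


def chunk_nbytes(chunks, itemsize=8):
    """Bytes in one full chunk, assuming float64 (8 bytes per element)."""
    return math.prod(chunks) * itemsize


shape = (100_000, 100_000)  # hypothetical array size
for chunks in [(500, 500), (2_000, 2_000)]:
    n = count_chunks(shape, chunks)
    mb = chunk_nbytes(chunks) / 1e6
    print(f"chunks={chunks}: {n} chunks, {mb:.0f} MB per chunk")
```

In this example, a 4x larger chunk edge cuts the chunk count by 16x and raises the memory per chunk by the same factor, which is why larger chunks shrink the graph but push the cost onto worker memory.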
