Failed to read large file with dask #12
Since I haven't been following along with the new layout here, can you orient me to exactly what …
im is a bioio BioImage, which is equivalent to AICSImage. It came from something like …
From the large number of errors that follow, the lion's share seems to be repeats of this basic pattern: …
Yeah, at a quick glance, I don't see any places in here where … (and that's not an error I've seen before).
ChatGPT suggests: "The error message you're encountering, OSError: [WinError 1455] The paging file is too small for this operation to complete, typically occurs on Windows when your application is trying to map a file to memory (using mmap) but the system doesn't have enough virtual memory available to fulfill the request." Is it possible that your …
Yes, it's totally possible. I'm trying it with a batch of 1 T instead of 4 (end_t = start_t + 1). It seems to be running. So much for parallelism!
I know dask is doing something good for me overall, but I have no really good way of knowing when I am loading too much data. So you end up having to change batch sizes by trial and error.
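One low-tech way to make that trial and error less painful is to factor the batch bookkeeping into a tiny helper, so only one number changes between runs. This is just an illustrative sketch (batch_ranges is a hypothetical helper, not part of bioio or dask):

```python
def batch_ranges(n_t, batch_size):
    """Yield (start_t, end_t) half-open index pairs covering n_t timepoints."""
    for start_t in range(0, n_t, batch_size):
        yield start_t, min(start_t + batch_size, n_t)

# With the 300 timepoints from this issue and a batch of 4, the last
# pair is (296, 300); dropping batch_size to 1 reproduces the
# end_t = start_t + 1 workaround described above.
print(list(batch_ranges(300, 4))[:2])   # [(0, 4), (4, 8)]
print(list(batch_ranges(300, 4))[-1])   # (296, 300)
```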
Yeah, indeed!
@tlambert03 Thanks for taking a peek here. I guess it was nothing more than giving dask too much stuff.
@toloudis @tlambert03 Is this an open issue?
Don't think so, but @toloudis opened it, so he should close it.
I'm not sure about closing this, because I was shown a very large multi-TB nd2 file that hits the same error. On Windows, we still see crashes in …
I was going to try using bioformats-reader just to see if it could load without crashing, but I won't be able to do this for at least another week. @BrianWhitneyAI @SeanLeRoy you might want to talk to Antoine - he can give you some big nd2 files to test. It would be amazing to have this fixed :) I believe they are trying to figure out strategies to generate smaller files, but we are still going to be around ~1 TB, last I heard.
The thing is, I'm just not sure you're ever going to fix this. If you use …
Even in the original example, we were never intending to load the whole dataset - it's always selecting a subset of T indices.
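For what it's worth, the subsetting itself is cheap in dask: slicing the lazy array before calling compute only materializes the chunks that overlap the selected T window. A minimal sketch with a synthetic array standing in for the image data (the shape here is made up and much smaller than the real scene; in bioio the lazy array would come from something like im.dask_data):

```python
import dask.array as da

# Synthetic lazy array standing in for the image data; one chunk per timepoint.
data = da.zeros((300, 4, 41, 64, 64), dtype="uint16",
                chunks=(1, 4, 41, 64, 64))

# Slice the T axis first, then compute: only the selected chunks are built.
start_t, end_t = 0, 1
subset = data[start_t:end_t].compute()
print(subset.shape)  # (1, 4, 41, 64, 64)
```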
Ah, ok. Well, do let me know if you're able to use bioformats successfully. Ultimately, if there is something to be done, this is an issue for the nd2 repo (if you're able to find a case in which the dask array is inefficient somehow).
Describe the Bug
Scene shape is (300, 4, 41, 2048, 2048). I call … and my dask cluster starts out by printing this error, and then there's a huge spew of errors while the dask graph does async work.
Expected Behavior
It should "just work".
Environment
Windows, conda environment but pip-installed. Dask LocalCluster, 4 workers, 2 threads per worker, about 16 GB per worker.
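A rough back-of-the-envelope with the numbers above suggests why a 4-timepoint batch hurts: assuming uint16 pixels (2 bytes each; the dtype isn't stated in the report, so this is an assumption), one batch is about 5.5 GB before dask makes any intermediate copies, which is already a large fraction of a 16 GB worker:

```python
# Scene shape from the report: (T, C, Z, Y, X).
t, c, z, y, x = 300, 4, 41, 2048, 2048
bytes_per_pixel = 2  # assuming uint16; not stated in the report

bytes_per_timepoint = c * z * y * x * bytes_per_pixel
batch_bytes = 4 * bytes_per_timepoint  # the original 4-timepoint batch

print(round(bytes_per_timepoint / 1e9, 2))  # 1.38 -> ~1.38 GB per timepoint
print(round(batch_bytes / 1e9, 1))          # 5.5  -> ~5.5 GB per batch, before copies
```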