
Failed to read large file with dask #12

Open
toloudis opened this issue Feb 20, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@toloudis

Describe the Bug

Scene shape is 300, 4, 41, 2048, 2048. I call

ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))  # start_t = 0, end_t = 4
ti = ti.persist()

And my dask cluster starts out by printing this warning

...\.conda\Lib\site-packages\distributed\worker_state_machine.py:3706: UserWarning: ND2File file not closed before garbage collection. Please use `with ND2File(...):` context or call `.close()`.
  instructions = self.state.handle_stimulus(*stims)

and then there's a huge spew of errors while the dask graph does async work.

Expected Behavior

It should "just work".

Environment

Windows, conda environment but pip-installed. Dask LocalCluster, 4 workers, 2 threads per worker, about 16 GB per worker.
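
For reference, a minimal sketch of a local cluster matching this description (the worker count, threads, and memory limit come from the environment above; everything else is assumed):

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=4,            # 4 workers
    threads_per_worker=2,   # 2 threads per worker
    memory_limit="16GB",    # roughly 16 GB per worker
)
client = Client(cluster)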

@tlambert03

since I haven't been following along with the new layout here, can you orient me to exactly what im is here, and the route through which get_image_dask_data would ultimately hit the bioio-nd2 and nd2 packages?

@toloudis
Author

toloudis commented Feb 20, 2024

since I haven't been following along with the new layout here, can you orient me to exactly what im is here, and the route through which get_image_dask_data would ultimately hit the bioio-nd2 and nd2 packages?

im is a bioio BioImage, which is equivalent to AICSImage. It came from something like im = BioImage(pathToND2File).
The API is basically identical to aicsimageio, so the code path should look the same. The reader implementation is the same as the one in aicsimageio, as you can see in https://github.com/bioio-devs/bioio-nd2/blob/main/bioio_nd2/reader.py
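
For orientation, a rough sketch of the route described above (simplified; this is not the actual reader code). BioImage resolves a reader plugin for the file, and bioio-nd2 in turn wraps the nd2 package:

from bioio import BioImage

im = BioImage(pathToND2File)                         # plugin resolution selects the bioio-nd2 reader for .nd2 files
ti = im.get_image_dask_data("TCZYX", T=slice(0, 4))  # lazily built dask array, same API as aicsimageio
# Under the hood the reader opens the file with nd2.ND2File and exposes its dask array,
# which is what ends up being pickled out to the distributed workers (see the traceback below).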

@toloudis
Author

From the large number of errors that follow, the lion's share seems to be repeats of this basic pattern:

  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\distributed\protocol\pickle.py", line 94, in loads
    return pickle.loads(x, buffers=buffers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\nd2file.py", line 229, in __setstate__
    self._rdr = ND2Reader.create(self._path, self._error_radius)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 73, in create
    return subcls(path, error_radius=error_radius)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\_modern\modern_reader.py", line 59, in __init__
    super().__init__(path, error_radius)
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 92, in __init__
    self.open()
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 102, in open
    self._mmap = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete

@tlambert03

tlambert03 commented Feb 20, 2024

yeah, at a quick glance, I don't see any places in here where ND2File() is used outside of a context... so the UserWarning: ND2File file not closed before garbage collection you mentioned here is probably just a side effect of something else causing a legitimate crash, with the object getting deleted before the context closes (and not the real problem). So, let's try to determine what's going on with that paging file error (rather than the warning)...

(and, that's not an error I've seen before)

@tlambert03

tlambert03 commented Feb 20, 2024

chatgpt suggests "The error message you're encountering, OSError: [WinError 1455] The paging file is too small for this operation to complete, typically occurs under Windows when your application is trying to map a file to memory (using mmap) but the system doesn't have enough virtual memory available to fulfill the request. "

is it possible that your persist() command is trying to load too much?
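
One way to sanity-check that before calling persist() is to look at the size of the selection (nbytes is a standard dask array attribute; the 16-bit dtype below is an assumption):

ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))
print(f"selection is ~{ti.nbytes / 2**30:.1f} GiB uncompressed")
# Assuming 16-bit pixels, one 4 x 41 x 2048 x 2048 timepoint is ~1.3 GiB,
# and whatever is persisted has to fit in worker memory alongside any
# intermediate copies made while the graph runs.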

@toloudis
Author

Yes, it's totally possible. I'm trying it with a batch of 1 T instead of 4 (end_t = start_t + 1). It seems to be running. So much for parallelism!

@toloudis
Author

I know dask is doing something good for me overall, but I have no good way of knowing when I am loading too much data, so you end up having to change batch sizes by trial and error.
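
A sketch of that trial-and-error batching, processing a small number of timepoints per persist() call (batch_size and process() are placeholders, and im.dims.T assumes the aicsimageio-style dims attribute):

batch_size = 1                     # increase cautiously; each batch must fit comfortably in worker memory
n_t = im.dims.T                    # total number of timepoints in the scene
for start_t in range(0, n_t, batch_size):
    end_t = min(start_t + batch_size, n_t)
    ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))
    ti = ti.persist()
    process(ti)                    # placeholder for the downstream work (e.g. writing zarr chunks)
    del ti                         # drop the reference so the persisted blocks can be released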

@tlambert03

tlambert03 commented Feb 20, 2024

Yeah, indeed!

@toloudis
Author

@tlambert03 Thanks for taking a peek here. I guess it was nothing more than giving dask too much stuff.

@BrianWhitneyAI
Contributor

@toloudis @tlambert03 Is this an open issue?

@tlambert03

don't think so, but @toloudis opened, so he should close

@toloudis
Author

I'm not sure about closing this, because I was shown a very large multi-TB nd2 file that hits the same error. On Windows, we still see crashes in mmap as described above. I have no clue what the fix might be at the moment. The latest errors I got were of this type:

File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 102, in open
    self._mmap = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete

I was going to try using bioformats-reader just to see if it could load without crashing. But I won't be able to do this for at least another week.

@BrianWhitneyAI @SeanLeRoy you might want to talk to Antoine - he can give you some big nd2 files to test. It would be amazing to have this fixed :) I believe they are trying to figure out strategies to generate smaller files, but we are still going to be around ~1 TB last I heard.
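
As a rough diagnostic for cases like this, comparing the file size against the machine's RAM plus paging file gives a ballpark for whether per-worker mappings can even fit (psutil is assumed to be installed; this is only a sanity check, not a fix):

import os
import psutil

path = r"path\to\big_file.nd2"     # placeholder path
file_gib = os.path.getsize(path) / 2**30
vm = psutil.virtual_memory()
sm = psutil.swap_memory()
print(f"file: {file_gib:.1f} GiB")
print(f"RAM: {vm.total / 2**30:.1f} GiB, paging file / swap: {sm.total / 2**30:.1f} GiB")
# The traceback above shows each worker re-opening the file and calling mmap.mmap on it,
# so with a multi-TB file a mapping is created once per worker process.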

@tlambert03

the thing is, I'm just not sure you're ever going to fix this? if you use .persist() on the entire dataset as in the original example, you're asking for it to load the entire thing into RAM. what do you imagine a "fix" looking like here?

@toloudis
Author

Even in the original example, we were never intending to load the whole dataset - it's always selecting a subset of T indices.

@tlambert03

ah, ok. well do let me know if you're able to use bioformats successfully. ultimately, if there is something to be done, this is an issue for the nd2 repo (if you're able to find a case in which the dask array is inefficient somehow)
