
Failed to read large file with dask #12

Open
toloudis opened this issue Feb 20, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@toloudis

Describe the Bug

Scene shape is 300, 4, 41, 2048, 2048. I call

ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))  # start_t = 0, end_t = 4
ti = ti.persist()

And my dask cluster starts out by printing this warning

...\.conda\Lib\site-packages\distributed\worker_state_machine.py:3706: UserWarning: ND2File file not closed before garbage collection. Please use `with ND2File(...):` context or call `.close()`.
  instructions = self.state.handle_stimulus(*stims)

and then there's a huge spew of errors while the dask graph does async work.

Expected Behavior

It should "just work".

Environment

Windows, conda environment but pip-installed. Dask LocalCluster, 4 workers, 2 threads per worker, about 16 GB per worker.
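
For reference, a minimal sketch of a local cluster matching this description (the worker count, threads, and memory limit come from the environment above; everything else is assumed):

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=4,            # 4 workers
    threads_per_worker=2,   # 2 threads per worker
    memory_limit="16GB",    # roughly 16 GB per worker
)
client = Client(cluster)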

@tlambert03

since I haven't been following along with the new layout here, can you orient me to exactly what im is here, and the route through which get_image_dask_data would ultimately hit the bioio-nd2 and nd2 packages?

@toloudis
Author

toloudis commented Feb 20, 2024

since I haven't been following along with the new layout here, can you orient me to exactly what im is here, and the route through which get_image_dask_data would ultimately hit the bioio-nd2 and nd2 packages?

im is a bioio BioImage, which is equivalent to AICSImage. It came from something like im = BioImage(pathToND2File).
The API is basically identical to aicsimageio, so the code path should look the same. The reader implementation is the same as the one in aicsimageio, as you can see in https://github.com/bioio-devs/bioio-nd2/blob/main/bioio_nd2/reader.py
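
For orientation, a rough sketch of the route described above (simplified; this is not the actual reader code). BioImage resolves a reader plugin for the file, and bioio-nd2 in turn wraps the nd2 package:

from bioio import BioImage

im = BioImage(pathToND2File)                         # plugin resolution selects the bioio-nd2 reader for .nd2 files
ti = im.get_image_dask_data("TCZYX", T=slice(0, 4))  # lazily built dask array, same API as aicsimageio
# Under the hood the reader opens the file with nd2.ND2File and exposes its dask array,
# which is what ends up being pickled out to the distributed workers (see the traceback below).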

@toloudis
Author

From the large number of errors that follow, the lion's share seems to be repeats of this basic pattern:

  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\distributed\protocol\pickle.py", line 94, in loads
    return pickle.loads(x, buffers=buffers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\nd2file.py", line 229, in __setstate__
    self._rdr = ND2Reader.create(self._path, self._error_radius)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 73, in create
    return subcls(path, error_radius=error_radius)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\_modern\modern_reader.py", line 59, in __init__
    super().__init__(path, error_radius)
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 92, in __init__
    self.open()
  File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 102, in open
    self._mmap = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete

@tlambert03

tlambert03 commented Feb 20, 2024

yeah, at a quick glance, I don't see any places in here where ND2File() is used outside of a context... so the UserWarning: ND2File file not closed before garbage collection you mentioned here is probably just a side effect of something else causing a legitimate crash, with the object getting deleted before the context closes (and not the real problem). So, let's try to determine what's going on with that paging file error (rather than the warning)...

(and, that's not an error I've seen before)

@tlambert03

tlambert03 commented Feb 20, 2024

chatgpt suggests "The error message you're encountering, OSError: [WinError 1455] The paging file is too small for this operation to complete, typically occurs under Windows when your application is trying to map a file to memory (using mmap) but the system doesn't have enough virtual memory available to fulfill the request. "

is it possible that your persist() command is trying to load too much?
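
One way to sanity-check that before calling persist() is to look at the size of the selection (nbytes is a standard dask array attribute; the 16-bit dtype below is an assumption):

ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))
print(f"selection is ~{ti.nbytes / 2**30:.1f} GiB uncompressed")
# Assuming 16-bit pixels, one 4 x 41 x 2048 x 2048 timepoint is ~1.3 GiB,
# and whatever is persisted has to fit in worker memory alongside any
# intermediate copies made while the graph runs.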

@toloudis
Author

Yes, it's totally possible. I'm trying it with a batch of 1 T instead of 4 (end_t = start_t + 1). It seems to be running. So much for parallelism!

@toloudis
Author

I know dask is doing something good for me overall, but I have no good way of knowing when I am loading too much data, so you end up having to change batch sizes by trial and error.
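
A sketch of that trial-and-error batching, processing a small number of timepoints per persist() call (batch_size and process() are placeholders, and im.dims.T assumes the aicsimageio-style dims attribute):

batch_size = 1                     # increase cautiously; each batch must fit comfortably in worker memory
n_t = im.dims.T                    # total number of timepoints in the scene
for start_t in range(0, n_t, batch_size):
    end_t = min(start_t + batch_size, n_t)
    ti = im.get_image_dask_data("TCZYX", T=slice(start_t, end_t))
    ti = ti.persist()
    process(ti)                    # placeholder for the downstream work (e.g. writing zarr chunks)
    del ti                         # drop the reference so the persisted blocks can be released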

@tlambert03

tlambert03 commented Feb 20, 2024

Yeah, indeed!

@toloudis
Author

@tlambert03 Thanks for taking a peek here. I guess it was nothing more than giving dask too much stuff.

@BrianWhitneyAI
Contributor

@toloudis @tlambert03 Is this an open issue?

@tlambert03

don't think so, but @toloudis opened, so he should close

@toloudis
Author

I'm not sure about closing this, because I was shown a very large multi-TB nd2 file that hits the same error. On Windows, we still see crashes in mmap as described above. I have no clue what the fix might be at the moment. The latest errors I got were of this type:

File "c:\Users\danielt\source\repos\aics-int\ome-zarr-conversion\.conda\Lib\site-packages\nd2\readers\protocol.py", line 102, in open
    self._mmap = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete

I was going to try using bioformats-reader just to see if it could load without crashing. But I won't be able to do this for at least another week.

@BrianWhitneyAI @SeanLeRoy you might want to talk to Antoine - he can give you some big nd2 files to test. It would be amazing to have this fixed :) I believe they are trying to figure out strategies to generate smaller files, but we are still going to be around ~1 TB last I heard.
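
As a rough diagnostic for cases like this, comparing the file size against the machine's RAM plus paging file gives a ballpark for whether per-worker mappings can even fit (psutil is assumed to be installed; this is only a sanity check, not a fix):

import os
import psutil

path = r"path\to\big_file.nd2"     # placeholder path
file_gib = os.path.getsize(path) / 2**30
vm = psutil.virtual_memory()
sm = psutil.swap_memory()
print(f"file: {file_gib:.1f} GiB")
print(f"RAM: {vm.total / 2**30:.1f} GiB, paging file / swap: {sm.total / 2**30:.1f} GiB")
# The traceback above shows each worker re-opening the file and calling mmap.mmap on it,
# so with a multi-TB file a mapping is created once per worker process.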

@tlambert03

the thing is, I'm just not sure you're ever going to fix this? if you use .persist() on the entire dataset as in the original example, you're asking for it to load the entire thing into RAM. what do you imagine a "fix" looking like here?

@toloudis
Author

Even in the original example, we were never intending to load the whole dataset - it's always selecting a subset of T indices.

@tlambert03

ah, ok. well do let me know if you're able to use bioformats successfully. ultimately, if there is something to be done, this is an issue for the nd2 repo (if you're able to find a case in which the dask array is inefficient somehow)
