Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to membrane-image definition #31

Open
FloWuenne opened this issue Sep 16, 2022 · 8 comments
Open

Changes to membrane-image definition #31

FloWuenne opened this issue Sep 16, 2022 · 8 comments

Comments

@FloWuenne
Copy link

I just want to make sure this gets seen, so I opened a new issue, since its not completely related to the previous one.

As mentioned in #13 , I would like the way that --membrane-image is defined and used by default to be adjusted.

Thanks for considering this!

Cheers

@ngreenwald
Copy link
Contributor

Hey @FloWuenne, is this a convenience change or a feature change? From my understanding, you can do what you're suggesting already by passing the same path to the nuclear and membrane channel. Is your proposed change so that the use only has to pass a single file path if they have a single multichannel image?

@FloWuenne
Copy link
Author

Thanks for the reply @ngreenwald !

Exactly, the proposed change is to be able to only pass a single, multichannel image to mesmer and just specify the nuclear and membrane channels. This has two main reasons:

  1. File channeling in MCMICRO / nextflow generally expects 1 multichannel image file for other segmentation methods and thus would be more in line with this.
  2. Having only one multi-channel file and using different channels seems (at least to me) more in line with the whole idea of multiplexed images as well. Also, if unknowing users try and pass large, multichannel images to both nuclear and membrane channels, they might easily encounter out-of-memory issues since both images have to be read...

I will tag @ArtemSokolov here, as he is the one mainly implementing Mesmer in MCMICRO and he can add any other reasons I might have missed.

@ArtemSokolov
Copy link

I think the main concern is the same image getting loaded twice, thus doubling the memory footprint (which would be problematic for very large whole-slide images).

@ngreenwald
Copy link
Contributor

Got it, that makes sense. We have a pretty big backlog here, but if you guys want to fork the repo and open a PR, I'd be happy to review it.

The loading logic is here: https://github.com/vanvalenlab/deepcell-applications/blob/master/deepcell_applications/prepare.py#L41

@marcovarrone
Copy link

From the way it's currently implemented, since it's already creating an array of zeros with the same dtype as the nuclear image, the memory footprint should not change.
Or am I missing something?

@FloWuenne
Copy link
Author

I think as @ArtemSokolov pointed out, the main concern is that if the nuclear image and the membrane image are both part of multi-channel, large image, the current implementation would read both stacks in memory completely
via get_image() before extracting the channel for segmentation. This is pretty memory inefficient.

One solution could be to only read the required channel into memory from the start (tricky part might be to know beforehand which axis is the channel). I am sorry I didn't have time to play around with this yet. We have put a workaround into MCMICRO that extract the required channel before applying Mesmer.

@ngreenwald
Copy link
Contributor

I suspect that @FloWuenne's workaround is actually better.

Currently, if the image data is stored as a single 40-channel image, the current pipeline will read the entire thing into memory twice, once for each compartment.

The proposed fix in this issue would reduce this to a single read of the entire image, which is half the footprint. However, the ideal scenario would be to only read the relevant channels from the multitiff, potentially reducing memory by 90% for large images.

Only reading the relevant tiffs from the stack would require a bit more work, but subsetting in advance and just passing that to Mesmer will likely result in the largest savings.

Is that an accurate summary of where things stand?

@FloWuenne
Copy link
Author

Sorry for only responding now @ngreenwald . You summarized it well. If you are planning to implement this at some point, keep this issue open, otherwise please feel free to close it. I unfortunately don't have bandwith to make a PR to add this improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants