Issues with Cellpose on macOS using the MPS Backend #1063

Open
Vijayishwerj opened this issue Nov 23, 2024 · 2 comments
Labels
install install help

Comments

@Vijayishwerj

Hello Cellpose Team,

I am experiencing several issues while using Cellpose for segmentation and training models on my Mac Studio M2 Ultra. I would appreciate your guidance to resolve these problems. Below are the details:

Environment Information

•	Operating System: macOS
•	Device: Mac Studio M2 Ultra
•	Python Version: 3.11.10
•	Cellpose Version: 3.1.0
•	Torch Version: 2.5.1
•	Backend: MPS (Metal Performance Shaders)

Issues Faced

1.	GPU Incompatibility with Sparse Tensor Operations:
•	Error: NotImplementedError: Could not run 'aten::_sparse_coo_tensor_with_dims_and_tensors' with arguments from the 'SparseMPS' backend....
•	This suggests that the MPS backend lacks support for sparse tensor operations, leading to failures during segmentation.
2.	Fallback to CPU:
•	When the MPS backend fails, computation falls back to the CPU.
•	However, the CPU path logs a warning about missing MKL optimizations, and performance drops significantly:  WARNING: MKL version on torch not working/installed - CPU version will be slightly slower.


3.	GPU Training Issues:
•	The latest version of Cellpose mandates GPU use for training. However, the MPS backend does not complete tasks, and training without a GPU seems impossible.
4.	Performance Bottleneck:
•	The fallback to CPU leads to extremely long training times, making it impractical for large datasets.

Run Logs

Attached below is the terminal output with verbose mode enabled, showing the errors and relevant information:
• MPS backend available: torch.backends.mps.is_available() returns True.
• The operation fails during sparse tensor computation (aten::_sparse_coo_tensor_with_dims_and_tensors).

NotImplementedError: Could not run 'aten::_sparse_coo_tensor_with_dims_and_tensors' with arguments from the 'SparseMPS' backend.

Steps Taken

1.	Verified that the MPS backend is correctly installed and available.
2.	Updated to the latest versions of Cellpose, Python, and PyTorch.
3.	Attempted fallback to the CPU but encountered performance issues due to lack of MKL support.
4.	Explored replacing sparse operations with dense ones but encountered compatibility constraints.
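
For anyone wanting to reproduce the check described above, here is a minimal sketch that probes whether a device can build a sparse COO tensor before committing to it. The helper names `sparse_coo_supported` and `pick_device` are hypothetical and not part of Cellpose; only `torch.backends.mps.is_available()` and `torch.sparse_coo_tensor` are real PyTorch calls.

```python
import torch


def sparse_coo_supported(device: torch.device) -> bool:
    """Return True if a tiny sparse COO tensor can be built and densified on `device`.

    Hypothetical helper for illustration; not part of Cellpose.
    """
    indices = torch.tensor([[0, 1], [0, 1]])
    values = torch.tensor([1.0, 2.0])
    try:
        sparse = torch.sparse_coo_tensor(indices, values, (2, 2), device=device)
        sparse.to_dense()
        return True
    except RuntimeError:
        # NotImplementedError is a subclass of RuntimeError, so the
        # 'SparseMPS' failure reported in this issue is caught here too.
        return False


def pick_device() -> torch.device:
    """Prefer MPS only when it is available and passes the sparse probe."""
    if torch.backends.mps.is_available():
        mps = torch.device("mps")
        if sparse_coo_supported(mps):
            return mps
    return torch.device("cpu")


if __name__ == "__main__":
    print("selected device:", pick_device())
```

On a machine where the sparse operation is missing for MPS, `pick_device()` would return the CPU device instead of failing partway through segmentation.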

Request for Assistance

1.	GPU Support:
•	Are there plans to improve sparse tensor compatibility for the MPS backend in future versions?
@Vijayishwerj Vijayishwerj added the install install help label Nov 23, 2024
@sophiamaedler
Contributor

sophiamaedler commented Dec 5, 2024

I've looked into this a little and there seem to be two breaking changes that prevent running cellpose >= 3.1 on an MPS backend:

  1. currently no PyTorch support for sparse operations (see here: MPS Sparse Support pytorch/pytorch#129842)
  2. Apple GPUs support only single precision, so they do not, and probably never will, support torch.double/torch.float64 operations.

Regarding 1:
I would hope that PyTorch will eventually release MPS support for sparse operations. Until then, I am not sure how much work a workaround for MPS would be, or whether the solution is to limit MPS use to cellpose 3.0.
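
One possible shape for such a workaround, sketched under assumptions: build the sparse tensor on the CPU, densify it there, and only then move the result to the target device. The function `sparse_sum_dense` is hypothetical and is not how Cellpose uses sparse tensors; it only illustrates routing the one unsupported operation through the CPU.

```python
import torch


def sparse_sum_dense(indices, values, size, device):
    """Accumulate values at (possibly repeated) indices and return a dense tensor.

    Hypothetical illustration: the sparse COO tensor lives on the CPU, so the
    target device never has to run a sparse kernel.
    """
    sparse = torch.sparse_coo_tensor(indices.cpu(), values.cpu(), size)
    dense = sparse.to_dense()  # duplicate indices are summed
    return dense.to(device)


indices = torch.tensor([[0, 0, 1], [1, 1, 0]])
values = torch.tensor([1.0, 2.0, 3.0])
out = sparse_sum_dense(indices, values, (2, 2), torch.device("cpu"))
```

The trade-off is an extra host/device copy per call, which may or may not be acceptable depending on how often the sparse step runs.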

Regarding 2:
I found a few occurrences of torch.float64 and torch.double:
torch.double: cellpose/dynamics.py
torch.float64: cellpose/dynamics.py

I guess it would be fairly straightforward to add checks here for the MPS backend and ensure that at most torch.float32 is used. I am not sure whether this would have any impact on the generated results, though. Maybe @carsen-stringer could comment on whether replacing occurrences of torch.double/torch.float64 would have any negative consequences. If not, I'd be happy to make a PR.
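
A minimal sketch of the kind of check meant here, assuming the dtype can be chosen from the device at the point where the tensor is created. `float_dtype_for` is a hypothetical helper name, not an existing Cellpose function.

```python
import torch


def float_dtype_for(device: torch.device) -> torch.dtype:
    """Pick the widest float dtype the device can handle.

    Hypothetical helper: MPS has no float64 support, so cap at float32 there.
    """
    return torch.float32 if device.type == "mps" else torch.float64


device = torch.device("cpu")  # use torch.device("mps") on Apple silicon
x = torch.zeros(3, dtype=float_dtype_for(device), device=device)
```

Whether the reduced precision changes the segmentation results is exactly the open question above; the sketch only shows where the switch would sit.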

In addition the switch from
mu /= (1e-20 + (mu**2).sum(axis=0)**0.5)
to
mu /= (1e-60 + (mu**2).sum(axis=0)**0.5)
results in RuntimeWarnings on macOS:

"RuntimeWarning: invalid value encountered in divide
mu /= (1e-60 + (mu**2).sum(axis=0)**0.5)"

My guess is that this is a direct result of processes on the MPS backend using float32 rather than float64: 1e-60 is below the smallest positive float32 value (about 1.4e-45), so it rounds to zero and the denominator can become exactly zero. Was there a concrete rationale for implementing this switch? What would be the effect of leaving it at 1e-20 versus ignoring the warning?
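
The float32 explanation can be checked with a small NumPy sketch. It assumes `mu` is a float32 array containing all-zero columns, which is a guess about the Cellpose internals; `normalize` is a hypothetical stand-in for the line quoted above, not the library's function.

```python
import numpy as np


def normalize(mu: np.ndarray, eps: float) -> np.ndarray:
    """Divide each column of `mu` by its norm plus `eps`, in mu's own dtype."""
    eps = mu.dtype.type(eps)  # cast eps to float32/float64 to match mu
    # Suppress the RuntimeWarning so the result can be inspected directly.
    with np.errstate(invalid="ignore", divide="ignore"):
        return mu / (eps + (mu**2).sum(axis=0) ** 0.5)


zeros32 = np.zeros((2, 3), dtype=np.float32)
zeros64 = np.zeros((2, 3), dtype=np.float64)

print(np.float32(1e-60) == 0.0)                   # 1e-60 underflows in float32
print(np.isnan(normalize(zeros32, 1e-60)).all())  # 0/0 in float32
print(np.isnan(normalize(zeros32, 1e-20)).any())  # 1e-20 is representable
print(np.isnan(normalize(zeros64, 1e-60)).any())  # float64 keeps 1e-60
```

Under these assumptions, the 1e-60 guard only protects against division by zero when the computation stays in float64.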

@OratHelm
Contributor

OratHelm commented Dec 6, 2024

I think this problem has already been mentioned here: #1034.
A quick workaround for single-precision has been added in version 3.0.11 of Cellpose, which allows compatibility with the MPS backend. But it can certainly be improved!
Sparse operations have been added in subsequent versions of Cellpose. While waiting for a fix to avoid using them with Apple GPUs or for pytorch to implement these functions for MPS, you can go back to version 3.0.11: pip install git+https://github.com/mouseland/[email protected]
