
Allow Apple MPS as GPU device #912

Merged
merged 2 commits into from
Feb 19, 2024
Conversation

janfb
Contributor

@janfb janfb commented Jan 18, 2024

Problem

We only support CUDA as a GPU device, but PyTorch 2.1 now also supports Apple's MPS backend (to some extent, see below).

https://pytorch.org/docs/stable/notes/mps.html

Solution

This PR changes the processing of the passed device argument to also allow MPS. For example, instead of using "cuda" in the tests, we use "gpu" and parse the string to "mps:0" or "cuda:0" accordingly.
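As a rough sketch of the parsing described above (function name, signature, and fallback order are assumptions for illustration; the actual logic lives in sbi/utils/torchutils.py and queries torch directly):

```python
def process_device(device: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a user-facing device string to a concrete torch device string.

    Hypothetical sketch of the parsing this PR describes; the real
    implementation checks torch.cuda / torch.backends.mps availability itself.
    """
    if device == "gpu":
        # "gpu" is resolved to whichever GPU backend is available.
        if cuda_available:
            return "cuda:0"
        if mps_available:
            return "mps:0"
        raise RuntimeError("device='gpu' requested, but neither CUDA nor MPS is available.")
    # Explicit strings like "cpu", "cuda:1", or "mps:0" pass through unchanged.
    return device
```

This keeps "gpu" as a backend-agnostic alias while still accepting explicit device strings.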

Additional comments

@janfb janfb added the enhancement New feature or request label Jan 18, 2024
@janfb janfb self-assigned this Jan 18, 2024
@michaeldeistler
Contributor

Re the nflows problem: I am fine with not supporting this, in particular if we will support other density estimators soon.

@janfb
Contributor Author

janfb commented Jan 24, 2024

Maybe I misunderstood: you would rather not support MPS devices because we would have to make sure the future density estimators all run with float32?

@michaeldeistler
Contributor

No, I meant that we do not support MPS devices if nflows is used as the backend. We should support MPS devices for other density estimators (which will hopefully use float32).


codecov bot commented Jan 25, 2024

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (f4cebfc) 75.29% compared to head (afc7df2) 76.02%.
Report is 4 commits behind head on main.

Files Patch % Lines
sbi/utils/torchutils.py 57.14% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #912      +/-   ##
==========================================
+ Coverage   75.29%   76.02%   +0.73%     
==========================================
  Files          80       80              
  Lines        6286     6319      +33     
==========================================
+ Hits         4733     4804      +71     
+ Misses       1553     1515      -38     
Flag Coverage Δ
unittests 76.02% <57.14%> (+0.73%) ⬆️


@vivienr

vivienr commented Feb 8, 2024

@janfb thank you for this (and to the SBI team for a very useful package)! I have been running SBI on MPS, and saw this PR when looking at creating my own. Just to note in case it's useful: I have been using nflows, with a somewhat ugly wrapping of the density estimator nflows returns:

density_estimator_custom = lambda theta, x: density_estimator_custom_float64(theta, x).to(dtype=torch.float32)

@janfb
Contributor Author

janfb commented Feb 8, 2024

Hi @vivienr, thanks for your comment! Good to know that you have been using MPS with SBI already!
So I assume you experienced a speed-up compared to CPU? How large / what type are your embedding nets?

Thanks also for the suggestion; that would indeed work.
We hope to find a more sustainable option soon, e.g., by making a PR in nflows or by adding support for other density estimation packages.

@vivienr

vivienr commented Feb 8, 2024

I'm seeing pretty small speed-ups (~10%) with my current test set-up: O(100k) simulations, and my default embedding net is a sequence of dense residual blocks with linear resizing layers. This test case has O(10) layers, input dimension 64, output dimension 16.

But I do need to scale up to my real use case with a larger embedding network. I'm also limited by macOS 12 not supporting several operators, which then fall back to the CPU. I will upgrade to macOS 13 and see if things improve.

@janfb
Contributor Author

janfb commented Feb 8, 2024

Thank you for the details, that's good to know. 👍

@janfb janfb added this to the Pre Hackathon 2024 milestone Feb 9, 2024
@janfb janfb force-pushed the allow-mps-device branch 2 times, most recently from 32c6b3a to 189fb75 Compare February 9, 2024 16:24
@janfb
Contributor Author

janfb commented Feb 13, 2024

Update:

  • the float64 in nflows appears only in the buffer of the StandardNormal:

https://github.com/bayesiains/nflows/blob/3b122e5bbc14ed196301969c12d1c2d94fdfba47/nflows/distributions/normal.py#L18-L21

To fix the problem with MPS, I added the option to set the type of that buffer when we are building our flows using nflows:

sbi/sbi/neural_nets/flow.py

Lines 480 to 487 in 189fb75

def get_base_dist(
    num_dims: int, dtype: torch.dtype = torch.float32, **kwargs
) -> distributions_.Distribution:
    """Returns the base distribution for the flows with given float type."""
    base = distributions_.StandardNormal((num_dims,))
    base._log_z = base._log_z.to(dtype)
    return base
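A torch-free toy analogue of this fix (all names below are stand-ins, not sbi or nflows API), just to illustrate the pattern: construct the base distribution first, then cast its cached buffer to the requested dtype, since MPS only supports float32:

```python
import math

class StandardNormalStub:
    """Toy stand-in for nflows' StandardNormal, which caches its
    log-normalizer in a float64 buffer (the `_log_z` linked above)."""
    def __init__(self, num_dims: int):
        self.num_dims = num_dims
        # log of the Gaussian normalization constant, (d/2) * log(2*pi)
        self.log_z = 0.5 * num_dims * math.log(2 * math.pi)
        self.log_z_dtype = "float64"  # the problematic default dtype

def get_base_dist_stub(num_dims: int, dtype: str = "float32") -> StandardNormalStub:
    """Mirrors the fix above: build the distribution, then cast its buffer
    (stands in for `base._log_z = base._log_z.to(dtype)`)."""
    base = StandardNormalStub(num_dims)
    base.log_z_dtype = dtype
    return base
```

The point is that the float64 value only lives in that one buffer, so a post-construction cast is enough; no change to nflows itself is required.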

@janfb
Contributor Author

janfb commented Feb 13, 2024

@manuelgloeckler

  • the vi_on_gpu tests are failing: for nsf and num_dim=2, the TransformedDistribution q produces NaN samples. @manuelgloeckler can you please reproduce this on this branch and have a look?
    To reproduce, run this command on a MacBook with MPS:
    pytest tests/inference_on_device_test.py::test_vi_on_gpu --pdb

@pytest.mark.slow
@pytest.mark.gpu
@pytest.mark.parametrize("num_dim", (1, 2))
@pytest.mark.parametrize("q", ("maf", "nsf", "gaussian_diag", "gaussian", "mcf", "scf"))
@pytest.mark.parametrize("vi_method", ("rKL", "fKL", "IW", "alpha"))
@pytest.mark.parametrize("sampling_method", ("naive", "sir"))
def test_vi_on_gpu(num_dim: int, q: Distribution, vi_method: str, sampling_method: str):

@janfb janfb marked this pull request as ready for review February 16, 2024 10:21
@janfb janfb added architecture Internal changes without API consequences performance Everything related to performance labels Feb 16, 2024
@manuelgloeckler
Contributor

Well, I do not have a MacBook (nor access to one). I guess I can't test it then.

@janfb janfb force-pushed the allow-mps-device branch 2 times, most recently from 9d9b128 to 7a27299 Compare February 16, 2024 13:25
@janfb
Contributor Author

janfb commented Feb 16, 2024

Using VIPosterior with MPS, the nsf variational family, and num_dim > 1 results in NaN samples.
This might be related to pytorch/pytorch#89127; thanks @manuelgloeckler.

I made a comment in the corresponding vi test.

This is ready for review now.

Contributor

@michaeldeistler michaeldeistler left a comment


Thanks! Small comment regarding the nsf problems; feel free to merge once it is addressed.

tests/inference_on_device_test.py
@janfb janfb merged commit 2830fda into main Feb 19, 2024
2 of 3 checks passed
@janfb janfb deleted the allow-mps-device branch February 19, 2024 09:08
Labels
architecture Internal changes without API consequences enhancement New feature or request performance Everything related to performance