-
Hi @acavelan. One change we made that could contribute to this is that BoTorch now disables some of GPyTorch's approximate math by default (see https://github.com/pytorch/botorch/blob/main/botorch/__init__.py#L36-L49). Since you're working with large tensors, these approximations were likely active in the old versions, and disabling them may have increased memory usage. I tried the repro you provided (many thanks!) on a V100 32GB. It ran out of memory with the BoTorch defaults but ran fine after reverting to the GPyTorch defaults with:
-
@saitcakmak I marked this as resolved, but I still run out of memory when I increase the number of samples. The new version appears to be much faster, but it also consumes 5-10x more memory. The flags above help a lot, but I am still missing something. Should I open a new issue? Any ideas?
-
Context: I am optimizing the parameters of a complex epidemiological individual-based model. I have between 20 and 25 parameters and 12 objectives.
Last week, I realized I was still using old versions of the botorch, gpytorch, and torch packages. After upgrading, my workflow constantly hits CUDA OOM errors. I assumed it was my own mistake, but after spending several days on this without finding a solution, I rolled back to my old versions.
I wrote sample code to reproduce the problem (see below). The error typically occurs while fitting the model, but I have also had OOM errors while optimizing the acquisition function and with other GP models. With the latest versions, the sample code (KroneckerGP, 2000 samples, 12 outputs) quickly exceeds the 40GB of the A100 GPU I'm using, whereas my old environment (botorch 0.6.6, torch 1.11, gpytorch 1.6.0) peaks at 3GB.
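A back-of-the-envelope estimate may help put these numbers in perspective. It assumes (this is not confirmed anywhere in the thread) that some code path materializes the full multi-output covariance densely instead of keeping it as the Kronecker product K_x ⊗ K_t, with n = 2000 training points, t = 12 outputs, and float64 storage (8 bytes per element). The helper names are hypothetical.

```python
def dense_covariance_bytes(n: int, t: int, bytes_per_elem: int = 8) -> int:
    """Bytes for the full (n*t) x (n*t) multi-output covariance stored densely."""
    return (n * t) ** 2 * bytes_per_elem


def kronecker_factor_bytes(n: int, t: int, bytes_per_elem: int = 8) -> int:
    """Bytes for the Kronecker factors alone: K_x (n x n) plus K_t (t x t)."""
    return (n * n + t * t) * bytes_per_elem


if __name__ == "__main__":
    n, t = 2000, 12  # training points and outputs from the repro description
    dense = dense_covariance_bytes(n, t)
    kron = kronecker_factor_bytes(n, t)
    print(f"dense:     {dense / 1e9:.2f} GB")  # 4.61 GB
    print(f"kronecker: {kron / 1e6:.1f} MB")  # 32.0 MB
    print(f"ratio:     {dense / kron:.0f}x")  # 144x
```

A single dense 24000 x 24000 float64 matrix is about 4.6 GB, so a few such temporaries (the matrix, a Cholesky factor, gradient buffers) could plausibly approach the 40GB of an A100, while the Kronecker factors themselves stay around 32 MB.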
So my question is: are the new versions expected to consume so much more memory? Is there anything I can do to bring memory usage back to 2021 levels?
I have also generated a snapshot.zip of the memory usage (pickle file and interactive HTML plot) in case that helps. Unfortunately, this feature doesn't exist in the older versions I'm using.
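For anyone reproducing this comparison, a minimal sketch of capturing both a snapshot and a peak-memory figure around a single stage (e.g. model fitting). It assumes PyTorch >= 2.1 for the private `torch.cuda.memory._record_memory_history` / `_dump_snapshot` helpers, which may change between releases; `run_with_memory_snapshot` is a hypothetical name. The peak-memory calls (`torch.cuda.reset_peak_memory_stats`, `torch.cuda.max_memory_allocated`) are also available in torch 1.11, so they allow a like-for-like comparison with the old environment.

```python
import torch


def run_with_memory_snapshot(workload, path="snapshot.pickle", max_entries=100_000):
    """Hypothetical helper: run `workload()` while recording CUDA allocations.

    Returns (snapshot_path, peak_allocated_gib), or None without a CUDA device.
    Assumes PyTorch >= 2.1 for the private snapshot helpers used below.
    """
    if not torch.cuda.is_available():
        return None
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    try:
        workload()  # an OOM raised here still writes the snapshot before propagating
    finally:
        torch.cuda.memory._dump_snapshot(path)
        torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return path, peak_gib
```

The resulting pickle can be dropped into the viewer at pytorch.org/memory_viz. Dumping in the `finally` block is deliberate: the snapshot taken at the moment of an OOM is usually the most informative one.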
Any feedback will be greatly appreciated.
Code to reproduce
New package versions causing OOM (exceeds 40GB VRAM on GPU):
Old package versions with low memory usage (<3GB VRAM on GPU):