High memory usage compared to previous versions #2334

acavelan · 2024-05-08T01:56:36Z

Context: I am optimizing the parameters of a complex epidemiological individual-based model. I have between 20-25 parameters and 12 objectives.

Last week, I realized I was still using old versions of botorch, gpytorch and torch packages. After upgrading, my workflow is now constantly getting CUDA OOM errors. I thought it was my own mistake but after spending several days on this and not finding a solution I just rolled back to my old versions.

I wrote sample code to reproduce the problem (see below). The error typically occurs while fitting the model, but I also had OOM errors while optimizing the acquisition function and with other GP models. With the latest versions, the sample code (KroneckerMultiTaskGP, 2000 samples, 12 outputs) quickly exceeds the 40GB of the A100 GPU I'm using, while my old environment (botorch 0.6.4, torch 1.11, gpytorch 1.6.0) peaks at 3GB.

So my question is: are the new versions expected to consume so much more memory? Is there anything I can do to bring memory usage back to 2021 levels?

I have also generated a snapshot.zip of the memory usage (pickle file and interactive HTML plot) in case that helps. Unfortunately this feature didn't exist in the older versions I'm using.

Any feedback will be greatly appreciated.

Code to reproduce

from botorch.models.multitask import KroneckerMultiTaskGP
from botorch.models.transforms.outcome import Standardize
from botorch.fit import fit_gpytorch_mll_torch
# from botorch.optim.fit import fit_gpytorch_torch
from gpytorch.mlls import ExactMarginalLogLikelihood
from torch.quasirandom import SobolEngine
from torch.optim import Adam
from functools import partial
import torch

from botorch.test_functions.multi_objective import DTLZ1

dtype=torch.double
device="cuda"

N = 2000
D = 21
num_objectives = 12
f = DTLZ1(dim=D, num_objectives=num_objectives, negate=True)
f.bounds[0, :].fill_(0)
f.bounds[1, :].fill_(1)

sobol = SobolEngine(dimension=D, scramble=True)
X = sobol.draw(n=N).to(dtype=dtype, device=device)
Y = f(X)

torch.cuda.memory._record_memory_history(max_entries=10000)

try:
    model = KroneckerMultiTaskGP(X, Y, outcome_transform=Standardize(m=Y.shape[-1]))
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_mll_torch(mll, step_limit=3000, optimizer=partial(Adam, lr=0.1))
    # fit_gpytorch_torch(mll, options={"maxiter": 3000, "lr": 0.1, "disp": False})
except Exception as e:
    print(e)
    
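try:
    # Dump the recorded allocation history for offline inspection
    torch.cuda.memory._dump_snapshot("snapshot.pickle")
except Exception as e:
    # No logger is configured in this script, so report with print
    print(f"Failed to capture memory snapshot: {e}")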

torch.cuda.memory._record_memory_history(enabled=None)

New package versions causing OOM (exceeds 40GB VRAM on GPU):

mamba list | grep torch
botorch                   0.11.0             pyhd8ed1ab_0    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
gpytorch                  1.11                          0    gpytorch
jaxtyping                 0.2.9                         0    gpytorch
libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
linear_operator           0.5.1                         0    gpytorch
pytorch                   2.3.0           py3.12_cuda11.8_cudnn8.7.0_0    pytorch
pytorch-cuda              11.8                 h7e8668a_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                2.3.0               py312_cu118    pytorch
torchvision               0.18.0              py312_cu118    pytorch

Old package versions with low memory usage (<3GB VRAM on GPU):

botorch                   0.6.4                         0    pytorch
ffmpeg                    4.3                  hf484d3e_0    pytorch
gpytorch                  1.6.0                         0    gpytorch
jaxtyping                 0.2.9                         0    gpytorch
linear_operator           0.5.2                         0    gpytorch
pytorch                   1.11.0          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.11.0               py39_cu113    pytorch
torchvision               0.12.0               py39_cu113    pytorch

saitcakmak · 2024-05-15T17:52:11Z

Collaborator

Hi @acavelan. One change we have made that could contribute to this is to disable some approximate math used in GPyTorch by default: https://github.com/pytorch/botorch/blob/main/botorch/__init__.py#L36-L49 Since you're working with large tensors, these were likely active in the old versions and disabling them may have increased the memory usage.

I tried the repro you provided (many thanks!) on a V100 32GB. It hit OOM with the BoTorch defaults but ran fine after reverting to the GPyTorch defaults with

import gpytorch.settings as gp_settings
import linear_operator.settings as linop_settings

linop_settings._fast_covar_root_decomposition._default = True
linop_settings._fast_log_prob._default = True
linop_settings._fast_solves._default = True
linop_settings.cholesky_max_tries._global_value = 6
linop_settings.max_cholesky_size._global_value = 800
gp_settings.max_eager_kernel_size._global_value = 800

1 reply

acavelan May 16, 2024
Author

Thank you, this is very helpful! It turns out I only need these two lines:

linop_settings._fast_log_prob._default = True
linop_settings._fast_solves._default = True
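
Because these flags are module-level defaults, flipping them affects every later call in the same process. A minimal sketch of scoping such an override to a single block, assuming nothing beyond the attribute names quoted above; `temporarily_set` is a hypothetical helper, not part of BoTorch, GPyTorch or linear_operator:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore even if the body raises (e.g. a CUDA OOM during fitting).
        setattr(obj, name, previous)


# Stand-in object so the sketch runs without GPyTorch installed.
flag = SimpleNamespace(_default=False)
with temporarily_set(flag, "_default", True):
    inside = flag._default
after = flag._default
print(inside, after)  # True False

# Intended use with the flags from this thread (assumes linear_operator is installed):
#   with temporarily_set(linop_settings._fast_log_prob, "_default", True), \
#        temporarily_set(linop_settings._fast_solves, "_default", True):
#       fit_gpytorch_mll_torch(mll)
```

GPyTorch also provides its own `gpytorch.settings.fast_computations` context manager, which may be the cleaner option where it covers the flags you need.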

acavelan · 2024-05-21T13:58:07Z

Author

@saitcakmak I marked this as resolved but I still ran out of memory when I increased the number of samples. The new version appears to be much faster but it is also consuming 5-10x more memory. The flags above are helping a lot but I am still missing something. I don't know if I should open a new issue. Any ideas?
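
For a rough sense of how quickly memory can grow with the sample count, here is a back-of-the-envelope sketch. It assumes that some code path materializes the full joint covariance over all points and outputs as a dense float64 matrix, which nothing in this thread confirms; Kronecker-structured models are meant to avoid exactly that. `dense_covariance_gib` is a hypothetical helper used only for the arithmetic.

```python
def dense_covariance_gib(num_points: int, num_tasks: int, bytes_per_value: int = 8) -> float:
    """Size in GiB of a dense (N*T) x (N*T) matrix; 8 bytes per value is float64."""
    side = num_points * num_tasks
    return side * side * bytes_per_value / 2**30


# N values and 12 outputs taken from the repro scripts in this thread.
for n in (2000, 4000):
    print(f"N={n}: {dense_covariance_gib(n, 12):.2f} GiB")
# N=2000: 4.29 GiB
# N=4000: 17.17 GiB
```

Under that assumption a single such matrix grows quadratically in N, so doubling the samples quadruples its size, and a few temporaries of that shape would be enough to exhaust a 40GB card.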

4 replies

saitcakmak May 21, 2024
Collaborator

I looked through the code base and there are lots of minor changes to KroneckerMTGP over the last couple of years (& some other changes on the GPyTorch end). Many of these changes are side effects of larger refactors or of deprecated functionality being replaced with a newer method. It is hard to say which change contributed to the memory increase without trying out other BoTorch & GPyTorch versions in between 0.6.4 & 0.11.0 and observing the memory usage.
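
Trying out the versions in between does not require testing every release; a binary search over the ordered release list is enough. The sketch below is one way to organize that, assuming the regression is monotone (once a release is bad, all later ones are too). `first_bad_version`, the placeholder release labels and the peak-memory numbers are all hypothetical and not taken from any measurement.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad version.

    Assumes versions are ordered oldest to newest, the first is known good,
    the last is known bad, and once a version is bad all later ones are too.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Illustrative stand-in: placeholder release labels and made-up peak GiB values.
# In practice is_bad would install that release, run the repro, and read peak memory.
fake_peak_gib = {"r1": 3.0, "r2": 3.1, "r3": 2.9, "r4": 30.0, "r5": 31.0, "r6": 40.0}
releases = list(fake_peak_gib)
culprit = first_bad_version(releases, lambda v: fake_peak_gib[v] > 10.0)
print(culprit)  # r4
```

Each probe costs one environment install plus one run of the repro, so the number of installs grows only logarithmically with the number of releases.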

acavelan May 21, 2024
Author

Thanks for looking into it. It's not just the KroneckerMTGP though, I have the same problem with other models, especially with multiple objectives. I also have OOM errors in optimize_acqf or when using Thompson sampling, so it's probably something in GPyTorch? I don't have time to test each version now, but if I do I will report back.

acavelan May 24, 2024
Author

@saitcakmak The issue first appears in BoTorch 0.7.2. It's independent of the GPyTorch and PyTorch versions.

Below is the environment I'm using (Python 3.9):

botorch                   0.7.2              pyhd8ed1ab_0    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
gpytorch                  1.11                          0    gpytorch
linear_operator           0.5.1                         0    gpytorch
pytorch                   1.13.1          py3.9_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch

With the same environment and BoTorch 0.7.1 I have no problem with memory usage.

I now suspect the cause is this commit. I see that approx_mll is switched to False by default but I tried both True and False in version 0.7.2 and it didn't seem to affect memory too much (it's possible I didn't pass the optimizer options correctly).

Could the problem be due to a different default optimizer or optimizer options? If so, switching to another optimizer that uses less memory might be a solution?

Test code:

from botorch.models import SingleTaskGP
from botorch.models.multitask import KroneckerMultiTaskGP

from botorch.generation import MaxPosteriorSampling
from botorch.models.transforms.outcome import Standardize
from botorch.fit import fit_gpytorch_model
from botorch.optim.fit import fit_gpytorch_torch, fit_gpytorch_scipy
from botorch.optim import optimize_acqf
from botorch.acquisition.objective import GenericMCObjective
from botorch.test_functions.multi_objective import DTLZ1
from gpytorch.mlls import ExactMarginalLogLikelihood

import torch
from torch.quasirandom import SobolEngine

import gc

dtype = torch.double
device = "cuda"

N = 4000
D = 21
num_objectives = 12

f = DTLZ1(dim=D, num_objectives=num_objectives, negate=True)
f.bounds[0, :].fill_(0)
f.bounds[1, :].fill_(1)

sobol = SobolEngine(dimension=D, scramble=False, seed=1)
X = sobol.draw(n=N).to(dtype=dtype, device=device)
Y = f(X).to(X)

objective = GenericMCObjective(lambda Y, X: Y.sum(dim=-1))

gc.collect()
torch.cuda.empty_cache()

torch.manual_seed(seed=0) # to keep the restart conditions the same

model = SingleTaskGP(X, Y, outcome_transform=Standardize(m=Y.shape[-1]))
# model = KroneckerMultiTaskGP(X, Y, outcome_transform=Standardize(m=Y.shape[-1]))
mll = ExactMarginalLogLikelihood(model.likelihood, model)

fit_gpytorch_model(mll)
# fit_gpytorch_torch(mll)#, options={"maxiter": 3000, "lr": 0.01, "disp": False})
# fit_gpytorch_scipy(mll)#, options={"maxiter": 3000, "lr": 0.01, "disp": False, "approx_mll": False})

t = torch.cuda.get_device_properties(0).total_memory / (2**30)
r = torch.cuda.max_memory_reserved(0) / (2**30)
a = torch.cuda.max_memory_allocated(0) / (2**30)

print(f"Total: {t:2.2f}Gi, Reserved: {r:2.2f}Gi, Allocated: {a:2.2f}Gi")

saitcakmak May 30, 2024
Collaborator

Thanks for digging into this further. That commit makes lots of changes to model fitting logic, including how the MLL is evaluated under the hood.

"I see that approx_mll is switched to False by default but I tried both True and False"

This was just affecting the behavior of a GPyTorch context manager: with gpt_settings.fast_computations(log_prob=approx_mll). In my testing, setting it to False makes model fitting take much longer, though it doesn't seem to affect memory usage. Setting approx_mll=True should behave the same as the linop_settings._fast_log_prob._default = True setting we discussed above.

cc @esantorella, @Balandat in case you're aware of any changes from fit_gpytorch_mll refactor that could've affected memory usage.
