
Out of memory: RTX 4090 / 24GB #16

Open
boxabirds opened this issue Mar 21, 2024 · 21 comments

@boxabirds
Contributor

Hi, no matter what movie size I choose (e.g. 5 fps, 640x480), I get the error below. nvtop shows that the webUI triggers a pre-allocation of 21.5 GB, but then ... it's not used?

/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%| | 0/15 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/queueing.py", line 388, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/route_utils.py", line 219, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/utils.py", line 650, in wrapper
    response = f(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/webUI.py", line 159, in process
    keypath = process1(*args)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/webUI.py", line 280, in process1
    latents = inference(global_state.pipe, global_state.controlnet, global_state.frescoProc,
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/src/pipe_FRESCO.py", line 201, in inference
    noise_pred = pipe.unet(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 776, in forward
    sample = optimize_feature(sample, flows, occs, correlation_matrix,
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 485, in optimize_feature
    optimizer.step(closure)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/adam.py", line 143, in step
    loss = closure()
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 478, in closure
    loss.backward()
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.06 GiB. GPU 0 has a total capacty of 23.65 GiB of which 525.94 MiB is free. Process 15481 has 1.27 GiB memory in use. Including non-PyTorch memory, this process has 21.00 GiB memory in use. Of the allocated memory 20.11 GiB is allocated by PyTorch, and 416.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
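The last line of the message itself suggests trying max_split_size_mb. For reference, a minimal sketch of how that could be set (the value 128 is purely illustrative, and the variable must be set before CUDA is first initialized, e.g. before launching webUI.py):

# Hedged sketch: the allocator hint from the error message above.
# "128" is illustrative; equivalently, run
#   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python webUI.py
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
import torch  # import torch / start the pipeline only after the variable is set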

@williamyang1991
Owner

You can set a smaller batch size to avoid OOM.

batch_size: 8

Our method optimizes the features during DDPM sampling, so memory usage reaches its peak when the optimization is applied.

@boxabirds
Contributor Author

I tried with batch_size: 4 and then 2 and it made no difference 🤔

I don't think that's it: it says it is trying to allocate 112 MB, the GPU has a capacity of 23.65 GiB but only 106 MB is free, and 20.85 GiB is allocated by PyTorch. But for what, I wonder?
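One way to see what that memory is actually holding would be to dump the allocator state right before the failing step; a rough sketch using standard PyTorch calls (exactly where to place it in webUI.py is left open):

import torch
# Per-pool allocator statistics: active vs. reserved memory, segment counts, etc.
print(torch.cuda.memory_summary(device=0, abbreviated=True))
# Peak allocation so far, in GiB (resettable via torch.cuda.reset_peak_memory_stats()).
print(f"peak allocated: {torch.cuda.max_memory_allocated(0) / 2**30:.2f} GiB")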

@boxabirds
Contributor Author

What GPUs did you do your work on? Might it simply be that there is a minimum GPU memory requirement of 40 GB or something?

@jinwyp
Contributor

jinwyp commented Mar 21, 2024

There is a bug with batch_size: 4.
Please pull the latest code.

#6

@boxabirds
Contributor Author

I get the same error when batch size is 2 as well though …?

@efwfe
Contributor

efwfe commented Mar 21, 2024

A10G 24G works fine with batch size = 8

@boxabirds
Contributor Author

boxabirds commented Mar 21, 2024 via email

@efwfe
Contributor

efwfe commented Mar 21, 2024

Great — based on my Stack trace, what am I doing wrong?


It's not clear what happened here. Pulling and using the latest code may be helpful.

@JPW0080

JPW0080 commented Mar 21, 2024

Is xformers installed?

@boxabirds
Contributor Author

boxabirds commented Mar 21, 2024 via email

@boxabirds
Contributor Author

It's not clear what happened here. Pulling and using the latest code may be helpful.

I checked, and this is against the latest code. I don't see any changes in the last 12 hours, and my pull was within that window.

@moosl

moosl commented Mar 21, 2024

I have the same issue here.

@williamyang1991
Owner

Then maybe you could turn off the optimization function to further save memory (but sacrifice performance)?
#14 (comment)

@williamyang1991
Owner

Is xformers installed?

I tried xformers.ops.memory_efficient_attention, but found it to be less memory efficient than F.scaled_dot_product_attention, so I didn't use xformers in my code:

'''
# for xformers implementation
if importlib.util.find_spec("xformers") is not None:
    hidden_states = xformers.ops.memory_efficient_attention(
        rearrange(query, "b h d c -> b d h c"), rearrange(key, "b h d c -> b d h c"),
        rearrange(value, "b h d c -> b d h c"),
        attn_bias=attention_mask, op=None
    )
    hidden_states = rearrange(hidden_states, "b d h c -> b h d c", h=attn.heads)
'''
# the output of sdp = (batch, num_heads, seq_len, head_dim)
# TODO: add support for attn.scale when we move to Torch 2.1
# output: BC * 8 * HW * D2
hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)
#print('cross: ', GPU.getGPUs()[1].memoryUsed)
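For anyone who wants to reproduce that comparison on their own card, a self-contained sketch that measures the peak memory of both attention paths (the tensor shapes are made up for illustration and are not taken from the repo; the xformers branch only runs if the package is installed):

# Standalone peak-memory probe for the two attention paths discussed above.
import importlib.util
import torch
import torch.nn.functional as F

def peak_mem(fn):
    # Reset the allocator's peak counter, run fn, and report the peak in GiB.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**30

# Illustrative shapes only: batch, heads, sequence length (H*W), head dim.
b, h, s, d = 8, 8, 64 * 64, 40
q = torch.randn(b, h, s, d, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

print("sdpa peak: %.2f GiB" % peak_mem(
    lambda: F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)))

if importlib.util.find_spec("xformers") is not None:
    import xformers.ops
    # xformers expects (batch, seq_len, heads, head_dim)
    qx, kx, vx = (t.transpose(1, 2).contiguous() for t in (q, k, v))
    print("xformers peak: %.2f GiB" % peak_mem(
        lambda: xformers.ops.memory_efficient_attention(qx, kx, vx)))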

@boxabirds
Contributor Author

Then maybe you could turn off the optimization function to further save memory (but sacrifice performance)? #14 (comment)

There's something very strange going on, because #14 is about a 12 GB GPU and it works, yet I have a 24 GB GPU and it won't do even the most basic processing on an image sequence requiring 112 MB. Something's going on with the PyTorch allocation: why does it need 20 GB of GPU RAM? The only thing I can conclude is that #14 is against a different version of the code base.

@williamyang1991
Owner

I think maybe there is no problem with the code.
Maybe there are some specific GPU-allocation settings on your machine that cause the OOM?
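One quick sanity check along those lines is to print how much of the card is actually free before the pipeline starts, since the traceback above already shows another process (15481) holding about 1.27 GiB; a small sketch:

import torch
# Free vs. total device memory as reported by the driver (in bytes).
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
# Whatever is missing here is held by other processes (desktop, browser,
# another Python session) and is not available to FRESCO.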

@boxabirds
Contributor Author

boxabirds commented Mar 22, 2024 via email

@williamyang1991
Owner

You can print the memory usage in diffusion_hacked.py, e.g.

print('diffusion_hacked Line 286: ', GPU.getGPUs()[1].memoryUsed)

to see which part of the code is running when the OOM happens.
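(The GPU object above presumably comes from GPUtil, i.e. import GPUtil as GPU; that is an assumption.) A PyTorch-side complement that only counts this process's own tensor allocations can be sprinkled around the same spots:

import torch

def log_mem(tag):
    # Memory held by live tensors vs. memory reserved by the caching allocator.
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated {alloc:.2f} GiB, reserved {reserved:.2f} GiB")

# e.g. log_mem('before optimize_feature') and log_mem('after optimize_feature')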

@cvespaz

cvespaz commented Mar 22, 2024

same issue here, following

@cvespaz

cvespaz commented Mar 22, 2024

Even just running "run keyframes" OOMs on a 24 GB card? Am I missing something here? I ran the example test just fine with Gradio.

@williamyang1991
Owner

Even just running "run keyframes" OOMs on a 24 GB card? Am I missing something here? I ran the example test just fine with Gradio.

Full frames do not take more memory; the keyframe part uses the most memory.
Do you mean the example video works fine but your own video OOMs?
Maybe your video has too many pixels. The example video is 512*512 pixels.
If your video is larger, you can use a smaller resize parameter:

img = resize_image(frame, 512)
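For reference, a generic stand-in for that kind of resize (how the repo's resize_image actually behaves should be checked in its source; the rule below, scaling the shorter side to 512 and rounding both dimensions to multiples of 64, is an assumption):

import cv2
import numpy as np

def resize_short_side(frame: np.ndarray, target: int = 512) -> np.ndarray:
    # Scale so the shorter side is `target`, rounding both dims to multiples of 64.
    h, w = frame.shape[:2]
    k = target / min(h, w)
    new_h = int(round(h * k / 64)) * 64
    new_w = int(round(w * k / 64)) * 64
    interp = cv2.INTER_AREA if k < 1 else cv2.INTER_LANCZOS4
    return cv2.resize(frame, (new_w, new_h), interpolation=interp)

# e.g. a 1920x1080 frame would come out at 896x512 under these rules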
