Memory leak on MacOS | M1 OSX #1783

sotrh · 2021-08-09T19:17:08Z

Description
Some users of my tutorial have been experiencing memory issues on my buffer tutorial. It seems to only be an issue on the new M1 chips.

Repro steps
You'll need a Mac with an M1 chip, then run the tutorial code here: https://github.com/sotrh/learn-wgpu/tree/master/code/beginner/tutorial4-buffer. If you have the repo already downloaded, you can just run cargo run --bin tutorial4-buffer.

Expected vs observed behavior
The expected behaviour is no memory leaks on M1.

Extra materials
I don't have a Mac, so I can't provide hardware specifics other than that it's occurring on M1, but I'll link the issue from my repo here: sotrh/learn-wgpu#207

Platform
OSX with M1 chip, wgpu 0.9

kvark · 2021-08-09T20:47:46Z

Similar report in #1537

JunkuiZhang · 2021-08-12T04:30:18Z

On my Intel MacBook Pro it works as expected, so I think it's an M1 SoC related issue.

kvark · 2021-08-13T15:04:23Z

I don't have an M1. We need help investigating this.

jbatez · 2021-08-13T17:29:11Z

I've been debugging a similar (same?) issue the last couple days. It has something to do with begin_render_pass and high framerates. You can reproduce with wgpu/examples/cube with only a slight modification.

In framework.rs, set target_frametime way above 60hz. On my system, I don't see the issue at 200hz, but I definitely see it at 300hz.

Edit: forgot to mention you also need to put the window in fullscreen mode. In windowed mode, the framerate gets clamped.

jbatez · 2021-08-13T17:31:57Z

With the high framerate modification above, if I remove the begin_render_pass call and all the rpass.* method calls, the problem goes away. If I keep the begin_render_pass call but remove all the method calls, the problem persists. Hence the conclusion that it has something to do with begin_render_pass.

jbatez · 2021-08-13T17:37:33Z

Also, note: I first noticed this bug with v0.9 but I still see it in master.

jbatez · 2021-08-13T18:24:43Z

Also maybe noteworthy: with the target_frametime modification above plus a switch to PresentMode::Fifo, the fullscreen framerate reported to stdout reaches well above my display's framerate. Is that expected? If not, maybe the command queue's getting filled faster than the GPU can drain it.

jbatez · 2021-08-13T21:04:22Z

It's not just when the window's fullscreen, but also when it's completely obscured. Adding a NSWindowOcclusionStateVisible check before frame acquisition helps, but memory usage still seems to shoot up on occluded -> unoccluded transitions.

pacmanmati · 2021-08-13T21:13:50Z

> I've been debugging a similar (same?) issue the last couple days. It has something to do with begin_render_pass and high framerates. You can reproduce with wgpu/examples/cube with only a slight modification.
>
> In framework.rs, set target_framerate way above 60hz. On my system, I don't see the issue at 200hz, but I definitely see it at 300hz.
>
> Edit: forgot to mention you also need to put the window in fullscreen mode. In windowed mode, the framerate gets clamped.

what device are you running on? the cube example won't leak memory on my m1 air even when I uncap framerate in framework.rs but the tutorial 3/4 code from sotrh leaks very aggressively on mine.

jbatez · 2021-08-13T21:19:53Z

> what device are you running on?

M1 Air for me as well.

pacmanmati · 2021-08-13T21:28:21Z

> M1 Air for me as well.

oh interesting. i don't seem to incur any 'leaking' by obscuring one window using another (if that's what you mean) but the process does demand a lot of virtual memory e.g. 5 gigabytes after a few seconds of being on another workspace. i'm now a little hesitant to call my issue a memory leak since the memory seems to be cleaning itself up while the process is running and returning to normal (wgpu master branch).

pacmanmati · 2021-08-13T21:32:39Z

@jbatez could you attach the modified files you used to make the cube example leak? if so i can verify whether mine exhibits similar behaviour. could you also tell me what version of osx you're on? i'm on 11.5.1

scoopr · 2021-08-13T21:33:19Z

My repro is as follows,

Set up tutorial4-buffer under Instruments (it needs to be signed with the debug entitlement on M1), with the Leaks and Metal profiles added.
Run it and, after a while, make the window occluded (behind the Instruments window). Bonus for making it visible again and trying to e.g. resize it, and noticing that it hangs for quite a while before it continues.

I also tried on an Intel MBP, where I did see a tiny increase in allocations that plateaued and then dropped off to earlier levels after bringing the window back.

The wgpu examples' framework does some frame limiting which I think is hiding this for them. Making the framework unconditionally call the request_redraw without any time checking made it behave similarly with the shadow example in my tests.
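
For readers unfamiliar with that kind of frame limiting, here is a minimal sketch of the idea in plain Rust. It is not the actual framework.rs code: `FrameLimiter` and its methods are hypothetical names, and only the notion of comparing elapsed time against a target frame time is taken from the discussion above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: permits a redraw at most once per target frame time.
struct FrameLimiter {
    target_frametime: Duration,
    last_redraw: Option<Instant>,
}

impl FrameLimiter {
    /// `target_hz` must be non-zero (dividing a Duration by zero panics).
    fn new(target_hz: u32) -> Self {
        Self {
            target_frametime: Duration::from_secs(1) / target_hz,
            last_redraw: None,
        }
    }

    /// Returns true if a redraw should be requested at `now`,
    /// and records `now` as the time of the latest redraw.
    fn should_redraw(&mut self, now: Instant) -> bool {
        match self.last_redraw {
            // Too soon since the previous redraw: skip this one.
            Some(last) if now.duration_since(last) < self.target_frametime => false,
            _ => {
                self.last_redraw = Some(now);
                true
            }
        }
    }
}
```

Removing the time check (always returning true) corresponds to the unconditional request_redraw case described above, where nothing on the CPU side slows down frame acquisition.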

The allocations are clearly just calls for nextDrawable. In the shadow example I also saw Queue::write_buffer, but I think I'm just seeing all allocations that are happening in a frame.

I would draw the conclusion that when the window is occluded, nextDrawable isn't really waiting for anything, and for some reason it ends up allocating new drawables instead of waiting on the previous ones to be reused.

In this shadow example trace you can kind of see it

First it is rendering quite normally, then when occluded, it takes a moment and the rendering looks really dense, after which there is a reeeally long pause when I unoccluded the window, and then rendering resumes normally, and looks like memory levels normalise as well!

But I'm not really sure whose bug this is. The behavior seems unexpected, so I would think it is actually a Metal/OS bug (it should still be adhering to maxDrawableCount?). But if it is by "design" that nextDrawable never waits when the window is occluded, is it expected that apps behave sanely and not try to keep drawing? I think it could be fixed on the winit side too, by not issuing RedrawRequested when NSWindow's occlusionState is not visible? Rate limiting in other ways seems to work, but feels a bit hacky :/

Also, googling this I keep ending up in gfx-rs/gfx#2460 :)

jbatez · 2021-08-13T21:35:37Z

> oh interesting. i don't seem to incur any 'leaking' by obscuring one window using another (if that's what you mean) but the process does demand a lot of virtual memory e.g. 5 gigabytes after a few seconds of being on another workspace. i'm now a little hesitant to call my issue a memory leak since the memory seems to be cleaning itself up while the process is running and returning to normal (wgpu master branch).

For me it grows until my system locks up and I need to hold the power button to restart. Activity Monitor says ~60 GB when that happens. I only have 8 GB of physical.

> could you attach the modified files you used to make the cube example leak? if so i can verify whether mine exhibits similar behaviour.

Sure, 1 minute.

> could you also tell me what version of osx you're on? i'm on 11.5.1

11.4

jbatez · 2021-08-13T21:42:17Z

> could you attach the modified files you used to make the cube example leak? if so i can verify whether mine exhibits similar behaviour.

See my fork:
https://github.com/jbatez/wgpu

Just run cargo run --example cube, click the green button to enter fullscreen mode, then watch Activity Monitor in another workspace.

jbatez · 2021-08-13T21:44:22Z

Also, just noticed the window obscuring behavior (e.g., just use the Activity Monitor window to completely obscure the cube window) just uncaps the framerate, but doesn't cause the memory usage to run out of control.

jbatez · 2021-08-13T21:54:02Z

Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.

pacmanmati · 2021-08-13T21:54:43Z

> See my fork:
> https://github.com/jbatez/wgpu
>
> Just run cargo run --example cube, click the green button to enter fullscreen mode, then watch Activity Monitor in another workspace.

thanks for attaching it! this cube example isn't really giving me any issues. for me, the memory usage maybe grows a little when obscuring the window with another but only briefly (no big deal). nothing bad happens on my machine when i maximise the window (regardless of whether i stretch to maximise or maximise onto another workspace).

jbatez · 2021-08-13T21:58:25Z

> this cube example isn't really giving me any issues.

Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.

What's your display setup?

pacmanmati · 2021-08-13T21:58:41Z

> Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.

could you try running the tutorial code from the original post with your display unplugged? i believe that's the core issue. to reproduce, run the code for tutorial 4 and switch onto a workspace not containing the window. warning: the memory usage spikes in the order of gigabytes after just a few seconds for me. not using any external displays.

jbatez · 2021-08-13T22:08:02Z

> > Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.
>
> could you try running the tutorial code from the original post with your display unplugged?

Not a problem until I go fullscreen or completely obscure the window. And in this case, obscuring the window does cause the memory usage to shoot up. Same case with or without an external display.

If you don't have any more tests in-mind for me, I'd like to try updating to macOS 11.5.2 and seeing how that behaves.

scoopr · 2021-08-13T22:24:29Z

I tried changing winit so that it doesn't emit RedrawRequested if the window is occluded, and it seemed to fix the memory increase for me, but the naive implementation had the side effect of winit basically going into a busy loop, attempting to redraw and then skipping it anyway.
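
One way to avoid that busy loop, sketched here under assumptions, is to switch the event loop from polling to waiting while the window is occluded. The `Flow` enum and `WindowState` type below are local, hypothetical stand-ins written for illustration; they are not winit's types, and how the occlusion state is obtained is left out entirely.

```rust
/// Local stand-in for an event loop's scheduling modes (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Flow {
    /// Keep iterating and redraw as fast as the surface allows.
    Poll,
    /// Sleep until the next OS event arrives.
    Wait,
}

/// Tracks the occlusion state reported by the windowing layer.
#[derive(Default)]
struct WindowState {
    occluded: bool,
}

impl WindowState {
    /// Record the latest occlusion state.
    fn set_occluded(&mut self, occluded: bool) {
        self.occluded = occluded;
    }

    /// Decide whether to draw this iteration and how to schedule the next one.
    /// While occluded we neither draw nor spin: we wait for the next event.
    fn next_step(&self) -> (bool, Flow) {
        if self.occluded {
            (false, Flow::Wait)
        } else {
            (true, Flow::Poll)
        }
    }
}
```

The design choice is that skipping the draw and sleeping are decided together, so a hidden window costs neither drawables nor CPU time.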

jbatez · 2021-08-14T01:44:28Z

fwiw, the Xcode Metal Game template uses MTLCommandBuffer::addCompletedHandler along with a semaphore to keep the CPU from getting more than three frames ahead of the GPU. This Apple developer document describes the strategy under "Manage the Rate of CPU and GPU Work".

In WGPU, I see a call to CommandBufferRef::add_completed_handler, but that's only being used to mark command buffers as complete so their resources can be freed as far as I can tell.

I'm aware the CAMetalLayer::nextDrawable docs say it's supposed to wait until a drawable is "available", but if that were the case, why would the other examples bother with the semaphore?
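
To make the semaphore strategy concrete, here is a rough sketch in plain Rust of limiting the CPU to a fixed number of frames in flight. It is an illustration only: `FrameSemaphore` is a hypothetical type built from std primitives, and in a real Metal application `release` would be called from a command buffer completion handler rather than by hand.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Counting semaphore built from a Mutex and a Condvar.
struct FrameSemaphore {
    available: Mutex<usize>,
    freed: Condvar,
}

impl FrameSemaphore {
    /// `max_frames_in_flight` would be 3 in the strategy described above.
    fn new(max_frames_in_flight: usize) -> Arc<Self> {
        Arc::new(Self {
            available: Mutex::new(max_frames_in_flight),
            freed: Condvar::new(),
        })
    }

    /// CPU side: block until a frame slot is free, then take it.
    fn acquire(&self) {
        let mut slots = self.available.lock().unwrap();
        while *slots == 0 {
            slots = self.freed.wait(slots).unwrap();
        }
        *slots -= 1;
    }

    /// Non-blocking variant: returns false if no slot is free.
    fn try_acquire(&self) -> bool {
        let mut slots = self.available.lock().unwrap();
        if *slots == 0 {
            false
        } else {
            *slots -= 1;
            true
        }
    }

    /// "GPU finished a frame" side: give the slot back and wake one waiter.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}
```

With this in place the render loop calls `acquire` before encoding a frame, so it cannot run ahead of the GPU by more than the configured number of frames, regardless of whether frame acquisition itself blocks.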

porglezomp · 2021-08-31T02:03:35Z

This happens to me on an Intel MacBook on 10.14.6 whenever my game is backgrounded, so it's not just an M1 bug.

In my testing it got significantly worse with WGPU 0.10 (700MB/s leak) vs 0.9 (around 10MB/s) which is how I noticed it.

rileysu · 2021-09-10T12:30:46Z

This happens on my M1 Air too. It only occurs when the window is in a workspace I am not currently using. If I run it without a different workspace the ram usage is significantly lower.

Detecting whether the window is occluded before rendering, as hinted above, seems to fix the problem.

xacrimon · 2021-09-10T17:48:41Z

I think I've figured out what's going on in #1936. Basically in some newer Big Sur versions and especially on the M1 SoC, Metal's AnimationCoreKit will shut off some housekeeping tasks like reclaiming certain resources and buffers when the window isn't in focus. The Apple recommended way to solve this seems to be to stop rendering when NSWindowOccludedState is false.

kvark · 2021-09-10T17:55:09Z

Since we don't control the window visibility, it lies on the user's shoulders, if I understand correctly. Is there anything that can be done on our side?

xacrimon · 2021-09-10T17:58:16Z

@kvark We need to expose some should_render() function that tells the user whether to render, so they can conditionally skip rendering. It should probably always return true on every platform except Metal, where it should return NSWindowOccludedState. That means we probably need to expose this through the hal. I am very sure this is the solution, but I also need to speak to my contact who works on Metal about this.
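
As a sketch of what such a hook could look like (purely hypothetical: none of these names exist in wgpu or wgpu-hal as far as this thread shows), a trait with a default implementation covers the "true everywhere except Metal" behaviour:

```rust
/// Hypothetical trait illustrating the proposed should_render() hook.
trait RenderHint {
    /// Default for backends that have no reason to skip frames.
    fn should_render(&self) -> bool {
        true
    }
}

/// Stand-in for any non-Metal backend; uses the default.
struct OtherBackend;

impl RenderHint for OtherBackend {}

/// Stand-in for the Metal backend. `window_visible` is an assumed field
/// that something else would keep in sync with the window's occlusion state.
struct MetalBackend {
    window_visible: bool,
}

impl RenderHint for MetalBackend {
    fn should_render(&self) -> bool {
        self.window_visible
    }
}

/// Example render loop: counts how many of `attempts` frames get drawn.
fn frames_rendered(backend: &dyn RenderHint, attempts: u32) -> u32 {
    (0..attempts).filter(|_| backend.should_render()).count() as u32
}
```

User code would then wrap its per-frame work in `if backend.should_render() { ... }` without needing any platform-specific checks of its own.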

The way I figured this out was basically spending the last 6 or so hours inspecting disassembled Apple code blobs and tracing in a debugger so it's shady at best.

xacrimon · 2021-09-10T18:00:12Z

I further confirmed this by stopping rendering when winit issued a WindowEvent::Focused(false) event, which corresponds with NSWindowOccludedState in most cases, albeit returning false even when the window is only partially occluded, where NSWindowOccludedState would not. That solved the problem. Let me know if you need any help debugging/testing async or on a call, @kvark. We should also update the tutorials and examples with that function once it gets implemented.

kvark · 2021-12-01T15:47:58Z

I wonder if it's related to https://www.macworld.com/article/549755/m1-macbook-app-memory-leaks-macos.html in any way

Hugo4IT · 2021-12-07T07:48:30Z

I don't think it is. I recently got an M1 MacBook Air (8/256) and haven't experienced any memory issues apart from WGPU (what took 40M on my Linux machine takes 300M on my Mac). Though I could be wrong here, as the article only talks about the new MacBook Pros.

I don't know in what capacity it would help, but I would like to say that I am willing to test code on my M1 Mac because you said you didn't have one.

Hugo4IT · 2021-12-07T08:17:56Z

I've been playing around with learn-wgpu/tutorial4-buffer.rs a bit, and I have noticed that it uses much less memory when using PresentMode::Immediate and for some reason, with RUST_LOG=info enabled.

parasyte · 2021-12-07T15:01:33Z

My take on the leak, especially considering #1783 (comment) is that the issue is very timing-dependent. (I don't have M1 hardware, just chiming in with outside observations.) More logging will affect timing, as will changing the presentation mode.

kvark · 2021-12-07T15:25:25Z

It's somewhat hard to follow the thread. I see a lot of people investigated this and concluded, apparently, that nextDrawable doesn't block on these platforms if occluded. From here, we have 2 ways:

- On the user side, in cooperation with winit, just don't do any rendering if occluded. @pacmanmati says that it doesn't work that simply. Should there be more iteration on this?
- Make wgpu force CPU waits for GPU work instead of relying on maxDrawableCount.

kvark · 2021-12-07T18:58:00Z

@jbatez the reason this example uses semaphores is not because it wouldn't block otherwise. nextDrawable should still block. But they call it later down the frame, and they need buffers to be available for CPU writes earlier. So semaphores are for these writes only.
However, it's reported in this thread that nextDrawable doesn't block when occluded. Could somebody file an Apple Feedback request about this?

geertbleyen · 2022-01-19T09:54:38Z

We're having the same issue as what @Matt-hde links from.
bevyengine/bevy#3612 (comment)

BGR360 · 2022-02-20T00:18:43Z

> Could somebody file an Apple Feedback request about this?

@kvark Could you clarify what you're referring to here?

kvark · 2022-02-24T04:28:14Z

macOS has an application called "Feedback Assistant" which one uses to report issues back to Apple.

botahamec · 2022-11-28T02:57:05Z

I'm having this same problem on Windows with Vulkan. I get a SurfaceError::Outdated on every frame where the window is minimized, and then the memory usage increases. I don't do anything other than log it when this event occurs. Removing the log doesn't solve the issue for me. The occluded trick doesn't work for me, since it's unsupported on Windows.

cwfitzgerald · 2022-11-28T03:12:55Z

That might be a separate issue. You shouldn't be requesting new frames when the window has a size of 0.
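
A minimal sketch of that check, assuming the application tracks the latest window size itself (`SurfaceSize` and `FrameGate` are hypothetical names, not wgpu or winit types):

```rust
/// Latest known size of the window's drawable area, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SurfaceSize {
    width: u32,
    height: u32,
}

/// Remembers the current size and says whether a new frame should be requested.
struct FrameGate {
    size: SurfaceSize,
}

impl FrameGate {
    fn new(width: u32, height: u32) -> Self {
        Self { size: SurfaceSize { width, height } }
    }

    /// Call this from the resize handler; a minimized window reports 0 x 0.
    fn on_resize(&mut self, width: u32, height: u32) {
        self.size = SurfaceSize { width, height };
    }

    /// A frame is only worth requesting when both dimensions are non-zero.
    fn should_request_frame(&self) -> bool {
        self.size.width > 0 && self.size.height > 0
    }
}
```

The render loop would consult `should_request_frame` before asking the surface for its next texture, and skip the frame entirely while the window is minimized.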

botahamec · 2022-11-28T03:38:05Z

@cwfitzgerald That solved my problem. Thank you

williamhCode · 2023-05-20T07:27:14Z

any updates? the issue still persists with v0.16 and the latest winit version (i tried with WindowEvent::Occluded fix).
My biggest question is why Apple did this, shouldn't an app keep running even if it's occluded or out of focus?
Plain old glfw + OpenGL apps on macOS have no problem with this.

williamhCode · 2023-05-25T04:13:05Z

I ran a small program with Google's Dawn. They have the same issue...

Edit: after banging my head over this issue, I think it's safe to say it's Apple's problem.

cwfitzgerald · 2023-11-22T17:47:44Z

Some further investigation:

- We leak if the screen is fully hidden.
- We leak if the window is resized.

I suspect these are cases of "the image never actually gets presented" and the drawable is leaked.

cwfitzgerald · 2023-11-22T17:48:54Z

This doesn't happen when we run on moltenvk, so whatever they're doing we should be doing.

cwfitzgerald · 2023-11-22T18:21:10Z

Take a look here for how to search for leaks on mac https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications#mac-leaks

scoopr · 2023-11-22T23:18:02Z

I tested on my intel mac, instrumenting with this:

diff --git a/wgpu-hal/src/metal/mod.rs b/wgpu-hal/src/metal/mod.rs
index 0ddf96ed4..715467fad 100644
--- a/wgpu-hal/src/metal/mod.rs
+++ b/wgpu-hal/src/metal/mod.rs
@@ -35,6 +35,7 @@ use std::{
 use arrayvec::ArrayVec;
 use bitflags::bitflags;
 use metal::foreign_types::ForeignTypeRef as _;
+use objc::{msg_send, sel, sel_impl};
 use parking_lot::{Mutex, RwLock};
 
 #[derive(Clone, Debug)]
@@ -352,6 +353,14 @@ pub struct SurfaceTexture {
     present_with_transaction: bool,
 }
 
+impl Drop for SurfaceTexture {
+    fn drop(&mut self) {
+        let tret: usize = unsafe { msg_send![self.texture.raw, retainCount] };
+        let dret: usize = unsafe { msg_send![self.drawable, retainCount] };
+        eprintln!("drop SurfaceTexture tex.ret={tret} drawable.ret={dret}");
+    }
+}
+
 impl std::borrow::Borrow<Texture> for SurfaceTexture {
     fn borrow(&self) -> &Texture {
         &self.texture

And I see the texture retainCount just rising during normal run, which seems wrong to me?
Perhaps the texture isn't getting dropped like it should? Adding a release in the same Drop seems to keep it at a stable number, but not sure if this is the correct place to drop.

On a quick glance, the release also helped with the memory leak. This didn't fix the odd hang that happens when cmd-tabbing away though, hmm.
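
As an analogy only (this is not Objective-C reference counting and not the wgpu-hal code), the same pattern can be reproduced with Rust's `Rc`: a retain without a matching release makes the count climb every frame, while a balanced pair keeps it stable. `Texture` and `simulate_frame` are hypothetical names for illustration.

```rust
use std::rc::Rc;

/// Placeholder for a reference-counted GPU resource.
struct Texture;

/// Simulates one frame that takes an extra reference to the texture.
/// With `balanced == false` the extra reference is never given back.
fn simulate_frame(texture: &Rc<Texture>, balanced: bool) {
    // Analogous to a retain: the strong count goes up by one.
    let extra = Rc::clone(texture);
    if balanced {
        // Analogous to the matching release: the count goes back down.
        drop(extra);
    } else {
        // The reference is abandoned without decrementing the count,
        // so the underlying object can never be freed.
        std::mem::forget(extra);
    }
}
```

Watching `Rc::strong_count` across frames plays the same role here as printing retainCount in the Drop impl above: a steadily rising number means some reference is being taken and never released.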

Too late to investigate further.

cwfitzgerald · 2023-11-26T22:45:50Z

At long last, this tests as fixed by #4781

kvark added the type: bug label Aug 9, 2021

xiaopengli89 mentioned this issue Oct 19, 2021

Fix memory leak on macOS #2092

Merged

cwfitzgerald mentioned this issue Oct 29, 2021

get_current_texture memory leak on OSX #2126

Closed

FrankenApps mentioned this issue Nov 2, 2021

memory leak in tutorial3-pipeline sotrh/learn-wgpu#260

Closed

Matt-Is-Confused mentioned this issue Jan 9, 2022

MacOS m1 excessive memory usage in pass nodes bevyengine/bevy#3612

Closed

cwfitzgerald mentioned this issue Jun 6, 2022

[Metal] Memory leak in imgui-wgpu-rs #1537

Closed

Nnubes256 mentioned this issue Sep 1, 2022

macOS: Memory usage skyrockets when window is occluded bevyengine/bevy#5856

Closed

xiaopengli89 mentioned this issue Oct 8, 2022

Fix memory leak on macOS #3056

Merged

williamhCode mentioned this issue Jun 10, 2023

V-Sync disables when window not visible, causing GPU to go to 100% [Metal] bkaradzic/bgfx#2727

Open

Wumpf self-assigned this Nov 22, 2023

cwfitzgerald closed this as completed Nov 26, 2023

Wumpf removed their assignment Nov 27, 2023
