Memory leak on MacOS | M1 OSX #1783

Closed
sotrh opened this issue Aug 9, 2021 · 54 comments
Labels
type: bug Something isn't working

Comments

@sotrh

sotrh commented Aug 9, 2021

Description
Some users of my tutorial have been experiencing memory issues on my buffer tutorial. It seems to only be an issue on the new M1 chips.

Repro steps
You'll need a Mac with an M1 chip, then run the tutorial code here https://github.com/sotrh/learn-wgpu/tree/master/code/beginner/tutorial4-buffer. If you have the repo already downloaded you can just run cargo run --bin tutorial4-buffer.

Expected vs observed behavior
The expected behaviour is no memory leaks on M1

Extra materials
I don't have a Mac, so I can't provide hardware specifics other than that it's occurring on M1, but I'll link the issue from my repo here. sotrh/learn-wgpu#207

Platform
OSX with M1 chip, wgpu 0.9

@kvark
Member

kvark commented Aug 9, 2021

Similar report in #1537

@kvark added the "type: bug" label Aug 9, 2021
@JunkuiZhang
Contributor

On my Intel MacBook Pro it works as expected, so I think it's an M1 SoC related issue.

@kvark
Member

kvark commented Aug 13, 2021

I don't have an M1. We need help investigating this.

@jbatez

jbatez commented Aug 13, 2021

I've been debugging a similar (same?) issue the last couple days. It has something to do with begin_render_pass and high framerates. You can reproduce with wgpu/examples/cube with only a slight modification.

In framework.rs, set target_frametime way above 60hz. On my system, I don't see the issue at 200hz, but I definitely see it at 300hz.

Edit: forgot to mention you also need to put the window in fullscreen mode. In windowed mode, the framerate gets clamped.

@jbatez

jbatez commented Aug 13, 2021

With the high framerate modification above, if I remove the begin_render_pass call and all the rpass.* method calls, the problem goes away. If I keep the begin_render_pass call but remove all the method calls, the problem persists. Hence the conclusion that it has something to do with begin_render_pass.

@jbatez

jbatez commented Aug 13, 2021

Also, note: I first noticed this bug with v0.9 but I still see it in master.

@jbatez

jbatez commented Aug 13, 2021

Also maybe noteworthy: with the target_frametime modification above plus a switch to PresentMode::Fifo, the fullscreen framerate reported to stdout reaches well above my display's framerate. Is that expected? If not, maybe the command queue's getting filled faster than the GPU can drain it.

@jbatez

jbatez commented Aug 13, 2021

It's not just when the window's fullscreen, but also when it's completely obscured. Adding a NSWindowOcclusionStateVisible check before frame acquisition helps, but memory usage still seems to shoot up on occluded -> unoccluded transitions.

@pacmanmati

> I've been debugging a similar (same?) issue the last couple days. It has something to do with begin_render_pass and high framerates. You can reproduce with wgpu/examples/cube with only a slight modification.
>
> In framework.rs, set target_frametime way above 60hz. On my system, I don't see the issue at 200hz, but I definitely see it at 300hz.
>
> Edit: forgot to mention you also need to put the window in fullscreen mode. In windowed mode, the framerate gets clamped.

what device are you running on? the cube example won't leak memory on my m1 air even when I uncap framerate in framework.rs but the tutorial 3/4 code from sotrh leaks very aggressively on mine.

@jbatez

jbatez commented Aug 13, 2021

> what device are you running on?

M1 Air for me as well.

@pacmanmati

> M1 Air for me as well.

oh interesting. i don't seem to incur any 'leaking' by obscuring one window using another (if that's what you mean) but the process does demand a lot of virtual memory e.g. 5 gigabytes after a few seconds of being on another workspace. i'm now a little hesitant to call my issue a memory leak since the memory seems to be cleaning itself up while the process is running and returning to normal (wgpu master branch).

@pacmanmati

@jbatez could you attach the modified files you used to make the cube example leak? if so i can verify whether mine exhibits similar behaviour. could you also tell me what version of osx you're on? i'm on 11.5.1

@scoopr
Contributor

scoopr commented Aug 13, 2021

My repro is as follows,

  1. Set up tutorial4-buffer under Instruments (the binary needs to be signed with the debug entitlement on M1), with the leaks and metal profiles added.
  2. Run, and after a while, make the window occluded (behind the Instruments window). Bonus for making it visible again and trying to, e.g., resize it: notice that it hangs for quite a while before it continues.

I also tried on an Intel MBP, where I did see a tiny increase in allocations that plateaued, and which dropped off to earlier levels after bringing the window back.

The wgpu examples' framework does some frame limiting which I think is hiding this for them. Making the framework unconditionally call the request_redraw without any time checking made it behave similarly with the shadow example in my tests.
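A minimal sketch of the kind of time check described above, assuming a limiter that only permits a redraw once a target frame time has elapsed. `FrameLimiter` and `should_redraw` are hypothetical names for illustration, not the actual contents of framework.rs.

```rust
use std::time::{Duration, Instant};

/// Frame time for a target rate in Hz (e.g. 60.0 -> roughly 16.67 ms).
fn target_frametime(hz: f64) -> Duration {
    Duration::from_secs_f64(1.0 / hz)
}

/// Hypothetical limiter: permits a redraw only once `target` has elapsed
/// since the last permitted redraw.
struct FrameLimiter {
    target: Duration,
    last_redraw: Option<Instant>,
}

impl FrameLimiter {
    fn new(hz: f64) -> Self {
        Self { target: target_frametime(hz), last_redraw: None }
    }

    /// Returns true (and records `now`) if enough time has passed.
    fn should_redraw(&mut self, now: Instant) -> bool {
        match self.last_redraw {
            // Too soon since the last redraw: skip this one.
            Some(last) if now.duration_since(last) < self.target => false,
            // First frame, or the target frame time has elapsed.
            _ => {
                self.last_redraw = Some(now);
                true
            }
        }
    }
}
```

Removing a check like this (always redrawing) is what makes the redraw rate depend entirely on how long frame acquisition blocks.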

[Screenshot 2021-08-13 at 23 29 17]

The allocations are clearly just calls for nextDrawable. In the shadow example I also saw Queue::write_buffer, but I think I'm just seeing all allocations that are happening in a frame.

I would draw the conclusion that when the window is occluded, nextDrawable isn't really waiting for anything, and for some reason it ends up allocating new drawables instead of waiting on the previous ones to be reused.
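To make that hypothesis concrete, here is a toy model of a drawable pool. It is an assumption-laden illustration only: `DrawablePool` and its methods are invented for this sketch and say nothing about how CAMetalLayer is actually implemented.

```rust
/// Toy model of a swapchain's drawable pool (hypothetical, not Metal's code).
struct DrawablePool {
    max_drawables: usize, // stands in for maxDrawableCount
    allocated: usize,     // drawables created so far
    free: usize,          // drawables returned after presentation
}

impl DrawablePool {
    fn new(max_drawables: usize) -> Self {
        Self { max_drawables, allocated: 0, free: 0 }
    }

    /// Visible window: reuse a free drawable, allocate while under the cap,
    /// otherwise report that the caller would have to wait (`false`).
    /// Occluded window (the hypothesis): never wait, always allocate.
    fn try_next_drawable(&mut self, occluded: bool) -> bool {
        if self.free > 0 {
            self.free -= 1;
            true
        } else if occluded || self.allocated < self.max_drawables {
            self.allocated += 1;
            true
        } else {
            false // a real layer would block here until one is free
        }
    }

    /// Presentation finished: the drawable becomes reusable.
    fn presented(&mut self) {
        self.free += 1;
    }
}
```

In this model the visible case plateaus at the cap, while the occluded case grows by one allocation per frame, which matches the shape of the traces described in this comment.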

In this shadow example trace you can kind of see it
[Screenshot 2021-08-14 at 0 28 35]
[Screenshot 2021-08-14 at 0 28 50]
First it is rendering quite normally, then when occluded, it takes a moment and the rendering looks really dense, after which there is a reeeally long pause when I unoccluded the window, and then rendering resumes normally, and looks like memory levels normalise as well!

But I'm not really sure whose bug this is. The behavior seems unexpected, so I would think it is actually a Metal/OS bug (it should still be adhering to maxDrawableCount?). But if it is by "design" that nextDrawable never waits when the window is occluded, is it expected that apps behave sanely and not try to keep drawing? I think it could be fixed on the winit side too, by not issuing RedrawRequested when the NSWindow occlusionState is not visible? Rate limiting in other ways seems to work, but feels a bit hacky :/

Also, googling this I keep ending up in gfx-rs/gfx#2460 :)

@jbatez

jbatez commented Aug 13, 2021

> oh interesting. i don't seem to incur any 'leaking' by obscuring one window using another (if that's what you mean) but the process does demand a lot of virtual memory e.g. 5 gigabytes after a few seconds of being on another workspace. i'm now a little hesitant to call my issue a memory leak since the memory seems to be cleaning itself up while the process is running and returning to normal (wgpu master branch).

For me it grows until my system locks up and I need to hold the power button to restart. Activity Monitor says ~60 GB when that happens. I only have 8 GB of physical.

> could you attach the modified files you used to make the cube example leak? if so i can verify whether mine exhibits similar behaviour.

Sure, 1 minute.

> could you also tell me what version of osx you're on? i'm on 11.5.1

11.4

@jbatez

jbatez commented Aug 13, 2021

> could you attach the modified files you used to make the cube example leak? if so i can verify whether mine exhibits similar behaviour.

See my fork:
https://github.com/jbatez/wgpu

Just run cargo run --example cube, click the green button to enter fullscreen mode, then watch Activity Monitor in another workspace.

@jbatez

jbatez commented Aug 13, 2021

Also, just noticed the window obscuring behavior (e.g., just use the Activity Monitor window to completely obscure the cube window) just uncaps the framerate, but doesn't cause the memory usage to run out of control.

@jbatez

jbatez commented Aug 13, 2021

Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.

@pacmanmati

> See my fork:
> https://github.com/jbatez/wgpu
>
> Just run cargo run --example cube, click the green button to enter fullscreen mode, then watch Activity Monitor in another workspace.

thanks for attaching it! this cube example isn't really giving me any issues. for me, the memory usage maybe grows a little when obscuring the window with another but only briefly (no big deal). nothing bad happens on my machine when i maximise the window (regardless of whether i stretch to maximise or maximise onto another workspace).

@jbatez

jbatez commented Aug 13, 2021

> this cube example isn't really giving me any issues.

Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.

What's your display setup?

@pacmanmati

> Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.

could you try running the tutorial code from the original post with your display unplugged? i believe that's the core issue. to reproduce, run the code for tutorial 4 and switch onto a workspace not containing the window. warning: the memory usage spikes in the order of gigabytes after just a few seconds for me. not using any external displays.

@jbatez

jbatez commented Aug 13, 2021

> > Looks like it might have something to do with my screen setup. I'm hooked up to a monitor through a Thunderbolt dock. When I unplug my MacBook and use its builtin display, the problem goes away.
>
> could you try running the tutorial code from the original post with your display unplugged?

Not a problem until I go fullscreen or completely obscure the window. And in this case, obscuring the window does cause the memory usage to shoot up. Same case with or without an external display.

If you don't have any more tests in mind for me, I'd like to try updating to macOS 11.5.2 and seeing how that behaves.

@scoopr
Contributor

scoopr commented Aug 13, 2021

I tried testing so that winit doesn't emit RedrawRequested if the window is occluded, and it seemed to fix the memory increase for me, but the naive implementation had the side effect of winit basically going into a busy loop, attempting to redraw and then skipping it anyway.
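One way to avoid that busy loop, sketched under assumptions: when the window is not visible, skip the frame and also tell the event loop to block until the next event. The `Flow` and `Action` enums below are hypothetical stand-ins modelled on the Poll/Wait distinction in winit's ControlFlow; this is not winit code.

```rust
/// Hypothetical stand-in for an event loop's control-flow setting.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Flow {
    Poll, // wake up immediately and keep iterating
    Wait, // sleep until the next OS event arrives
}

/// What the application does in one loop iteration.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Action {
    Render,
    Skip,
}

/// Decide what to do for one iteration, given the window's visibility.
fn on_iteration(window_visible: bool) -> (Action, Flow) {
    if window_visible {
        (Action::Render, Flow::Poll)
    } else {
        // Skipping alone would spin at full speed; blocking until the next
        // event (e.g. the window becoming visible again) avoids that.
        (Action::Skip, Flow::Wait)
    }
}
```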

@jbatez

jbatez commented Aug 14, 2021

fwiw, the Xcode Metal Game template uses MTLCommandBuffer::addCompletedHandler along with a semaphore to keep the CPU from getting more than three frames ahead of the GPU. This Apple developer document describes the strategy under "Manage the Rate of CPU and GPU Work".

In WGPU, I see a call to CommandBufferRef::add_completed_handler, but that's only being used to mark command buffers as complete so their resources can be freed as far as I can tell.

I'm aware the CAMetalLayer::nextDrawable docs say it's supposed to wait until a drawable is "available", but if that were the case, why would the other examples bother with the semaphore?
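For reference, the frames-in-flight strategy described above can be sketched with std primitives only. This is an illustration of the pattern, not wgpu or Metal code; `FrameSemaphore` is a hypothetical name, and `release` stands in for whatever a command buffer's completion handler would call.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Minimal counting semaphore limiting how many frames are in flight.
struct FrameSemaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl FrameSemaphore {
    fn new(max_frames_in_flight: usize) -> Arc<Self> {
        Arc::new(Self {
            permits: Mutex::new(max_frames_in_flight),
            available: Condvar::new(),
        })
    }

    /// CPU side: block until fewer than the maximum frames are in flight.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Non-blocking variant: returns false if no permit is available.
    fn try_acquire(&self) -> bool {
        let mut permits = self.permits.lock().unwrap();
        if *permits == 0 {
            return false;
        }
        *permits -= 1;
        true
    }

    /// "GPU" side: called when a frame's work has completed.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}
```

With `FrameSemaphore::new(3)`, a render loop would call `acquire()` before encoding each frame and `release()` from the completion callback, so the CPU can never get more than three frames ahead regardless of whether frame acquisition blocks.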

@porglezomp

This happens to me on an Intel MacBook on 10.14.6 whenever my game is backgrounded, so it's not just an M1 bug.

In my testing it got significantly worse with WGPU 0.10 (700MB/s leak) vs 0.9 (around 10MB/s) which is how I noticed it.

@rileysu

rileysu commented Sep 10, 2021

This happens on my M1 Air too. It only occurs when the window is in a workspace I am not currently using. If I run it without a different workspace the ram usage is significantly lower.

Detecting whether the window is occluded before rendering, as hinted above, seems to fix the problem.

@xacrimon
Contributor

xacrimon commented Sep 10, 2021

I think I've figured out what's going on in #1936. Basically in some newer Big Sur versions and especially on the M1 SoC, Metal's AnimationCoreKit will shut off some housekeeping tasks like reclaiming certain resources and buffers when the window isn't in focus. The Apple recommended way to solve this seems to be to stop rendering when the window is occluded, i.e. when NSWindowOccludedState (NSWindow's occlusionState) does not report the window as visible.

@kvark
Member

kvark commented Sep 10, 2021

Since we don't control the window visibility, it lies on the user's shoulders, if I understand correctly. Is there anything that can be done on our side?

@xacrimon
Contributor

xacrimon commented Sep 10, 2021

@kvark We need to expose some should_render() function that tells the user whether to render or not, so they can conditionally skip rendering. It should probably return true constantly on all platforms except Metal, where it should reflect NSWindowOccludedState. That means we probably need to expose this through the hal. I am very sure this is the solution but I also need to speak to my contact that works on Metal about this.

The way I figured this out was basically spending the last 6 or so hours inspecting disassembled Apple code blobs and tracing in a debugger so it's shady at best.
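A rough sketch of what such a should_render() hook could look like, purely as an assumption about its shape: the trait, the struct names, and the `window_visible` field are all hypothetical and do not correspond to any existing wgpu or wgpu-hal API.

```rust
/// Hypothetical per-backend hook; not an existing wgpu API.
trait SurfaceBackend {
    /// Default for backends where rendering while hidden is harmless.
    fn should_render(&self) -> bool {
        true
    }
}

/// Stand-in for any non-Metal backend: always renders.
struct OtherSurface;
impl SurfaceBackend for OtherSurface {}

/// Stand-in for a Metal-backed surface. In a real implementation
/// `window_visible` would be derived from the window's occlusion state.
struct MetalSurface {
    window_visible: bool,
}

impl SurfaceBackend for MetalSurface {
    fn should_render(&self) -> bool {
        self.window_visible
    }
}

/// Caller side: render only when the backend says so.
/// Returns how many of `frames` would actually be drawn.
fn run_frames(surface: &dyn SurfaceBackend, frames: u32) -> u32 {
    (0..frames).filter(|_| surface.should_render()).count() as u32
}
```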

@xacrimon
Contributor

xacrimon commented Sep 10, 2021

I further confirmed this by stopping rendering when winit issued a WindowEvent::Focused(false) event, which corresponds with NSWindowOccludedState in some cases, albeit returning false even when the window is only partially occluded, unlike NSWindowOccludedState. That solved the problem. Let me know if you need any help debugging/testing async or on call @kvark. We should also update tutorials and examples with that function once it gets implemented.

@kvark
Member

kvark commented Dec 1, 2021

I wonder if it's related to https://www.macworld.com/article/549755/m1-macbook-app-memory-leaks-macos.html in any way

@Hugo4IT

Hugo4IT commented Dec 7, 2021

I don't think it is. I recently got an M1 MacBook Air (8/256) and haven't experienced any memory issues apart from WGPU (what took 40M on my Linux machine takes 300M on my Mac). Though I could be wrong here as the article only talks about the new MacBook Pros.

I don't know in what capacity it would help, but I would like to say that I am willing to test code on my M1 Mac because you said you didn't have one.

@Hugo4IT

Hugo4IT commented Dec 7, 2021

I've been playing around with learn-wgpu/tutorial4-buffer.rs a bit, and I have noticed that it uses much less memory when using PresentMode::Immediate and for some reason, with RUST_LOG=info enabled.

@parasyte
Contributor

parasyte commented Dec 7, 2021

My take on the leak, especially considering #1783 (comment) is that the issue is very timing-dependent. (I don't have M1 hardware, just chiming in with outside observations.) More logging will affect timing, as will changing the presentation mode.

@kvark
Member

kvark commented Dec 7, 2021

It's somewhat hard to follow the thread. I see a lot of people investigated this and concluded, apparently, that nextDrawable doesn't block on these platforms if occluded. From here, we have 2 ways:

  1. on the user side, in cooperation with winit, just don't do any rendering if occluded. @pacmanmati says that it isn't that simple. Should there be more iteration on this?
  2. make wgpu force CPU waits for GPU work instead of relying on maxDrawableCount.

@kvark
Member

kvark commented Dec 7, 2021

@jbatez the reason this example uses semaphores is not because it wouldn't block otherwise. nextDrawable should still block. But they call it later down the frame, and they need buffers to be available for CPU writes earlier. So semaphores are for these writes only.
However, it's reported in this thread that nextDrawable doesn't block when occluded. Could somebody file an Apple Feedback request about this?

@geertbleyen
Contributor

We're having the same issue as what @Matt-hde links from.
bevyengine/bevy#3612 (comment)

@BGR360
Contributor

BGR360 commented Feb 20, 2022

> Could somebody file an Apple Feedback request about this?

@kvark Could you clarify what you're referring to here?

@kvark
Member

kvark commented Feb 24, 2022

macOS has an application called "Feedback Assistant" which one uses to report issues back to Apple.

@botahamec
Contributor

botahamec commented Nov 28, 2022

I'm having this same problem on Windows with Vulkan. I get a SurfaceError::Outdated on every frame where the window is minimized, and then the memory usage increases. I don't do anything other than log it when this event occurs. Removing the log doesn't solve the issue for me. The occluded trick doesn't work for me, since it's unsupported on Windows.

@cwfitzgerald
Member

That might be a separate issue. You shouldn't be requesting new frames when the window has a size of 0.
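A minimal sketch of that guard, assuming the application tracks the window's current size itself. `Size` and `should_request_frame` are hypothetical names for illustration; the idea is simply to skip frame acquisition while either dimension is zero.

```rust
/// Hypothetical window size in physical pixels, as tracked by the app.
#[derive(Clone, Copy)]
struct Size {
    width: u32,
    height: u32,
}

/// Returns false while the window has no drawable area (for example,
/// assuming a minimized window reports a zero-sized surface), so the
/// caller can skip requesting a new frame entirely.
fn should_request_frame(size: Size) -> bool {
    size.width > 0 && size.height > 0
}
```

The render loop would check this before acquiring the next surface texture and simply return early when it is false.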

@botahamec
Contributor

@cwfitzgerald That solved my problem. Thank you

@williamhCode

williamhCode commented May 20, 2023

any updates? the issue still persists with v0.16 and the latest winit version (i tried with WindowEvent::Occluded fix).
My biggest question is why Apple did this, shouldn't an app keep running even if it's occluded or out of focus?
Plain old glfw + OpenGL apps on macOS have no problem with this.

@williamhCode

williamhCode commented May 25, 2023

I ran a small program with Google's Dawn. They have the same issue...

Edit: after banging my head over this issue, I think it's safe to say it's Apple's problem.

@cwfitzgerald
Member

Some further investigation:

  • We leak if the screen is fully hidden
  • We leak if the window is resized.

I suspect these are cases of "the image never actually gets presented" and the drawable is leaked.

[image]

@cwfitzgerald
Member

This doesn't happen when we run on moltenvk, so whatever they're doing we should be doing.

@cwfitzgerald
Member

cwfitzgerald commented Nov 22, 2023

Take a look here for how to search for leaks on mac https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications#mac-leaks

@scoopr
Contributor

scoopr commented Nov 22, 2023

I tested on my intel mac, instrumenting with this:

diff --git a/wgpu-hal/src/metal/mod.rs b/wgpu-hal/src/metal/mod.rs
index 0ddf96ed4..715467fad 100644
--- a/wgpu-hal/src/metal/mod.rs
+++ b/wgpu-hal/src/metal/mod.rs
@@ -35,6 +35,7 @@ use std::{
 use arrayvec::ArrayVec;
 use bitflags::bitflags;
 use metal::foreign_types::ForeignTypeRef as _;
+use objc::{msg_send,sel, sel_impl};
 use parking_lot::{Mutex, RwLock};
 
 #[derive(Clone, Debug)]
@@ -352,6 +353,14 @@ pub struct SurfaceTexture {
     present_with_transaction: bool,
 }
 
+impl Drop for SurfaceTexture {
+    fn drop(&mut self) {
+        let tret :usize = unsafe { msg_send![self.texture.raw, retainCount] };
+        let dret :usize = unsafe { msg_send![self.drawable, retainCount] };
+        eprintln!("drop SurfaceTexture tex.ret={tret} drawable.ret={dret}");
+    }
+}
+
 impl std::borrow::Borrow<Texture> for SurfaceTexture {
     fn borrow(&self) -> &Texture {
         &self.texture

And I see the texture retainCount just rising during normal run, which seems wrong to me?
Perhaps the texture isn't getting dropped like it should? Adding a release in the same Drop seems to keep it at a stable number, but not sure if this is the correct place to drop.

On a quick glance, the release also helped with the memory leak. This didn't fix the odd hang that happens when cmd-tabbing away though, hmm.

Too late to investigate further.
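As an analogy for the rising retainCount, here is a small std-only sketch using Rc, where taking an extra reference each frame without dropping it makes the strong count climb. It only illustrates an unbalanced retain/release in general; it is not the wgpu-hal code, and `Texture` and `strong_count_after` are hypothetical names.

```rust
use std::rc::Rc;

/// Stand-in for a reference-counted drawable texture.
struct Texture;

/// Simulates `frames` frames that each take an extra reference.
/// If `release` is false the extra reference is forgotten (never dropped),
/// which mirrors a retain without a matching release.
fn strong_count_after(frames: usize, release: bool) -> usize {
    let texture = Rc::new(Texture);
    for _ in 0..frames {
        let extra = Rc::clone(&texture); // "retain"
        if release {
            drop(extra); // matching "release"
        } else {
            std::mem::forget(extra); // reference leaked
        }
    }
    Rc::strong_count(&texture)
}
```

With balanced releases the count stays at 1 no matter how many frames run; without them it grows by one per frame, and the underlying object can never be freed.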

@cwfitzgerald
Member

At long last, this tests as fixed by #4781

@Wumpf removed their assignment Nov 27, 2023