Potential memory leak when re-initializing client #2893
The best practice for client lifecycle management is to create a single client object and use it for as long as you need it. You shouldn't need to reinitialize it at all, so the cleanup shouldn't be a concern. In addition, you can configure grpc-js to do retries instead of doing them yourself in the service config as described here. Note that it will not retry DEADLINE_EXCEEDED errors, because that does not make sense with the intended semantics of deadlines: the deadline should be the time after which the client no longer needs the response.
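As a sketch of the retry configuration the maintainer refers to, a service config with a `retryPolicy` can be passed to the client constructor as a channel option. The service name `ping.Ping` and the numbers below are illustrative assumptions, not taken from this thread:

```javascript
// Retry policy delivered via the service config channel option.
// grpc-js performs the retries internally, so the application keeps
// a single long-lived client instead of re-creating one on errors.
const serviceConfig = {
  methodConfig: [
    {
      name: [{ service: 'ping.Ping' }], // [{}] would match all services
      retryPolicy: {
        maxAttempts: 4,
        initialBackoff: '0.1s',
        maxBackoff: '1s',
        backoffMultiplier: 2,
        // DEADLINE_EXCEEDED is deliberately absent: grpc-js never
        // retries it, per the semantics of deadlines described above.
        retryableStatusCodes: ['UNAVAILABLE'],
      },
    },
  ],
};

// Passed when constructing the (single, long-lived) client, e.g.:
// new PingClient(address, credentials, {
//   'grpc.service_config': JSON.stringify(serviceConfig),
// });
```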
Sure, that makes sense. My purpose with this issue is to explore whether there are enhancements or safeguards that can be implemented on the library side to help prevent these kinds of problems. It appears that other users have encountered similar issues, and there's an expectation that the garbage collector should effectively manage memory without manual intervention.
After thinking about it some more, I realized that unused clients should become garbage-collectible at some point: an unused channel eventually enters the idle state, which includes releasing all of the resources that it owns. A few things are preventing that from happening in the current code, so I made a change to fix those in #2896 and I intend to release that soon. There are a couple of caveats:
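The idle-state behavior mentioned above can also be tuned. A minimal sketch, assuming grpc-js's client idle timeout channel option; the option name and default here reflect my understanding of the library and are not stated in this thread:

```javascript
// Channel options controlling how quickly an unused channel goes idle.
const channelOptions = {
  // Milliseconds of inactivity before the channel enters the IDLE state
  // and releases the resources (connections, internal timers) it owns.
  // The grpc-js default is believed to be 30 minutes; 5 minutes here is
  // just an example value.
  'grpc.client_idle_timeout_ms': 5 * 60 * 1000,
};
```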
@murgatroid99 Thanks for looking into this and providing a fix. We appreciate the work on #2896 and the added clarity around how idle channels and channelz work.
The changes in #2896 are now out in version 1.12.6, so you can try it out and see how it impacts your application. Please note that even with that change, we still recommend using just a single client for all requests you make to the same service on the same backend.
Sure, I'll try to test it soon. I'm gonna close this issue and will open a new one if we see something going wrong. Thank you so much for taking care of this.
Problem description
Hello, and thanks for all your hard work on this library!
I want to share a situation that might look like a memory leak in `@grpc/grpc-js`, but I want to stress that this isn't the library's fault; it's caused by the way the client is being re-initialized in application code. I'm creating this issue to help others who might run into similar problems and to suggest potential improvements that could make them easier to detect or prevent.
We noticed an ongoing increase in memory usage whenever our application handled `UNAVAILABLE` or `DEADLINE_EXCEEDED` errors. In those cases, we were re-initializing a new client without closing the existing one. Over time, the detached references to the old clients accumulated in memory, eventually crashing the process. It was difficult to pinpoint the source of the leak because the retained references aren't always obvious in devtools, especially if your monitoring tools (like Datadog) don't highlight "detached nodes". As soon as we added `client.close()` before creating a new client, memory usage stabilized and the issue was resolved.

Reproduction steps
server.js
client.js
ping.proto
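The contents of the three files were not captured above. For context, a minimal `ping.proto` consistent with the setup described might look like the following; the package, service, and message names are guesses, not taken from the issue:

```proto
syntax = "proto3";

package ping;

// A trivial unary service used only to drive repeated requests
// from the reproduction client.
service Ping {
  rpc Ping (PingRequest) returns (PingReply) {}
}

message PingRequest {
  string message = 1;
}

message PingReply {
  string message = 1;
}
```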
Now npm install:
Run the server
Run the client
Now you should see something like this:
*(Screenshot: client console output; the original image link has expired.)*
Observe memory usage over time (e.g., using DevTools).
Open DevTools and keep taking heap snapshots over time; you'll see memory grow continuously:
*(Screenshot: heap snapshots showing continuous memory growth; the original image link has expired.)*
The memory leak is located at this place in client.js:
The problem is that `initClient` creates a brand-new client without closing the previous one. When the previous client is not closed, the application-level code stops holding references to it, which causes its internal objects to become "detached", mostly because Node.js timers keep holding references to them. As the official documentation puts it: "Objects retained by detached nodes: objects that are kept alive because a detached DOM/object node references them."
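The retention mechanism can be illustrated without gRPC at all: any object captured by a live Node.js timer stays reachable even after the application drops its own reference to it. A minimal sketch, where `initClient` and `close` are stand-ins for the pattern described, not the grpc-js API:

```javascript
// Stand-in for a client whose internals schedule timers, the way
// grpc-js schedules keepalive/backoff/idle timers internally.
function initClient() {
  const client = { closed: false, ticks: 0 };
  client.keepalive = setInterval(() => { client.ticks++; }, 60_000);
  client.close = () => {
    clearInterval(client.keepalive); // drops the timer's reference to `client`
    client.closed = true;
  };
  return client;
}

let client = initClient();
const leaked = client;  // kept only so this demo can clean up at the end;
client = initClient();  // BUG: in real code the first client is now unreachable
                        // from the application, yet its interval still
                        // references it, so it can never be collected.

leaked.close();         // the fix: close before re-initializing
client.close();
```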
*(Screenshot: heap snapshot showing detached objects retained by timers; the original image link has expired.)*
These detached nodes stay in memory forever. In our case, they were adding up to about 100 MB per day until the server crashed.
I know this has been discussed previously here.
The obvious fix for this problem is to close the previous client before initializing a new one:
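In outline, the fix looks like this; `createClient` is a placeholder for however the application constructs its grpc-js client:

```javascript
let client = null;

// Close the previous client before creating a replacement, so its
// channel can release sockets and internal timers and be collected.
function reinitClient(createClient) {
  if (client !== null) {
    client.close();
  }
  client = createClient();
  return client;
}
```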
After closing the client, the memory becomes stable:
*(Screenshot: stable memory usage after closing the client; the original image link has expired.)*
Environment
Additional context
Although this is not a bug in `@grpc/grpc-js`, it could be helpful if the library:

These measures could prevent unintentional client re-initialization without proper cleanup, which can be hard to track down in large applications.
Our debugging process was complicated by the fact that profiling tools like Datadog don't highlight detached nodes. It took some trial and error to trace the leak to unclosed gRPC client instances. We hope this helps others who might face similar issues.
If there's anything more we can clarify or test, please let us know. Thank you again for maintaining this library and for considering these suggestions.