Potential memory leak when re-initializing client #2893

Closed
karliky opened this issue Jan 29, 2025 · 6 comments


karliky commented Jan 29, 2025

Problem description

Hello, and thanks for all your hard work on this library!

I want to share a situation that might look like a memory leak in @grpc/grpc-js, but I want to stress that this isn't the library’s fault. Instead, it's caused by the way the client is being re-initialized in application code.

I'm creating this issue to help others who might run into similar issues and to suggest potential improvements that could make it easier to detect or prevent.

We noticed an ongoing increase in memory usage when our application handled UNAVAILABLE or DEADLINE_EXCEEDED errors. In those cases, we were re-initializing a new client without closing the existing one. Over time, the detached references to the old clients accumulated in memory, resulting in a crash.

It was difficult to pinpoint the source of the leak because the retaining references aren’t always obvious in DevTools, especially if your monitoring tools (like Datadog) don’t highlight “detached nodes.” As soon as we added client.close() before creating a new client, memory usage stabilized and the issue was resolved.

Reproduction steps

server.js

const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');
const path = require('path');

// Load proto definition
const PROTO_PATH = path.join(__dirname, 'ping.proto');
const packageDefinition = protoLoader.loadSync(PROTO_PATH);
const pingProto = grpc.loadPackageDefinition(packageDefinition).pingpong;

let callCount = 0;

function ping(call, callback) {
  callCount++;
  console.log('Received ping request #', callCount);

  // Every 10th call, return an error with code = UNAVAILABLE
  if (callCount % 10 === 0) {
    const error = {
      code: grpc.status.UNAVAILABLE,
      message: 'Simulated server error: UNAVAILABLE',
    };
    console.log('Sending error for request #', callCount);
    return callback(error, null);
  }

  // For normal requests, just echo back a "pong"
  callback(null, { message: `pong: ${call.request.message}` });
}

function main() {
  const server = new grpc.Server();

  server.addService(pingProto.PingService.service, { Ping: ping });

  const address = '0.0.0.0:50051';
  server.bindAsync(address, grpc.ServerCredentials.createInsecure(), (err, port) => {
    if (err) {
      return console.error(err);
    }
    console.log(`Server running at http://${address}`);
    server.start();
  });
}

main();

client.js

const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');
const path = require('path');

// Load proto definition
const PROTO_PATH = path.join(__dirname, 'ping.proto');
const packageDefinition = protoLoader.loadSync(PROTO_PATH);
const pingProto = grpc.loadPackageDefinition(packageDefinition).pingpong;

// For convenience
const { status } = grpc;

let client;

function initClient() {
  console.log('Initializing client');
  client = new pingProto.PingService(
    'localhost:50051',
    grpc.credentials.createInsecure()
  );
}

function makePingCall(message) {
  return new Promise((resolve, reject) => {
    client.Ping({ message }, (err, response) => {
      if (err) {
        return reject(err);
      }
      resolve(response);
    });
  });
}

async function pingWithRetry(message, retries = 3) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const response = await makePingCall(message);
      return response;
    } catch (err) {
      const code = err.code;
      console.error(
        `Ping call failed [attempt=${attempt + 1}] with code=${code}, msg="${err.message}"`
      );

      if (
        code === status.UNAVAILABLE ||
        code === status.DEADLINE_EXCEEDED
      ) {
        console.log('Re-initializing client and retrying...');
        initClient();
      } else {
        // Non-retryable error, just throw
        throw err;
      }
    }
  }
  throw new Error('Max retries reached.');
}

async function main() {
  initClient();

  let count = 0;
  // Ping every 50 ms
  setInterval(async () => {
    count++;
    try {
      const response = await pingWithRetry(`Hello #${count}`, 5);
      console.log('Got response:', response.message);
    } catch (err) {
      console.error('Failed after retries:', err.message);
    }
  }, 50);
}


main();

ping.proto

syntax = "proto3";

package pingpong;

service PingService {
  rpc Ping(PingRequest) returns (PingResponse);
}

message PingRequest {
  string message = 1;
}

message PingResponse {
  string message = 1;
}

Now install the dependencies:

$ npm install @grpc/grpc-js @grpc/proto-loader

Run the server

node server.js

Run the client

node --inspect client.js

Now you should see something like this:
[screenshot: client console output]

Observe memory usage over time (e.g., using DevTools).

Open DevTools and keep taking heap snapshots over time; you’ll see memory grow continuously:
[screenshot: heap snapshots showing continuous memory growth]
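
If attaching DevTools isn't convenient (for example on a remote host), a rough alternative is to have the client process write heap snapshots to disk periodically and compare them offline. A minimal sketch, assuming you add it to client.js, using Node's built-in v8 module (the interval is arbitrary):

// Periodically dump heap snapshots for offline comparison.
// v8.writeHeapSnapshot() produces *.heapsnapshot files that can be
// loaded into Chrome DevTools.
const v8 = require('v8');

setInterval(() => {
  const file = v8.writeHeapSnapshot(); // writes to the current working directory
  console.log('Wrote heap snapshot:', file);
}, 60000);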

The memory leak originates from this part of client.js:

function initClient() {
  console.log('Initializing client');
  client = new pingProto.PingService(
    'localhost:50051',
    grpc.credentials.createInsecure()
  );
}

...

if (
  code === status.UNAVAILABLE ||
  code === status.DEADLINE_EXCEEDED
) {
  console.log('Re-initializing client and retrying...');
  initClient(); // <--- this causes client reinitialization
} else {
  // Non-retryable error, just throw
  throw err;
}

The problem is that initClient creates a brand new client without closing the previous one.

When the previous client is not closed, application code stops holding references to it, but its internal objects become “detached”: they are kept alive mostly by Node.js timers that still reference them. As the official documentation puts it, “Objects retained by detached nodes: objects that are kept alive because a detached DOM/object node references them.”
[screenshot: heap snapshot showing objects retained by detached nodes]

These detached nodes stay in memory indefinitely. In our case, they added up to roughly 100 MB per day until the server crashed.
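
As a minimal illustration of that mechanism (independent of grpc-js internals), an active timer's closure keeps an object reachable even after application code drops its last reference to it:

// A 10 MB buffer kept alive solely by an active interval's closure.
function makeThing() {
  const bigBuffer = Buffer.alloc(10 * 1024 * 1024);
  const timer = setInterval(() => bigBuffer.length, 1000);
  return { stop: () => clearInterval(timer) };
}

let thing = makeThing();
thing = null; // application code no longer references the object,
              // but the interval still retains bigBuffer, so it is never collected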

I know this has been discussed previously here.

The obvious fix for this problem is to close the previous client before initializing a new one:

function initClient() {
  if (client) {
    console.log('Closing previous client');
    client.close();
  }
  console.log('Initializing client');
  client = new pingProto.PingService(
    'localhost:50051',
    grpc.credentials.createInsecure()
  );
}

After closing the client, the memory becomes stable:
[screenshot: heap usage stabilized after adding client.close()]

Environment

  • OS name, version and architecture: Apple M4 Max
  • Node version: v20.18.0
  • Node installation method: nvm
  • Package name and version: "@grpc/grpc-js": "^1.12.4"

Additional context

Although this is not a bug in @grpc/grpc-js, it could be helpful if the library:

  • Logged a warning when multiple active clients are detected (optionally suppressible).
  • Offered a singleton-like pattern or documented best practices for client lifecycle management.

These measures could prevent unintentional client reinitialization without proper cleanup, which can be hard to track down in large applications.

Our debugging process was complicated by the fact that profiling tools like Datadog don't highlight detached nodes. It took some trial and error to trace the leak to unclosed gRPC client instances. We hope this helps others who might face similar issues.

If there's anything more we can clarify or test, please let us know. Thank you again for maintaining this library and for considering these suggestions.

@murgatroid99 (Member)

The best practice for client lifecycle management is to create a single client object and use it for as long as you need it. You shouldn't need to reinitialize it at all, so the cleanup shouldn't be a concern.
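
For reference, a minimal sketch of that pattern applied to the reproduction above: create the client once at module load, reuse it for every call, and rely on per-call deadlines rather than re-initialization (the module layout and helper names are illustrative):

// ping-client.js - one long-lived client shared by all callers.
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');
const path = require('path');

const packageDefinition = protoLoader.loadSync(path.join(__dirname, 'ping.proto'));
const pingProto = grpc.loadPackageDefinition(packageDefinition).pingpong;

// Created once; never re-initialized on transient errors.
const client = new pingProto.PingService(
  'localhost:50051',
  grpc.credentials.createInsecure()
);

function ping(message, timeoutMs = 1000) {
  return new Promise((resolve, reject) => {
    // Per-call deadline instead of tearing down and recreating the client.
    const deadline = new Date(Date.now() + timeoutMs);
    client.Ping({ message }, { deadline }, (err, response) => {
      if (err) return reject(err);
      resolve(response);
    });
  });
}

module.exports = { ping };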

In addition, you can configure grpc-js to do retries for you in the service config, as described here, instead of implementing them yourself. Note that it will not retry DEADLINE_EXCEEDED errors, because that does not make sense given the intended semantics of deadlines: the deadline should be the time after which the client no longer needs the response.
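
A minimal sketch of what that could look like for the reproduction above, passing the service config as a channel option when constructing the client (the retry values are illustrative):

// Retries handled by grpc-js via the service config instead of application code.
const serviceConfig = {
  methodConfig: [
    {
      name: [{ service: 'pingpong.PingService' }],
      retryPolicy: {
        maxAttempts: 5,
        initialBackoff: '0.1s',
        maxBackoff: '1s',
        backoffMultiplier: 2,
        retryableStatusCodes: ['UNAVAILABLE'],
      },
    },
  ],
};

const client = new pingProto.PingService(
  'localhost:50051',
  grpc.credentials.createInsecure(),
  { 'grpc.service_config': JSON.stringify(serviceConfig) }
);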


karliky commented Jan 30, 2025

Sure, that makes sense. My purpose with this issue is to explore whether there are enhancements or safeguards that can be implemented on the library side to help prevent these kinds of problems. It appears that other users have encountered similar issues, and there's an expectation that the garbage collector should effectively manage memory without manual intervention.

@murgatroid99 (Member)

After thinking about it some more, I realized that unused clients should become garbage-collectible at some point: an unused channel eventually enters the idle state, which includes releasing all of the resources that it owns.

A few things are preventing that from happening in the current code, so I made a change to fix those in #2896 and I intend to release that soon. There are a couple of caveats:

  • This only affects channels that have gone IDLE after you stop using them. By default this takes half an hour, which should work just fine when looking at a timeframe measured in days, but if you want to see this effect in a short-term test, you need to reduce the IDLE timeout by setting the channel option grpc.client_idle_timeout_ms.
  • I cannot entirely eliminate leaks due to channelz. The channelz system tracks all open channels and other objects. Channels are only unregistered from channelz when the close method is called, so the information tracked by channelz will be retained if a channel is garbage collected. However, I did minimize what is retained this way. This can be avoided entirely by setting the channel option grpc.enable_channelz to 0. Both channel options are illustrated in the sketch below.
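
For completeness, a minimal sketch of setting both options on the client from the reproduction above (the one-minute idle timeout is only for making the effect observable in a short test):

const client = new pingProto.PingService(
  'localhost:50051',
  grpc.credentials.createInsecure(),
  {
    'grpc.client_idle_timeout_ms': 60000, // go IDLE after 1 minute without active calls
    'grpc.enable_channelz': 0,            // skip channelz registration entirely
  }
);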


karliky commented Feb 3, 2025

@murgatroid99 Thanks for looking into this and providing a fix. We appreciate the work on #2896 and the added clarity around how idle channels and channelz work.

@murgatroid99 (Member)

The changes in #2896 are now out in version 1.12.6, so you can try it out and see how it impacts your application.

Please note that even with that change, we still recommend using just a single client for all requests you make to the same service on the same backend.


karliky commented Feb 5, 2025

Sure, I'll try to test it soon. I'm going to close this issue and will open a new one if we see something going wrong. Thank you so much for taking care of this.

@karliky karliky closed this as completed Feb 5, 2025