Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GOMEMLIMIT to reduce the likelihood of an OOM issue #2124

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joshuatcasey
Copy link
Member

Use GOMEMLIMIT to reduce the likelihood of an OOM issue

Fixes #1103

Copy link

netlify bot commented Nov 15, 2024

Deploy Preview for pinniped-dev canceled.

Name Link
🔨 Latest commit eaeda83
🔍 Latest deploy log https://app.netlify.com/sites/pinniped-dev/deploys/6737b8b49d28910008847787

Copy link

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 31.28%. Comparing base (e86f3cc) to head (eaeda83).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2124   +/-   ##
=======================================
  Coverage   31.28%   31.28%           
=======================================
  Files         371      371           
  Lines       61131    61131           
=======================================
+ Hits        19124    19127    +3     
+ Misses      41482    41480    -2     
+ Partials      525      524    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cfryanr
Copy link
Member

cfryanr commented Nov 26, 2024

Some interesting advice from the Go maintainers:

  • "A good rule of thumb is to leave an additional 5-10% of headroom to account for memory sources the Go runtime is unaware of". This PR doesn't do that currently. I don't think there's a way to do math with the downward API in yaml. It looks like it has been suggested but doesn't exist yet.
  • "Don't set a memory limit to avoid out-of-memory conditions when a program is already close to its environment's memory limits. This effectively replaces an out-of-memory risk with a risk of severe application slowdown." I don't think we've profiled memory usage of a running Concierge or Supervisor in a long time. It's probably worth reviewing that and maybe also updating the default memory requests and limits based on actual observed memory usage. They are currently set to 128Mi which is nice and small, but is it too small in terms of this advice? This would be good to review even outside the context of this PR, because if it turned out that the limit were currently set too low then it could even be a problem causing unnecessary pod restarts in the field today.

@joshuatcasey
Copy link
Member Author

Good findings! It's possible for an application to set this value (and retrieve this value) using https://pkg.go.dev/runtime/debug#SetMemoryLimit. I wonder if we could add a log statement to retrieve the value.

@joshuatcasey
Copy link
Member Author

I've been thinking about this, and we could likely make this about as complex as we want to make it. We may not need to take any action, it's pretty rare that we've ever seen Supervisor or Concierge OOMKilled at all, and in the few cases I can remember that has been because there were a huge number of Secrets in the same namespace.

Perhaps the immediate way to get benefit here is that we could accommodate some memory headroom in the Pinniped deployments by setting GOMEMLIMIT=128Mi and bumping resources.requests|limits.memory=150Mi. The Pinniped deployments (in the deploy/* directories) do not have configurable values for resources.requests|limits.memory, so presumably we could keep them as fixed values without much impact to actual memory consumption.

For other deployment mechanisms, we have no control over what they set for resources.requests|limits.memory or env variable GOMEMLIMIT and I'm not sure we should do much in code to try to account for this either.

Quick note about units: golang uses powers-of-two units to express a number of bytes, and k8s supports both powers-of-ten and power-of-two units. GOMEMLIMIT=128MiB and resources.requests|limits.memory=128Mi are the equivalent quantity of bytes. It's possible that these units may not be compatible. I'm looking into that now.

@cfryanr
Copy link
Member

cfryanr commented Dec 2, 2024

and bumping resources.requests|limits.memory=150Mi

I worry that increasing the request value may cause issues during upgrade on clusters are that running workloads with requests that are very close to the node and/or cluster's total available value. It may impact how pods are scheduled in an undesirable way. It might be good to profile the actual memory usage to see if an increase is needed, or maybe just avoid an increase. If 128Mi is already sufficient for most use cases, then one option might be to set the GOMEMLIMIT to be ~5% less than that and profile the actual performance (memory usage, gc cycles, etc.) before merging to make sure it does not have an obvious negative impact in basic use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use go 1.19's soft memory limit when it is released
3 participants