TG to kill system processes if taking too much memory #1416

Open
4 tasks
jachro opened this issue Mar 31, 2023 · 1 comment

Comments

@jachro
Contributor

jachro commented Mar 31, 2023

We know there are projects (mostly very old ones) for which renku migrate or renku graph export makes the container's memory usage grow without bound. As a consequence, k8s kills the container, we pick up the same event again, and the vicious circle starts over.

We could actively monitor the memory of the system processes we start, but before making a change we need to decide:

  • Should we do it at all, keeping in mind that we'd hand the process over to renku-core (or something else) at some point?
  • Do we know when this process will be run by some other component?
  • How are we going to find out the limit, given that each deployment could have a different memory request?
  • Is it serious enough (do we have enough affected projects) to be worth doing? If not, we need to keep in mind that a manual intervention would be needed each time such a project is processed (practically impossible to do).
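
To make the monitoring idea and the limit question more concrete, here is a rough Python sketch, not a proposal for the actual TG code. It assumes a Linux container, reads the limit from the standard cgroup files, and polls the RSS of the process it started via /proc. All function names are made up for illustration. It only looks at the process itself, not at any children it spawns.

```python
import subprocess
import time
from pathlib import Path
from typing import Optional

# Standard cgroup locations for the container memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw: str) -> Optional[int]:
    """Parse a cgroup limit value; cgroup v2 uses 'max' for 'no limit'."""
    raw = raw.strip()
    if not raw.isdigit():
        return None
    return int(raw)


def container_memory_limit() -> Optional[int]:
    """Return the container memory limit in bytes, or None if undetectable."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue  # file missing or unreadable, try the next location
    return None


def rss_bytes(pid: int) -> Optional[int]:
    """Resident set size of a process, read from /proc (Linux only)."""
    try:
        for line in Path(f"/proc/{pid}/status").read_text().splitlines():
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024  # value is reported in kB
    except OSError:
        pass
    return None


class MemoryLimitExceeded(Exception):
    """Raised when the child was killed for exceeding its memory budget."""


def run_with_memory_guard(cmd, max_bytes, poll_seconds=1.0, read_rss=rss_bytes):
    """Run cmd and kill it if its RSS grows beyond max_bytes.

    Returns the exit code when the command finishes on its own.
    """
    proc = subprocess.Popen(cmd)
    try:
        while proc.poll() is None:
            used = read_rss(proc.pid)
            if used is not None and used > max_bytes:
                proc.kill()
                proc.wait()
                raise MemoryLimitExceeded(
                    f"{cmd!r} used {used} bytes, budget was {max_bytes}"
                )
            time.sleep(poll_seconds)
        return proc.returncode
    finally:
        # Never leave the child running if we exit the loop unexpectedly.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
```

A caller could take some fraction of `container_memory_limit()` as the budget (say 80%, an arbitrary number) and fall back to a configured value when it returns None, then call `run_with_memory_guard(["renku", "migrate"], budget)`. Reading the limit at runtime would avoid having to configure it per deployment, but whether the cgroup files are visible in our containers would need checking.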

I guess that if we decide to implement the fix, we'd need to classify such an error as a GENERATION_NON_RECOVERABLE_FAILURE.
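
As an illustration of that classification, here is a small Python sketch. Only GENERATION_NON_RECOVERABLE_FAILURE is taken from this issue. The other status names, the `MemoryLimitExceeded` error and the `classify` function are assumptions made up for the example, and treating every other error as recoverable is a simplification.

```python
from enum import Enum
from typing import Optional


class EventStatus(Enum):
    # Only GENERATION_NON_RECOVERABLE_FAILURE is named in this issue;
    # the other two members are illustrative placeholders.
    TRIPLES_GENERATED = "TRIPLES_GENERATED"
    GENERATION_RECOVERABLE_FAILURE = "GENERATION_RECOVERABLE_FAILURE"
    GENERATION_NON_RECOVERABLE_FAILURE = "GENERATION_NON_RECOVERABLE_FAILURE"


class MemoryLimitExceeded(Exception):
    """Hypothetical error raised when the process was killed for using too much memory."""


def classify(error: Optional[Exception]) -> EventStatus:
    """Map the outcome of one generation attempt to an event status."""
    if error is None:
        return EventStatus.TRIPLES_GENERATED
    if isinstance(error, MemoryLimitExceeded):
        # Retrying would run into the same limit again, so do not re-queue.
        return EventStatus.GENERATION_NON_RECOVERABLE_FAILURE
    # Simplification for this sketch: everything else may be retried.
    return EventStatus.GENERATION_RECOVERABLE_FAILURE
```

The point of the non-recoverable status is that the event is not picked up again, which is what would break the kill-and-retry circle described above.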

@jachro jachro converted this from a draft issue Mar 31, 2023
@eikek
Member

eikek commented May 28, 2024

This is getting more urgent now. It happened again, and with only the logs and no direct access to the production system, it is very hard, if not impossible, to figure out what to do in the short term.
