TG to kill system processes if taking too much memory #1416

jachro · 2023-03-31T17:45:18Z

We know there are projects (mostly very old ones) for which renku migrate or renku graph export makes the container's memory usage grow without bound. As a consequence, k8s kills the container, we pick up the same event again, and the vicious circle starts over.

We could actively check the memory of the system processes we start, but before making a change we need to decide:

Should we do it, keeping in mind that we'd cede this processing to renku-core (or something else) at some point?
Do we know when this processing will be handed over to some other process?
How are we going to determine the limit, given that each deployment could have a different memory request?
Is it serious enough (do we have enough projects with the problem) to be worth doing? If not, we need to keep in mind that a manual intervention would be needed each time such a project is processed (which is practically impossible).

I guess that if we decide to do the fix, we'd need to classify such an error as a GENERATION_NON_RECOVERABLE_FAILURE.
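
To make the idea above more concrete, here is a minimal sketch of a memory watchdog around a child process. It is written in Python purely for illustration and is not TG code; the names `run_with_memory_limit`, `MemoryLimitExceeded`, `rss_bytes_from_proc` and `container_memory_limit_bytes` are hypothetical. It assumes Linux (`/proc` and the standard cgroup files), and it only looks at the resident memory of the direct child, not of any processes that child spawns. Reading the cgroup limit is one possible answer to the "different memory request per deployment" question, with the caveat that the cgroup file reflects the container's memory limit rather than its request.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Optional, Sequence


class MemoryLimitExceeded(Exception):
    """Hypothetical error; a caller could map it to a non-recoverable status."""


def rss_bytes_from_proc(pid: int) -> Optional[int]:
    """Resident set size of `pid` from /proc/<pid>/status, or None if unavailable."""
    try:
        for line in Path(f"/proc/{pid}/status").read_text().splitlines():
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024  # the kernel reports kB
    except OSError:
        pass  # process already gone, or not running on Linux
    return None


def container_memory_limit_bytes() -> Optional[int]:
    """Memory limit of the current cgroup (v2 first, then v1), or None if unknown."""
    candidates = (
        "/sys/fs/cgroup/memory.max",                    # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
    )
    for candidate in candidates:
        try:
            raw = Path(candidate).read_text().strip()
        except OSError:
            continue
        if raw.isdigit():  # cgroup v2 uses the literal "max" for "no limit"
            return int(raw)
    return None


def run_with_memory_limit(
    cmd: Sequence[str],
    limit_bytes: int,
    poll_interval: float = 0.5,
    read_rss: Callable[[int], Optional[int]] = rss_bytes_from_proc,
) -> int:
    """Run `cmd`, polling its memory; kill it once it exceeds `limit_bytes`."""
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        rss = read_rss(proc.pid)
        if rss is not None and rss > limit_bytes:
            proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie
            raise MemoryLimitExceeded(
                f"{cmd[0]} used {rss} bytes, limit is {limit_bytes}"
            )
        time.sleep(poll_interval)
    return proc.returncode
```

A caller might use some fraction of `container_memory_limit_bytes()` as `limit_bytes` (the fraction would need tuning per deployment) and translate `MemoryLimitExceeded` into GENERATION_NON_RECOVERABLE_FAILURE, so the same event is not picked up again.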


eikek · 2024-05-28T07:32:30Z

This is getting more urgent now. It happened again, and with only the logs and no direct access to the production system, it is very hard, if not impossible, to figure out what to do in the short term.

jachro added this to KG's kanban Mar 31, 2023

jachro converted this from a draft issue Mar 31, 2023
