Currently, earlyoom calculates memory availability and limits by reading /proc/meminfo, which works well in non-containerized environments or directly on the host. However, in Kubernetes pods, memory statistics such as mem avail and mem total should be derived from the pod's cgroup limits rather than from the host's /proc/meminfo. This mismatch causes earlyoom to miscalculate memory usage when running inside a Kubernetes pod.
For our use case, we aim to prevent Kubernetes from killing the entire pod when a single errant process exceeds its memory limit. While the ideal solution would be to isolate the process in a separate container, this is not feasible for us due to architectural constraints. Instead, we would like to use earlyoom within the pod to catch the errant process and terminate it before the pod breaches its memory limit.
We have successfully deployed earlyoom inside a pod with the proc filesystem mounted, and it runs as expected. However, since earlyoom reads node-level memory statistics from /proc/meminfo, it does not honor the pod's cgroup memory limits, as the session below shows (mem total is reported as the node's 60278 MiB rather than the pod's limit).
[root@almalinux8 /]# rpm -ivh https://kojipkgs.fedoraproject.org//packages/earlyoom/1.6.2/1.el8/x86_64/earlyoom-1.6.2-1.el8.x86_64.rpm
Retrieving https://kojipkgs.fedoraproject.org//packages/earlyoom/1.6.2/1.el8/x86_64/earlyoom-1.6.2-1.el8.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:earlyoom-1.6.2-1.el8 ################################# [100%]
[root@almalinux8 /]# earlyoom
earlyoom 1.6.2
mem total: 60278 MiB, swap total: 0 MiB
sending SIGTERM when mem <= 10.00% and swap <= 10.00%,
SIGKILL when mem <= 5.00% and swap <= 5.00%
mem avail: 48602 of 60278 MiB (80.63%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48592 of 60278 MiB (80.61%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48591 of 60278 MiB (80.61%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48589 of 60278 MiB (80.61%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48590 of 60278 MiB (80.61%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48595 of 60278 MiB (80.62%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48586 of 60278 MiB (80.60%), swap free: 0 of 0 MiB ( 0.00%)
^C
[root@almalinux8 /]# earlyoom ^C
[root@almalinux8 /]#
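For reference, a quick way to see the mismatch from inside the pod is to compare MemTotal from /proc/meminfo with the cgroup memory limit. The small diagnostic below is only an illustration, not part of earlyoom, and it assumes cgroup v2 mounted at /sys/fs/cgroup:

/* Hypothetical quick check (not part of earlyoom): compare the host-level
 * MemTotal from /proc/meminfo with the pod's cgroup v2 memory limit.
 * Assumes cgroup v2 is mounted at /sys/fs/cgroup inside the container. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];

    /* Host view: MemTotal as earlyoom currently sees it */
    FILE *mi = fopen("/proc/meminfo", "r");
    if (mi) {
        while (fgets(line, sizeof(line), mi))
            if (strncmp(line, "MemTotal:", 9) == 0)
                printf("/proc/meminfo        %s", line);
        fclose(mi);
    }

    /* Pod view: the cgroup limit Kubernetes actually enforces */
    FILE *cg = fopen("/sys/fs/cgroup/memory.max", "r");
    if (cg) {
        if (fgets(line, sizeof(line), cg))
            printf("memory.max (cgroup)  %s", line);
        fclose(cg);
    }
    return 0;
}

In a pod that has a memory limit set, the two values differ; earlyoom currently acts on the first number only.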
Proposed Solution:
Update parse_meminfo: Modify parse_meminfo to support reading memory statistics from the cgroup filesystem (/sys/fs/cgroup) when earlyoom detects that it is running inside a container (a rough sketch of this logic is included after this list).
For cgroup v1, we can read memory limits from memory.limit_in_bytes and current usage from memory.usage_in_bytes.
For cgroup v2, we can use memory.max for the limit and memory.current for current usage.
Fallback to /proc/meminfo: If the cgroup paths are not available, retain the current behavior of reading from /proc/meminfo.
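A minimal sketch of the proposed fallback logic, assuming the cgroup filesystem is mounted at /sys/fs/cgroup; this is not the actual earlyoom code, and the function names (read_cgroup_value, cgroup_meminfo) are placeholders:

/* Sketch only: derive the memory limit and current usage from the cgroup
 * filesystem, and signal the caller to fall back to /proc/meminfo when no
 * usable cgroup limit is found. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a single numeric value from a cgroup file.
 * Returns -1 if the file is missing or holds no usable value. */
static long long read_cgroup_value(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (strncmp(buf, "max", 3) == 0)  /* cgroup v2 writes "max" when unlimited */
        return -1;
    return atoll(buf);
}

/* Fill *total and *avail (in bytes) from cgroup limits if present.
 * Returns 0 on success, -1 if the caller should fall back to /proc/meminfo. */
static int cgroup_meminfo(long long *total, long long *avail)
{
    /* cgroup v2: memory.max holds the limit, memory.current the usage */
    long long limit = read_cgroup_value("/sys/fs/cgroup/memory.max");
    long long usage = read_cgroup_value("/sys/fs/cgroup/memory.current");

    if (limit <= 0) {
        /* cgroup v1: memory.limit_in_bytes / memory.usage_in_bytes */
        limit = read_cgroup_value("/sys/fs/cgroup/memory/memory.limit_in_bytes");
        usage = read_cgroup_value("/sys/fs/cgroup/memory/memory.usage_in_bytes");
    }

    /* A v1 "no limit" shows up as a huge number close to LLONG_MAX;
     * treat anything implausibly large as unlimited as well. */
    if (limit <= 0 || limit >= (1LL << 62) || usage < 0)
        return -1;  /* keep the current /proc/meminfo behavior */

    *total = limit;
    *avail = limit - usage;
    return 0;
}

int main(void)
{
    long long total, avail;
    if (cgroup_meminfo(&total, &avail) == 0)
        printf("cgroup: mem total %lld MiB, mem avail %lld MiB\n",
               total / (1024 * 1024), avail / (1024 * 1024));
    else
        printf("no usable cgroup limit found, fall back to /proc/meminfo\n");
    return 0;
}

A real patch would also need to account for reclaimable page cache (e.g. via memory.stat) so that the derived mem avail behaves more like MemAvailable, rather than relying on raw usage alone.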
Steps to Reproduce:
Deploy a pod in Kubernetes with earlyoom running inside.
[centos@dev-server-anijhawan-4 ~]$ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: almalinux8
spec:
  containers:
  - name: almalinux
    image: almalinux:8
    command: ["sleep", "3600"]  # Keeps the pod running for 1 hour
[centos@dev-server-anijhawan-4 ~]$ kubectl apply -f pod.yaml
[centos@dev-server-anijhawan-4 ~]$ kubectl exec -it almalinux8 /bin/bash
[root@almalinux8 /]# rpm -ivh https://kojipkgs.fedoraproject.org//packages/earlyoom/1.6.2/1.el8/x86_64/earlyoom-1.6.2-1.el8.x86_64.rpm
Retrieving https://kojipkgs.fedoraproject.org//packages/earlyoom/1.6.2/1.el8/x86_64/earlyoom-1.6.2-1.el8.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:earlyoom-1.6.2-1.el8 ################################# [100%]
[root@almalinux8 /]# earlyoom
earlyoom 1.6.2
mem total: 60278 MiB, swap total: 0 MiB
sending SIGTERM when mem <= 10.00% and swap <= 10.00%,
SIGKILL when mem <= 5.00% and swap <= 5.00%
mem avail: 48602 of 60278 MiB (80.63%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48592 of 60278 MiB (80.61%), swap free: 0 of 0 MiB ( 0.00%)
mem avail: 48591 of 60278 MiB (80.61%), swap free: 0 of 0 MiB ( 0.00%)
I would be willing to contribute this fix and upstream it as well.