Missing metrics in cgroup v2 #3062

Open
cyrus-mc opened this issue Feb 15, 2022 · 9 comments

Comments

@cyrus-mc

This might be related to #3026 (which I am not sure has been released yet).

On nodes running cgroup v1, metrics such as container_cpu_cfs_throttled_* are returned. When cadvisor runs on nodes with cgroup v2 enabled, those metrics are not returned.

There could be others, but these are the ones I noted while troubleshooting an issue. To verify, I ran the latest 0.39.x and 0.43.x releases, and both exhibited the same behavior.

@cyrus-mc
Author

@ysksuzuki pinging you here to see if this is related to #3026, which was recently closed; I think the fix is awaiting release.

@chrstphfrtz

Is there any update on this? We have the same problem running cadvisor v0.46.0 on our cluster. Changing back to cgroup v1 also restores other metrics, such as container_memory_max_usage_bytes.

@sli720

sli720 commented Dec 28, 2022

Is there any update? We are also missing these metrics when using cgroup v2.

@mindw

mindw commented Jul 29, 2023

I spent some time looking into the code. It seems the underlying runc library doesn't populate MemoryStats.Usage.MaxUsage when cgroup v2 is used, so the code at

ret.Memory.Usage = s.MemoryStats.Usage.Usage
is insufficient.
A possible solution would be to place ret.Memory.MaxUsage = s.MemoryStats.Stats["peak"] in the if cgroups.IsCgroup2UnifiedMode() block. I don't have a v2 system handy right now to test with :(
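To make the suggestion concrete, here is a rough, untested sketch of the idea. The types below are simplified, hypothetical stand-ins for the libcontainer stats and cAdvisor memory structs, not the real definitions, and whether Stats actually carries a "peak" key on cgroup v2 is exactly the assumption that still needs checking:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real structs.
type memoryData struct {
	Usage    uint64
	MaxUsage uint64
}

type memoryStats struct {
	Usage memoryData
	Stats map[string]uint64 // assumed to hold a "peak" entry on cgroup v2
}

type containerMemory struct {
	Usage    uint64
	MaxUsage uint64
}

// fillMemory copies usage as before and, in unified mode, overrides
// MaxUsage with the "peak" entry when that entry is present.
func fillMemory(s memoryStats, unified bool) containerMemory {
	ret := containerMemory{Usage: s.Usage.Usage, MaxUsage: s.Usage.MaxUsage}
	if unified {
		if peak, ok := s.Stats["peak"]; ok {
			ret.MaxUsage = peak
		}
	}
	return ret
}

func main() {
	s := memoryStats{
		Usage: memoryData{Usage: 1 << 20}, // MaxUsage left at zero, as observed on v2
		Stats: map[string]uint64{"peak": 2 << 20},
	}
	fmt.Printf("%+v\n", fillMemory(s, true))
}
```

Checking for the key before using it keeps MaxUsage at its previous value on systems where no peak is reported, rather than overwriting it with the map's zero value.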

@mindw

mindw commented Aug 12, 2023

Went down the rabbit hole a bit further. It seems peak is only available since kernel 5.19 (specifically commit torvalds/linux@8e20d4b).
So the next step would be to test on a 6.1 kernel.

@haircommander
Contributor

I'm beginning to fix this in opencontainers/runc#4038

@micpjwi

micpjwi commented Apr 29, 2024

> I'm beginning to fix this in opencontainers/runc#4038

I think this fix made its way into runc 1.11.0. Since then, I've seen that COS-113 mentions runc 1.12.0, and so does cAdvisor 0.49.0.

But I guess the fix suggested here would still need to be implemented before the metric would be exposed by cAdvisor. Would anyone know the status of this?

@msannikov

This seems to be fixed. I've seen both container_cpu_cfs_throttled_* and container_memory_max_usage_bytes present and non-zero on a node running kubelet v1.30 (which includes cadvisor 0.49.0) with cgroup v2.

@wallrj

wallrj commented Jul 17, 2024

> Went down the rabbit hole a bit further. It seems peak is only available since kernel 5.19 (specifically commit torvalds/linux@8e20d4b).
> So the next step would be to test on a 6.1 kernel.

@mindw Thanks for digging. I stumbled across your comments while trying to measure the peak memory use of cert-manager components.
I was testing on Kind in a Windows WSL2 virtual machine and observed that container_memory_max_usage_bytes reported only zero values.

The default WSL2 kernel is v5.15, but happily a new WSL2 v6 kernel is due to be released soon, so I'll report back when it is available.
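For a quick sanity check of whether a given kernel is new enough, the release string can be compared against the 5.19 threshold mentioned above. A small sketch with made-up helper names; it only looks at the version number, so a distribution kernel with backported patches could behave differently from what this reports:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseKernelRelease extracts major and minor from a release string such as
// "5.15.90.1-microsoft-standard-WSL2" or "6.1-rc1".
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor field may carry a suffix such as "1-rc1"; keep leading digits only.
	digits := parts[1]
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast519 reports whether the release is 5.19 or newer, the version
// in which memory.peak is said to have been introduced.
func atLeast519(release string) (bool, error) {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false, err
	}
	return major > 5 || (major == 5 && minor >= 19), nil
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	ok, err := atLeast519(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("kernel %s: 5.19 or newer = %v\n", strings.TrimSpace(string(data)), ok)
}
```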
