cgroup-v2 considerations #109

wenningerk · 2020-02-27T18:26:04Z

prevent possible lockup when the format in /proc changes
properly get and handle scheduler policy & priority
recognize cgroup-v2 and try to handle it similarly
when switching to SCHED_RR fails, push to the max within SCHED_OTHER

Just as a preview ...
Probably needs splitting.
And the cgroup-v2 stuff is ugly:

- scanning /proc/sched_debug seems to be the only easy way to find out whether CONFIG_RT_GROUP_SCHED is enabled with cgroup-v2
- currently (as of 5.4.20) there is no hierarchical rt-budget, so we move to the root-slice in all cases, with all the consequences
- when moving to the root-slice journal stops working
- auto and yes for SBD_MOVE_TO_ROOT_CGROUP behave the same
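
A minimal sketch of the /proc/sched_debug scan mentioned in the first point, not the code from this PR. The function names are hypothetical, and the marker it looks for is an assumption: that kernels built with CONFIG_RT_GROUP_SCHED print a group path after `rt_rq[<cpu>]:` while other kernels leave that position empty. Check this against the kernel in question before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if any line of the stream looks like
 * "rt_rq[<cpu>]:/<group-path>".
 * ASSUMPTION: only kernels with CONFIG_RT_GROUP_SCHED print a path there.
 * Lines are read with a bounded buffer, so an unexpected format can make
 * the scan return false but cannot make it loop forever.
 */
static bool stream_hints_rt_group_sched(FILE *f)
{
    char line[512];

    assert(f != NULL);
    while (fgets(line, sizeof(line), f) != NULL) {
        const char *close;

        if (strncmp(line, "rt_rq[", 6) != 0)
            continue;
        close = strstr(line, "]:");
        if (close != NULL && close[2] == '/')
            return true;
    }
    return false;
}

/* Returns 1 if the hint is found, 0 if not, -1 if the file cannot be opened. */
static int rt_group_sched_hint(const char *path)
{
    FILE *f;
    int rc;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    rc = stream_hints_rt_group_sched(f) ? 1 : 0;
    fclose(f);
    return rc;
}
```

Calling `rt_group_sched_hint("/proc/sched_debug")` would then give a three-way answer; treating -1 (file missing or unreadable) the same as "unknown" rather than "disabled" is probably the safer choice.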

kgaillot · 2020-02-27T23:15:46Z

Code-wise it looks reasonable, though I'm not familiar with either cgroup implementation and didn't do any testing. Spelling: "budged" in a couple of places.

It's probably worthwhile to comment, either in the sysconfig file or the code, the conditions under which cgroup v2 will be effective. I.e. what kernel version made it available and what has to be done to switch to it, and how a user could tell what an existing system uses.

wenningerk · 2020-02-28T07:20:55Z

> It's probably worthwhile to comment, either in the sysconfig file or the code, the conditions under which cgroup v2 will be effective. I.e. what kernel version made it available and what has to be done to switch to it, and how a user could tell what an existing system uses.

Tried to be a bit more descriptive in the comment before the code that actually does the check.
As cgroup-v2 has been in the kernel for a while and both versions can be configured, I guess listing the kernel versions that provide some form of cgroup-v2 doesn't make much sense.
Fedora 31 seems to be the first distribution using cgroup-v2 by default, and although it should be possible, I didn't play with switching back and forth. Asking for trouble probably. The effort here is more about living with it if it is there.
Even with cgroup-v2 enabled as in Fedora 31, the approaches used up to now shouldn't run into issues as long as CONFIG_RT_GROUP_SCHED isn't enabled, because moving to the root-slice is then not needed.
Both sbd and corosync first check for /sys/fs/cgroup/cpu/cpu.rt_runtime_us, find that it doesn't exist, and are happy.
To play with this, an otherwise stock Fedora 31 kernel with CONFIG_RT_GROUP_SCHED enabled can be found under https://koji.fedoraproject.org/koji/taskinfo?taskID=41654832 (don't know when it will be cleaned up).
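
For the question of how to tell what an existing system uses, here is a rough sketch under stated assumptions, not the check from this PR. The function names are hypothetical. The first helper mirrors the existence test on the rt-budget file described above; the second uses `statfs()` and compares the filesystem type against the cgroup2 magic number (defined as `CGROUP2_SUPER_MAGIC` in the kernel headers, repeated here as a fallback).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270 /* value from linux/magic.h */
#endif

/* True if the given rt-budget control file exists. */
static bool rt_budget_file_exists(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}

/*
 * Returns 1 if mount_point is a cgroup2 filesystem, 0 if it is some
 * other filesystem, -1 if statfs() fails (e.g. path does not exist).
 */
static int is_cgroup_v2_mount(const char *mount_point)
{
    struct statfs fs;

    assert(mount_point != NULL);
    if (statfs(mount_point, &fs) != 0)
        return -1;
    return ((unsigned long)fs.f_type == CGROUP2_SUPER_MAGIC) ? 1 : 0;
}
```

On a unified-hierarchy system one would expect `is_cgroup_v2_mount("/sys/fs/cgroup")` to return 1 and `rt_budget_file_exists("/sys/fs/cgroup/cpu/cpu.rt_runtime_us")` to be false; hybrid setups may look different, so both results are only hints.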

jfriesse · 2020-02-28T07:36:10Z

Looks reasonable (a bit scary tho) but I have a question. What you mean by "when moving to the root-slice journal stops working"? It's logging to journald or some other journal (sbd, fs, ...)?

wenningerk · 2020-02-28T07:42:31Z

> Looks reasonable (a bit scary tho) but I have a question. What you mean by "when moving to the root-slice journal stops working"? It's logging to journald or some sbd journal?

Logging stops working, unfortunately. If it were something sbd-internal I would have tried to make it work ;-)
No idea if it is just that (bad enough, but we would have logging to a file as well) or if there are other issues. Anyway, stopping via the cgroup probably doesn't work with all that root-slice switching, which is why I try to prevent it whenever possible.

jfriesse · 2020-02-28T07:50:35Z

> > Looks reasonable (a bit scary tho) but I have a question. What you mean by "when moving to the root-slice journal stops working"? It's logging to journald or some sbd journal?
>
> Logging stops working, unfortunately. If it were something sbd-internal I would have tried to make it work ;-)
> No idea if it is just that (bad enough, but we would have logging to a file as well) or if there are other issues. Anyway, stopping via the cgroup probably doesn't work with all that root-slice switching, which is why I try to prevent it whenever possible.

Ok, thanks for the info.

wenningerk · 2020-03-02T13:55:19Z

Cherry-picked the travis-config changes needed for mock 2.0 (updated in Fedora 31), as they are not really related to the topic of this PR.
Split off the scheduler-config stuff that isn't actually cgroup-v2 related.
Guess it should be OK to cherry-pick that into master as well, as it should fix a possible hang when the /proc content changes with some kernel version, and it makes the behavior more similar to what corosync does (fall back to raising prio to the max within SCHED_OTHER if the switch to SCHED_RR fails).
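
The fallback described in the last sentence could look roughly like the sketch below. This is an illustration under assumptions, not the code that was split off: the enum and function names are hypothetical, and "max within SCHED_OTHER" is taken to mean the lowest nice value (-20 on Linux).

```c
#include <assert.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical result codes for the sketch. */
enum raise_result { RAISE_FAILED = -1, RAISED_RR = 0, RAISED_NICE = 1 };

/*
 * Try to switch the calling process to SCHED_RR with the requested
 * priority (clamped to the allowed maximum). If that fails, stay in the
 * current policy and ask for the most favourable nice value instead.
 */
static enum raise_result raise_priority(int rr_priority)
{
    struct sched_param param = { 0 };
    int max = sched_get_priority_max(SCHED_RR);

    assert(rr_priority > 0);
    if (max != -1 && rr_priority > max)
        rr_priority = max;
    param.sched_priority = rr_priority;

    if (sched_setscheduler(0, SCHED_RR, &param) == 0)
        return RAISED_RR;

    /* Fallback: -20 is assumed to be the strongest nice value. */
    if (setpriority(PRIO_PROCESS, 0, -20) == 0)
        return RAISED_NICE;

    return RAISE_FAILED;
}
```

Both calls normally need CAP_SYS_NICE or suitable rlimits, so an unprivileged caller should expect `RAISE_FAILED` and keep running at default priority.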

wenningerk force-pushed the cgroup2 branch from 5b23fdc to 9441ce0 Compare February 28, 2020 06:56

wenningerk force-pushed the cgroup2 branch 2 times, most recently from 2ec37a2 to 763e3a0 Compare March 2, 2020 13:46

wenningerk changed the title ~~Fix: scheduling: overhaul the whole thing~~ cgroup-v2 considerations Mar 6, 2020

Fix: scheduling: recognize and try to handle cgroup-v2 similarly

d371363

wenningerk force-pushed the cgroup2 branch from 763e3a0 to d371363 Compare June 16, 2023 05:25
