(RHEL-9322) cgroup: drastically simplify caching of cgroups members mask #295
base: rhel-8.10.0
Conversation
I think we shouldn't just cherry-pick the single commit from the lengthy series in systemd/systemd#10894 and hope for the best. I have a gut feeling this will introduce regressions in our handling of cgroups. At the very least, we shouldn't be introducing dead code.
src/core/unit.h
Outdated
CGroupMask cgroup_members_mask;
CGroupMask cgroup_realized_mask;    /* In which hierarchies does this unit's cgroup exist? (only relevant on cgroupsv1) */
CGroupMask cgroup_enabled_mask;     /* Which controllers are enabled (or more correctly: enabled for the children) for this unit's cgroup? (only relevant on cgroupsv2) */
CGroupMask cgroup_invalidated_mask; /* A mask specifiying controllers which shall be considered invalidated, and require re-realization */
cgroup_invalidated_mask isn't used anywhere. Also, the comment contains a typo: "specifiying".
I attempted to backport the whole series at first, but abandoned that approach in the end. The PR depends on previous PR(s). It would likely take several days to sort it out and backport everything needed.
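For context, here is a minimal self-contained sketch, with simplified types, of how the rest of the upstream series is supposed to consume this field: invalidating a unit's cgroup ORs the affected controllers into cgroup_invalidated_mask and queues the unit for re-realization. The helper name unit_add_to_cgroup_realize_queue() mirrors the upstream naming, but the code below is only an illustration, not the backported implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t CGroupMask;   /* one bit per controller, as in cgroup-util.h */

typedef struct Unit {
        CGroupMask cgroup_invalidated_mask;
        bool in_cgroup_realize_queue;
        /* ... */
} Unit;

/* Illustrative stand-in for the real realize queue. */
void unit_add_to_cgroup_realize_queue(Unit *u) {
        u->in_cgroup_realize_queue = true;
}

/* Sketch: mark controllers of 'u' as invalidated and schedule re-realization.
 * The real function also widens the mask for cgroupv1 compat pairs. */
void unit_invalidate_cgroup(Unit *u, CGroupMask m) {
        if (m == 0)
                return;

        if ((u->cgroup_invalidated_mask & m) == m)  /* nothing new to invalidate */
                return;

        u->cgroup_invalidated_mask |= m;
        unit_add_to_cgroup_realize_queue(u);
}
```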
… controllers being pulled in

This shouldn't make much difference in real life, but is a bit cleaner.

(cherry picked from commit 442ce77)
Related: RHEL-9322
…on a cgroup

This changes cg_enable_everywhere() to return which controllers are enabled for the specified cgroup. This information is then used to correctly track the enablement mask currently in effect for a unit. Moreover, when we try to turn off a controller, and this works, then this indicates that the parent unit might successfully turn it off now, too, as our unit might have kept it busy.

So far, when realizing cgroups, i.e. when syncing up the kernel representation of relevant cgroups with our own idea, we would strictly work from the root to the leaves. This is generally a good approach, as when controllers are enabled this has to happen in root-to-leaves order. However, when controllers are disabled this has to happen in the opposite order: in leaves-to-root order (this is because a controller can only be enabled in a child if it is already enabled in the parent, and if it shall be disabled in the parent then it has to be disabled in the child first, otherwise it is considered busy when we attempt to remove it in the parent).

To make things complicated, when invalidating a unit's cgroup membership, systemd can actually turn off some controllers previously turned on at the very same time as it turns on other controllers previously turned off. In such a case we have to work leaves-to-root *and* root-to-leaves right after each other. With this patch this is implemented: we still generally operate root-to-leaves, but as soon as we notice we successfully turned off a controller previously turned on for a cgroup, we'll re-enqueue the cgroup realization for all parents of the unit, thus implementing leaves-to-root where necessary.

(cherry picked from commit 27adcc9)
Related: RHEL-9322
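A rough, self-contained sketch of the leaves-to-root step described above, assuming realization can report which controllers it actually managed to turn off: when that set is non-empty, every ancestor gets queued for another realization pass so the freed controllers can be disabled further up as well. The parent pointer and queue helper are simplified stand-ins, not the real systemd data structures.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t CGroupMask;

typedef struct Unit Unit;
struct Unit {
        Unit *parent;                    /* slice the unit lives in; NULL for the root */
        CGroupMask cgroup_enabled_mask;  /* controllers currently enabled for our children */
        bool in_cgroup_realize_queue;
};

/* Illustrative queue helper. */
void unit_add_to_cgroup_realize_queue(Unit *u) {
        u->in_cgroup_realize_queue = true;
}

/* Called after one unit's cgroup was realized.  'new_enabled' is what the
 * kernel actually accepted in cgroup.subtree_control for this cgroup. */
void unit_after_realize(Unit *u, CGroupMask new_enabled) {
        /* Controllers that used to be enabled but were turned off successfully. */
        CGroupMask turned_off = u->cgroup_enabled_mask & ~new_enabled;

        u->cgroup_enabled_mask = new_enabled;

        if (turned_off == 0)
                return;

        /* Leaves-to-root: give every ancestor a chance to drop the controller
         * too, now that this unit no longer keeps it busy. */
        for (Unit *p = u->parent; p; p = p->parent)
                unit_add_to_cgroup_realize_queue(p);
}
```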
After creating a cgroup we need to initialize its "cgroup.subtree_control" file with the controllers its children want to use. Currently we do so whenever the mkdir() on the cgroup succeeded, i.e. when we know the cgroup is "fresh". Let's update the condition slightly so that we also do so when internally we assume a cgroup doesn't exist yet, even if it already does (maybe left over from a previous run). This shouldn't change anything IRL, but makes things a bit more robust.

(cherry picked from commit 1fd3a10)
Related: RHEL-9322
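For illustration, the cgroup.subtree_control mechanics the message refers to, as a small standalone sketch: on cgroup v2, a parent enables controllers for its children by writing "+controller" tokens into that file. The path and controller list below are made-up examples, and a real caller would need proper error handling for controllers the kernel refuses to enable.

```c
#include <stdio.h>

/* Enable the given controllers for the children of 'cgroup_path'.
 * Returns 0 on success, -1 on failure. */
int enable_subtree_controllers(const char *cgroup_path, const char *const *controllers) {
        char path[4096];
        snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path);

        FILE *f = fopen(path, "w");
        if (!f)
                return -1;

        for (const char *const *c = controllers; *c; c++)
                fprintf(f, "+%s ", *c);   /* e.g. "+cpu +memory +pids" */

        /* The kernel may reject the write; the error surfaces when the
         * stdio buffer is flushed on fclose(). */
        return fclose(f) == 0 ? 0 : -1;
}

int main(void) {
        /* Hypothetical cgroup created beforehand with mkdir(); path is an example only. */
        const char *controllers[] = { "cpu", "memory", "pids", NULL };
        return enable_subtree_controllers("/sys/fs/cgroup/example.slice", controllers) < 0;
}
```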
Previously we tried to be smart: when a new unit appeared and it only added controllers to the cgroup mask, we'd update the cached members mask in all parents by ORing in the controller flags in their cached values. Unfortunately this was quite broken, as we missed some conditions when this cache had to be reset (for example, when a unit got unloaded); moreover, the optimization doesn't work when a controller is removed anyway (as in that case the parent has no choice but to iterate through all children to check whether any other, remaining child unit still needs it).

Hence, let's simplify the logic substantially: instead of updating the cache on the right events (which we didn't get right), let's simply invalidate the cache and generate it lazily when we encounter it later. This should actually result in better behaviour as we don't have to calculate the new members mask for a whole subtree whenever we have the suspicion something changed, but can delay it to the point where we actually need the members mask.

This allows us to simplify things quite a bit, which is good, since validating this cache for correctness is hard enough.

Fixes: #9512
(cherry picked from commit 5af8805)
Resolves: RHEL-9322
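A minimal sketch of the lazy scheme described above, with simplified types standing in for the real Unit bookkeeping: the cached members mask carries a validity flag, queries recompute it from the children on demand, and nothing attempts to keep it incrementally up to date anymore.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t CGroupMask;

typedef struct Unit Unit;
struct Unit {
        Unit *children;                 /* first child (simplified sibling list) */
        Unit *next_sibling;
        CGroupMask own_mask;            /* controllers this unit itself needs */
        CGroupMask cgroup_members_mask; /* cached: controllers needed by this unit or any descendant */
        bool cgroup_members_mask_valid;
};

/* Lazily (re)compute the members mask: our own controllers plus everything
 * any descendant needs.  Recomputed only when the cache was invalidated, at
 * the moment somebody actually asks for it. */
CGroupMask unit_get_members_mask(Unit *u) {
        if (u->cgroup_members_mask_valid)
                return u->cgroup_members_mask;

        CGroupMask m = u->own_mask;
        for (Unit *c = u->children; c; c = c->next_sibling)
                m |= unit_get_members_mask(c);

        u->cgroup_members_mask = m;
        u->cgroup_members_mask_valid = true;
        return m;
}
```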
This way we can correctly ensure that when a unit that requires some controller goes away, we propagate the removal of it all the way up, so that the controller is turned off in all the parents too.

(cherry picked from commit b8b6f32)
Related: RHEL-9322
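The upward propagation mentioned here is the counterpart to the lazy cache sketched above; again a sketch with simplified types: when a unit goes away, the cached members masks of all its ancestors are dropped, so the next realization pass recomputes them and can switch the now-unneeded controllers off up the tree.

```c
#include <stdbool.h>

typedef struct Unit Unit;
struct Unit {
        Unit *parent;                    /* slice the unit lives in; NULL at the root */
        bool cgroup_members_mask_valid;  /* see the lazy getter sketched above */
};

/* Drop the cached members masks of a unit and all of its ancestors.
 * Called e.g. when a unit is unloaded, so that the controllers it needed
 * can be switched off further up once nothing else requires them. */
void unit_invalidate_cgroup_members_masks(Unit *u) {
        for (; u; u = u->parent)
                u->cgroup_members_mask_valid = false;
}
```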
Force-pushed from 78e2717 to 83ada01
(cherry picked from commit 43738e0) Related: RHEL-9322
Force-pushed from 83ada01 to b719f06
The CI failures looked related, but to make sure they are not just flukes, @lnykryn force-pushed the feature branch to rerun the checks.
Unit tests seem to be failing because of #434, but I'm not sure about CentOS CI.
The CentOS CI failure is concerning. @dtardon, can you please have a look?
Oh well, the CentOS CI for this repo is currently FUBAR, since CentOS Stream 8 is no more. I'll need to figure something out to make it work again...
@mrc0mmand Maybe we could arrange a "Red Hat Developer Subscription for Teams", i.e. a self-service subscription; then we could generate an activation key, store it as a GH secret, and use it to enable all the needed repos (e.g. CRB) in the UBI RHEL container image.
I mean, yeah, that would work for the container stuff (and I plan to actually do that), but it wouldn't work for CentOS CI, since, as the name suggests, it only supports CentOS. So I'll move the C8S job to C9S, which should work just fine. The environment will be different, but it should be enough to make sure we haven't royally screwed something up (we run the same tests internally on actual RHEL 8 anyway).
CentOS CI should be back (albeit with some compromises, see systemd/systemd-centos-ci@f415a75). I'll look into the container stuff next.
A quick workaround for the current C8S containers is in #440.
Previously we tried to be smart: when a new unit appeared and it only added controllers to the cgroup mask, we'd update the cached members mask in all parents by ORing in the controller flags in their cached values. Unfortunately this was quite broken, as we missed some conditions when this cache had to be reset (for example, when a unit got unloaded); moreover, the optimization doesn't work when a controller is removed anyway (as in that case the parent has no choice but to iterate through all children to check whether any other, remaining child unit still needs it).

Hence, let's simplify the logic substantially: instead of updating the cache on the right events (which we didn't get right), let's simply invalidate the cache and generate it lazily when we encounter it later. This should actually result in better behaviour as we don't have to calculate the new members mask for a whole subtree whenever we have the suspicion something changed, but can delay it to the point where we actually need the members mask.

This allows us to simplify things quite a bit, which is good, since validating this cache for correctness is hard enough.

Fixes: #9512
(cherry picked from commit 5af8805)
Resolves: #2096371