
Kubelet drop in #1299

Open: wants to merge 4 commits into base main

Conversation

victortoso (Member)

What this PR does / why we need it:

We are using deprecated kubelet command-line options that have config-file alternatives. This seems to have been known since b89e9a6. Combined with the goal of enabling k8s feature gates such as DRA in #1251, this PR improves kubelet's config options and uses the drop-in feature.

The commit logs clarify the changes further.
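
As background, kubelet marks such flags as deprecated in its own help output; a quick way to list them (a sketch, the exact output format varies by kubelet version):

# List kubelet command-line flags marked as DEPRECATED in favor of the
# config file; --help output formatting may differ between versions.
kubelet --help 2>&1 | grep -i 'DEPRECATED'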

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

This PR focuses only on k8s-1.30 to validate the change. I can do k8s-1.31 here or in a new PR too.
/cc @lyarwood

Checklist

This checklist is not enforced, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

NONE

@kubevirt-bot kubevirt-bot requested a review from lyarwood October 11, 2024 11:58
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/S labels Oct 11, 2024
@kubevirt-bot (Contributor)

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2025
@victortoso (Member Author)

/remove-lifecycle stale
🏓 @lyarwood
@iholder101 If you have the time too? :)

@kubevirt-bot kubevirt-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2025
@iholder101 (Contributor) left a comment:

Awesome stuff, looks great to me! Thank you @victortoso!

/lgtm

Contributor:

Is this the right file for this logic? Should we place it somewhere more generic that would fit other k8s versions as well? For example 1.31?

victortoso (Member Author):

If we move it to 1.31, will it be cloned when 1.32 is created? I'm not sure how this process goes; I'm happy to move it to the right place.

victortoso (Member Author):

I'll ping dhiller/brian on Slack to see where I should put this change so that it gets cloned when a new k8s release is added to kubevirtci, and propose it in a follow-up.

victortoso (Member Author):

... or I can do it in this one too; either way is fine for me :)

Member:

k8s-1.32 is available now, so if we want this carried on through future providers we need this change added to k8s-1.32 & k8s-1.31. The next provider (k8s-1.33) will be based on a copy of k8s-1.32.

victortoso (Member Author):

Thanks Brian! I'll add it to k8s-1.31 and k8s-1.32 and test it later today.

Comment on lines +151 to +169
# Create drop-in config files for kubelet
# https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/#kubelet-conf-d
kubelet_conf_d="/etc/kubernetes/kubelet.conf.d"
mkdir -m 644 $kubelet_conf_d

# Set our custom initializations to kubelet
kubevirt_kubelet_conf="$kubelet_conf_d/50-kubevirt.conf"
cat <<EOF >$kubevirt_kubelet_conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
failSwapOn: false
kubeletCgroups: /systemd/system.slice
EOF

# Set only command line options not supported by config
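
The quoted hunk is truncated here (it runs to line +169). Per the commit message later in this thread, the flags kept on the command line are --runtime-cgroups and --config-dir; a hypothetical sketch of that tail, reusing $kubelet_conf_d from above, not the PR's verbatim code:

# Hypothetical continuation: keep only flags without a KubeletConfiguration
# equivalent, plus --config-dir pointing at the drop-in directory above.
KUBELET_EXTRA_ARGS="--runtime-cgroups=/systemd/system.slice --config-dir=$kubelet_conf_d"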
Contributor:

Looks great to me!

Do you think it's possible to somehow let the user provide an alternative KubeletConfiguration that would override this? I think it would give the user full control, which could be very beneficial when doing complex configuration. In any case it should not block this PR, just an idea :)
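
One way this could already work: the kubelet merges drop-in files in alphanumeric order, so later files override earlier ones, and a user-supplied file with a higher prefix would win over 50-kubevirt.conf. A sketch; the file name and override value are hypothetical:

# Hypothetical user override; 99-user.conf sorts after 50-kubevirt.conf,
# so its fields take precedence during the kubelet's drop-in merge.
cat <<EOF >/etc/kubernetes/kubelet.conf.d/99-user.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: true
EOF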

victortoso (Member Author):

Yeah, that's a possibility. My initial motivation is actually to make it easier to make changes without abusing sed :)

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2025
@@ -148,9 +148,24 @@ patch $cni_manifest_ipv6 $cni_ipv6_diff

cp /tmp/local-volume.yaml /provision/local-volume.yaml

# TODO use config file! this is deprecated
# Create drop-in config files for kubelet
# https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/#kubelet-conf-d
Contributor:

I see in that link that KUBELET_CONFIG_DROPIN_DIR_ALPHA has to be defined, but I don't see it being defined here. WDYT?

victortoso (Member Author):

AFAIU, that's for k8s 1.28 and 1.29; that is, you have to use --config-dir plus set the env var. The line in full:

For Kubernetes v1.28 to v1.29, you can only specify --config-dir if you also set the environment variable KUBELET_CONFIG_DROPIN_DIR_ALPHA for the kubelet process (the value of that variable does not matter).
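
For completeness, on v1.28/v1.29 that requirement would translate into something like the following systemd drop-in; paths and unit name are assumed, not part of this PR:

# Hypothetical, and only needed on k8s 1.28/1.29: the variable's value is
# ignored, it merely has to be set for the kubelet process.
mkdir -p /etc/systemd/system/kubelet.service.d
cat <<EOF >/etc/systemd/system/kubelet.service.d/20-dropin-alpha.conf
[Service]
Environment="KUBELET_CONFIG_DROPIN_DIR_ALPHA=on"
EOF
systemctl daemon-reload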

@lyarwood (Member) left a comment:

/lgtm

@victortoso (Member Author)

/retest-required

We can now use a proper API for customizations. See follow-up commits.

Signed-off-by: Victor Toso <[email protected]>
I've verified that in k8s-1.30, removing all options set in
KUBELET_EXTRA_ARGS results in the following diff:

    @@ -54 +53,0 @@
    -    "kubeletCgroups": "/systemd/system.slice",
    @@ -87 +86 @@
    -    "failSwapOn": false,
    +    "failSwapOn": true,

I've moved both kubeletCgroups and failSwapOn into kubelet's drop-in
config file and verified that it works and the config is set.

I assume that --cgroup-driver=systemd is already the default in CentOS
Stream or is being set somewhere else, but to avoid regressions I've
included it in the config file as well.

I don't know why --runtime-cgroups is not shown in the node's configz,
but it is the only one of the initial four command-line options that is
not marked as DEPRECATED in favor of a config-file alternative, so
let's keep it as a command-line option, together with --config-dir.

Signed-off-by: Victor Toso <[email protected]>
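
The configz endpoint mentioned in the commit message above can be queried through the API server; a minimal sketch, with the node name as a placeholder:

# Dump the kubelet's merged runtime configuration for a node via the
# configz debug endpoint; requires kubectl access to the cluster.
# jq is optional, for pretty-printing.
NODE=node01
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" | jq .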
Apply changes similar to those made to k8s-1.30; see commits:
 * "k8s-1.30: enable kubelet drop-in config files"
 * "k8s-1.30: Use kubelet's drop-in config files"

Signed-off-by: Victor Toso <[email protected]>
Apply changes similar to those made to k8s-1.30; see commits:
 * "k8s-1.30: enable kubelet drop-in config files"
 * "k8s-1.30: Use kubelet's drop-in config files"

With this, future k8s providers will have this enabled too.

Signed-off-by: Victor Toso <[email protected]>
@kubevirt-bot kubevirt-bot added size/M and removed lgtm Indicates that a PR is ready to be merged. size/S labels Jan 24, 2025
@victortoso (Member Author)

I've added it to k8s-1.31 and k8s-1.32 too. I'm still testing locally, but I've updated the PR to get a CI check in the meantime.

@victortoso (Member Author)

/retest

@brianmcarey (Member)

/test check-provision-k8s-1.32

@victortoso (Member Author)

On check-provision-k8s-1.32:

"PullImage from image service failed" err="rpc error: code = Canceled desc = copying system image from manifest list: copying config: context canceled" image="quay.io/ceph/ceph:v17"
"PullImage from image service failed" err="rpc error: code = Canceled desc = copying system image from manifest list: copying config: context canceled" image="quay.io/cephcsi/cephcsi:v3.7.0"
"PullImage from image service failed" err="rpc error: code = Canceled desc = copying system image from manifest list: copying config: context canceled" image="quay.io/openshift/origin-kube-rbac-proxy@sha256:e2def4213ec0657e72eb790ae8a115511d5b8f164a62d3568d2f1bff189917e8"

These seem unrelated, but:

[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W0127 08:59:25.908183    8789 cleanupnode.go:105] [reset] Failed to remove containers: failed to stop running pod a699510b2138d036736797290145d120e7914e43d1905416f55df74e659776d4: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

Piqued my interest... For pod a699510b213, it seems the error is related to cleanup of the netns...

"MESSAGE":"time="2025-01-27 09:05:34.696875517Z" level=warning msg="Could not restore sandbox a699510b2138d036736797290145d120e7914e43d1905416f55df74e659776d4: failed to Statfs \"/var/run/netns/25937231-3dd4-4381-bd31-ff5e0ec1a4e8\": no such file or directory""

But after that it shows:

level=info msg="Successfully cleaned up network for pod a699510b2138d036736797290145d120e7914e43d1905416f55df74e659776d4

(Artifacts > 202501270936_sonobuoy_47c90549-e721-4be5-b88d-70bbea36af4)

So I'm not sure the issue is related.

@victortoso (Member Author)

Okay, k8s-1.32 looks good. Thanks @brianmcarey
/cc @iholder101 and @lyarwood; after this is merged it should unblock an improvement to #1345.

Cheers

@brianmcarey (Member) left a comment:

/approve

@kubevirt-bot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brianmcarey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2025
@iholder101 (Contributor)

/retest

@victortoso (Member Author)

Only the lgtm is missing. @lyarwood, @iholder101, if you have the time.

@iholder101 (Contributor)

Thank you @victortoso! Great job!
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 30, 2025
Labels: approved, dco-signoff: yes, kind/enhancement, lgtm, size/M
5 participants