Node restart takes over 2 minutes to shut down due to Longhorn #9752

EugenMayer · 2024-03-17T18:21:44Z

Environmental Info:
K3s Version: 1.28.7

Node(s) CPU architecture, OS, and Version:
Ubuntu Jammy (22.04), amd64

Cluster Configuration:

single node
longhorn
only a few workloads (home lab)

Describe the bug:
A normal reboot of the node takes very long, probably because of Longhorn, but I am not sure.
Running the killall script (https://docs.k3s.io/upgrades/killall) makes it visible that unmounting the Longhorn volumes seems to be the cause.

It stops at:

+ do_unmount_and_remove /var/lib/kubelet/plugins
+ set +x
sh -c 'umount -f "$0" && rm -rf "$0"' /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.local-hostpath/e33f8f23d8b198b4ceaeb402828ec0eb407f837ad649800eb7120d7e6f03dfe5/globalmount
sh -c 'umount -f "$0" && rm -rf "$0"' /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.local-hostpath/14f964bbeab8cb015ce406948e1e3986eb90e6dcc7dba373e8827eaac51c6da6/globalmount
sh -c 'umount -f "$0" && rm -rf "$0"' /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/fd8f797d6b187f87af3e00a5a217407db9c89402485b1f24c10147d8c4cf534f/globalmount

The entire process takes 2 minutes and 5 seconds, always between 2:05 and 2:10, which looks like the classic 120 s timeout expiring somewhere.

Is there anything I can use to debug what it might be? IMHO it seems to block on unmounting the first Longhorn volume, waits, and then, after 2 minutes, instantly unmounts about 10 Longhorn volumes in a row.

brandond · 2024-03-18T02:48:04Z

Collaborator

This isn't strictly a question about K3s; I believe you'll see the same behavior on any node that you shut down while leaving volumes mounted. K3s doesn't stop pods when the service stops, to allow for nondisruptive upgrades. If you wanted to address this delay, you could probably drain the node first, and/or run the killall script before shutdown.

EugenMayer Mar 18, 2024
Author

Thank you for the quick reply!

I understand the intention, but I think "intention" is exactly the right word to discuss here.

We have to assume that K3s is used a lot in single-node cluster scenarios, right? This is one of its strengths, one of its USPs, so we should at least give it some weight. Same for me: for multi-node clusters I usually use RKE2; for single-node or maybe small 2-node clusters I use K3s. So "draining" in the sense of moving the workload to other nodes in the cluster is usually not something one will use. In K3s scenarios, shutting down the node will more often be: stop all workloads, reboot, done. This is why we need at least a faster process for that. IMHO the "downtime" is stretched too much here.

Upgrades
That said, I think we need to be able to express the intention "just an upgrade of K3s" versus "the node itself needs a restart". I understand that the k3s systemd service, and stopping it, is reserved for nondisruptive upgrades. This is great, and that feature alone is even more important than node restarts, since it will be used more often.

Node restarts
So let's introduce something else, k3s-node-reboot, even if it is just a systemd service that is an empty shell, a guard that exists to properly shut down the node by:

draining
stopping k3s
running killall
whatever is needed

It should be set up to run before systemd stops k3s during the usual shutdown process, so we have better control here.
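For illustration, the manual sequence listed above (drain, stop K3s, run killall) could look roughly like the shell function below. This is a hypothetical sketch, not something K3s ships: `prepare_reboot` and the `KUBECTL`/`SYSTEMCTL`/`KILLALL` override variables are made up here, and it assumes K3s was installed with the official install script (so `k3s-killall.sh` is in `/usr/local/bin`) and that the node name equals the hostname. It is meant to be run by hand before `reboot`; wiring it into systemd's shutdown ordering is not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of K3s): drain the node, stop K3s, run killall.
# Assumes the official install script was used (k3s-killall.sh in /usr/local/bin)
# and that the Kubernetes node name matches the hostname.
# KUBECTL, SYSTEMCTL and KILLALL are illustrative overrides, e.g. for a dry run.

prepare_reboot() {
  node="${1:-$(hostname)}"
  kubectl_cmd="${KUBECTL:-k3s kubectl}"
  systemctl_cmd="${SYSTEMCTL:-systemctl}"
  killall_cmd="${KILLALL:-/usr/local/bin/k3s-killall.sh}"

  # The *_cmd variables are left unquoted on purpose so that a value such as
  # "k3s kubectl" splits into a command plus its first argument.

  # 1. Drain: cordon the node and evict its pods so CSI volumes (e.g. Longhorn)
  #    can be unmounted while K3s is still running. On a single node the
  #    evicted pods simply stay Pending. A failed or timed-out drain is
  #    reported on stderr but does not abort the remaining steps.
  if ! $kubectl_cmd drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=90s; then
    echo "drain of $node did not complete; continuing" >&2
  fi

  # 2. Stop the K3s service. On its own this leaves pods running by design.
  $systemctl_cmd stop k3s || return 1

  # 3. Kill the remaining pod processes and unmount the kubelet volumes.
  $killall_cmd
}
```

Usage might look like `sudo sh -c '. ./prepare-reboot.sh && prepare_reboot && reboot'` (the file name is hypothetical). Once the node is back, `k3s kubectl uncordon <node>` is needed, because `kubectl drain` leaves the node cordoned.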

I have read dozens of topics about shutting down K3s, and IMHO we should take this a little more seriously, as described above. K3s will be used in single-node setups, and restarting the node in 3-4 minutes for kernel security fixes is just not practical IMHO.

brandond Mar 18, 2024
Collaborator

Something like this? #4319 (comment)

brandond Mar 18, 2024
Collaborator

I will also note that LH doesn't make a lot of sense to me on single-node setups, as you don't have any of the benefits of data replication. I guess the snapshot stuff is still useful. For most single-node clusters without external storage, I think people usually just use the local-path provisioner.

EugenMayer Mar 18, 2024
Author

I guess yes. I tried steps 1 and 2 before opening this discussion, but they do not help on their own in my case. I did not try the optional step; I will do so this evening.

Generally, I would suggest that we build this into the K3s distribution itself. Let it be a systemd service that is disabled by default, if there are concerns that this should not be the default behavior (I would suggest that for K3s it makes sense as the default, but that's just my POV). Even introducing it disabled will

take load off the issue queue (all those 'I cannot properly reboot my system' reports)
make K3s look better for node restarts by reducing the downtime
let the small distro be fast not only on upgrades but also on reboots, which is kind of its strength.

I would not expect K3s to be heavily used in 300-node clusters where each box is treated as ephemeral anyway, so that 'upgrading a node's kernel' is more like 'throw it away and spin up a new one'. That is more the playground of RKE2 and others.
Don't get me wrong, that is exactly why I have K3s in my toolbelt (I like it!): it is slim for small clusters and does not come with the heavy tooling a full-scale k8s cluster needs.

But that said, again, nodes are most probably not ephemeral in single-node clusters. Or, put differently: even if they are, you have to wait 3 minutes to shut down the old box before you can spin up a new one via cloud-init, which makes a kernel upgrade a 20-minute task (even with automation), since you have to run DR for the entire K3s state (Velero, Longhorn restore ... all the DR). Not practical, and it does not help the downtime.

People will therefore restart the box regularly, keeping its state, and not just as an 'incident, by chance, or corner case'.

Let's make a non-corner case something we cover in the distro. What do you think?

EugenMayer · 2024-03-18T07:32:49Z

Author

> I will also note that LH doesn't make a lot of sense to me on single-node setups, as you don't have any of the benefits of data replication. I guess the snapshot stuff is still useful. For most single-node clusters without external storage, I think people usually just use the local-path provisioner.

I could not disagree more on this one, sorry. I understand that Longhorn adds overhead, but it also adds so much:

backing up local host-path storage is way harder and not at all Kubernetes-like, not to mention restoring it in DR or making your box ephemeral in general, IMHO
Longhorn does a much better job with k8s in the RWO scenarios where a deployment gets stuck with 'the old pod terminating and the new one waiting for the volume'
if you want to scale up, you can scale up

I used local-path in the past and use host-path via democratic-csi nowadays, but rather as a less-ephemeral replacement for 'emptyDir', or for cluster databases where backups are not part of the storage solution.

But in the end, that is the cool thing about k8s: we do not need to agree on 'a toolbelt'; we can pick and choose whatever suits best. If local-path is the one you go for, that's perfect. (I understand that the choice can have implications, as with Longhorn trying 'harder' to keep the last replica alive, which might be an issue for single-node clusters here.)
