Error: failed to create containerd container: failed to stat parent: stat /var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/214/fs: no such file or directory #4961
Comments
Some things I've tried:
Not sure if this is related to containerd/containerd#3369. After doing this, it seems to work:
On the other host, if I just delete the container image with crictl, it works as well:
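(the image name below is an assumption, and the crictl config path is per a default RKE2 install)

```sh
# point crictl at RKE2's embedded containerd
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml

crictl rmi docker.io/library/alpine:latest   # delete the cached image
# the next pod start repulls the image, so the snapshot chain is recreated cleanly
```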
Could this maybe be related to containerd/containerd#3671?
I have also seen this fail sometimes. It really depends on how the snapshotter was corrupted, and whether or not the layer is shared with any other image. If it's shared, just deleting the one image won't clean it up; you'd need to identify and remove/repull all images that use that layer. Unless you have a surfeit of time on your hands, it's usually easier to just nuke the directory and repull what's needed. I'm going to close this, as any issues with this functionality should be tracked with containerd. RKE2 itself is not responsible for ensuring the integrity of the snapshotter filesystem.
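In practice, the two options look roughly like the sketch below. This is a sketch only, assuming RKE2's default paths and containerd socket; "nuking the directory" is read here as wiping RKE2's whole containerd state directory so the image metadata and snapshots stay consistent, and the image reference is a placeholder.

```sh
# Option 1: remove and repull every image that shares the corrupted layer
# (socket and crictl config paths per a default RKE2 install)
export CONTAINERD_ADDRESS=/run/k8s/containerd/containerd.sock
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
ctr -n k8s.io images ls                                  # identify the affected images
ctr -n k8s.io images rm docker.io/library/alpine:latest  # repeat for each affected image
crictl pull docker.io/library/alpine:latest              # repull what is still needed

# Option 2: nuke the containerd state entirely and let everything get repulled
systemctl stop rke2-agent      # rke2-server on server nodes
rm -rf /var/lib/rancher/rke2/agent/containerd
systemctl start rke2-agent
```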
Related fix
Environmental Info:
RKE2 Version: rke2 version v1.26.9+rke2r1 (368ba42)
go version go1.20.8 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Describe the bug:
RKE2 1.26.9 on SLE Micro 5.5... after force-killing a node a few times (on purpose, via
sysctl kernel.panic=60; echo 1 > /proc/sys/kernel/sysrq; sleep 1; echo c > /proc/sysrq-trigger &
) I cannot spin up some new containers on those hosts (after they are Ready again). However, trying to run nginx on them works:
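(for example, pinning test pods to one of the affected nodes; the node name here is a placeholder)

```sh
kubectl run test-nginx  --image=nginx  \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"worker-1"}}'
kubectl run test-alpine --image=alpine \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"worker-1"}}' \
  --command -- sleep 3600
kubectl get pods -o wide   # nginx runs; the alpine pod fails with the snapshotter stat error
```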
The alpine container, however, works on the nodes that weren't force-killed...
And just in case: the sysctl commands were run in a 'special' alpine container (via kubectl node-shell) as:
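(roughly; the node name is a placeholder, node-shell being the kubectl node-shell plugin)

```sh
# open a privileged shell on the node and trigger the panic from there
kubectl node-shell worker-1 -- sh -c \
  'sysctl kernel.panic=60; echo 1 > /proc/sys/kernel/sysrq; sleep 1; echo c > /proc/sysrq-trigger &'
```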
Is the alpine image somehow 'corrupted'? How do I clean it? Shouldn't it be cleaned automatically?
Steps To Reproduce:
sysctl kernel.panic=60; echo 1 > /proc/sys/kernel/sysrq; sleep 1; echo c > /proc/sysrq-trigger &
)

The config.yaml file looks like:
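(hypothetical placeholder values shown here, for the general shape only)

```sh
# hypothetical placeholder values, not the actual cluster's config
cat <<'EOF' > /etc/rancher/rke2/config.yaml
server: https://rke2-server.example.com:9345
token: <cluster-join-token>
tls-san:
  - rke2-server.example.com
EOF
```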
Expected behavior:
The alpine container image can be spun up.
Actual behavior:
The alpine container doesn't start on the nodes that were killed, even after they have recovered.
Additional context / logs:
I've tried deleting the container image just in case, but it still fails:
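(image reference assumed; the failure is the error from the title)

```sh
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
crictl rmi  docker.io/library/alpine:latest
crictl pull docker.io/library/alpine:latest
# starting a container from it still fails with:
#   failed to create containerd container: failed to stat parent: stat
#   /var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/214/fs:
#   no such file or directory
```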