-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support CNI STATUS Verb #123
Conversation
b47ffec
to
8f10978
Compare
8a5cf10
to
63c2d24
Compare
122 merged rebase pls .. |
Will do once I get home |
6b3dfc8
to
d96cc3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment..
5762dbb
to
37d969a
Compare
Signed-off-by: Michael Zappa <[email protected]>
37d969a
to
208eca9
Compare
@squeed over here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
containerd 2.0.1 Welcome to the v2.0.1 release of containerd! The first patch release for containerd 2.0 includes a number of bug fixes and improvements. * Fix apply IoOwner options when not in user namespace ([#11151](containerd/containerd#11151)) * Fix cri grpc plugin config migration ([#11140](containerd/containerd#11140)) * Support CNI STATUS Verb ([containerd/go-cni#123](containerd/go-cni#123)) * Update differ to handle zstd media types ([#11068](containerd/containerd#11068)) * Update runc binary to v1.2.3 ([#11142](containerd/containerd#11142)) * Fix panic due to nil dereference cgroups v2 ([#11098](containerd/containerd#11098)) Please try out the release binaries and report any issues at https://github.com/containerd/containerd/issues. * Derek McGowan * Wei Fu * Archit Kulkarni * Jin Dong * Phil Estes * Akhil Mohan * Akihiro Suda * Alexey Lunev * Austin Vazquez * Maksym Pavlenko * Mike Brown * Michael Zappa * Samuel Karp * Sebastiaan van Stijn * Andrey Smirnov * Davanum Srinivas <details><summary>50 commits</summary> <p> * Prepare release notes for v2.0.1 ([#11158](containerd/containerd#11158)) * [`b0ece5dc5`](containerd/containerd@b0ece5d) Prepare release notes for v2.0.1 * build(deps): bump actions/attest-build-provenance from 1.4.4 to 2.1.0 ([#11154](containerd/containerd#11154)) * [`fe6957084`](containerd/containerd@fe69570) build(deps): bump actions/attest-build-provenance from 1.4.4 to 2.1.0 * update xx to v1.6.1 for compatibility with alpine 3.21 and file 5.46+ ([#11153](containerd/containerd#11153)) * [`eb2ce6882`](containerd/containerd@eb2ce68) update xx to v1.6.1 for compatibility with alpine 3.21 and file 5.46+ * ctr pull should unpack for default platform when transfer service is used ([#11139](containerd/containerd#11139)) * [`44cdca68b`](containerd/containerd@44cdca6) ctr pull unpack for default platform using transfer service * Fix apply IoOwner options when not in user namespace ([#11151](containerd/containerd#11151)) * [`018d83650`](containerd/containerd@018d836) internal/cri: should not apply IoOwner options * Update go-cni for CNI STATUS ([#11146](containerd/containerd#11146)) * [`5eb7995a9`](containerd/containerd@5eb7995) feat: update go-cni version for CNI STATUS * Fix cri grpc plugin config migration ([#11140](containerd/containerd#11140)) * [`a2302ea89`](containerd/containerd@a2302ea) Add integration test for custom configuration * [`be5eda069`](containerd/containerd@be5eda0) complete cri grpc config migration * Update runc binary to v1.2.3 ([#11142](containerd/containerd#11142)) * [`a53eff53d`](containerd/containerd@a53eff5) update runc binary to v1.2.3 * Update differ to handle zstd media types ([#11068](containerd/containerd#11068)) * [`73f57acb0`](containerd/containerd@73f57ac) Update differ to handle zstd media types * update to go1.23.4 / go1.22.10 ([#11109](containerd/containerd#11109)) * [`290e8bc70`](containerd/containerd@290e8bc) update to go1.23.4 / go1.22.10 * CI: update Fedora to 41 ([#11110](containerd/containerd#11110)) * [`62b790bfa`](containerd/containerd@62b790b) CI: update Fedora to 41 * Fix panic due to nil dereference cgroups v2 ([#11098](containerd/containerd#11098)) * [`3ba2df924`](containerd/containerd@3ba2df9) fix panic due to nil dereference cgroups v2 * Publish attestation as release artifact ([#11067](containerd/containerd#11067)) * [`34a45cab2`](containerd/containerd@34a45ca) Publish attestation as release artifact * Move rockylinux 9.4 to almalinux/9 in CI ([#11053](containerd/containerd#11053)) * [`7dec6b460`](containerd/containerd@7dec6b4) move rocky 9.4 to almalinux/9 in CI * *: should align pipe's owner with init process ([#11035](containerd/containerd#11035)) * [`cf07f28ee`](containerd/containerd@cf07f28) *: should align pipe's owner with init process * fix: set the credentials even if not provided ([#11031](containerd/containerd#11031)) * [`986088866`](containerd/containerd@9860888) fix: set the credentials even if not provided * fsverity_test.go: fix nil pointer derefence, fix test fail, fix minor/major device numbers resolving ([#10978](containerd/containerd#10978)) * [`30b929ece`](containerd/containerd@30b929e) fsverity_test.go: fix major/minor device number resolving * [`10996a334`](containerd/containerd@10996a3) fsverity_test.go: fix nil pointer dereference, fix test fail * update runc binary to 1.2.2 ([#11023](containerd/containerd#11023)) * [`9081e979f`](containerd/containerd@9081e97) update runc binary to 1.2.2 * Revert "Disable vagrant strict dependency checking" ([#11009](containerd/containerd#11009)) * [`6399c936f`](containerd/containerd@6399c93) Revert "Disable vagrant strict dependency checking" * fsverity_linux.go: Fix fsverity.IsEnabled() for big endian systems ([#11005](containerd/containerd#11005)) * [`a7f2b562f`](containerd/containerd@a7f2b56) fsverity_linux.go: Fix fsverity.IsEnabled() for big endian systems * bump github.com/containerd/typeurl/v2 from 2.2.2 to 2.2.3 ([#10997](containerd/containerd#10997)) * [`389e781ea`](containerd/containerd@389e781) build(deps): bump github.com/containerd/typeurl/v2 from 2.2.2 to 2.2.3 * update to go1.23.3 / go1.22.9 ([#10973](containerd/containerd#10973)) * [`5b879f30c`](containerd/containerd@5b879f3) update to go1.23.3 / go1.22.9 * ci: enable marking 2.0 releases as latest ([#10963](containerd/containerd#10963)) * [`458215f6c`](containerd/containerd@458215f) ci: enable marking 2.0 releases as latest * Avoid arch info in the sed/replace when building cri-cni-containerd.tar.gz ([#10968](containerd/containerd#10968)) * [`e99c2b55c`](containerd/containerd@e99c2b5) Avoid arch info in the sed/replace when building cri-cni-containerd.tar.gz </p> </details> <details><summary>7 commits</summary> <p> * Support CNI STATUS Verb ([containerd/go-cni#123](containerd/go-cni#123)) * [`208eca9`](containerd/go-cni@208eca9) support CNI status verb * Bump github actions dependencies to match containerd CI repo and fix lint ([containerd/go-cni#122](containerd/go-cni#122)) * [`386f475`](containerd/go-cni@386f475) Fix ci.yml indent * [`a9b0675`](containerd/go-cni@a9b0675) Another doc commit to trigger lint? * [`14af454`](containerd/go-cni@14af454) Bump github actions dependency versions * [`9e0d096`](containerd/go-cni@9e0d096) Trivial doc commit to trigger lint </p> </details> * **github.com/containerd/go-cni** v1.1.10 -> v1.1.11 * **github.com/containerd/typeurl/v2** v2.2.2 -> v2.2.3 Previous release can be found at [v2.0.0](https://github.com/containerd/containerd/releases/tag/v2.0.0) * `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`: ✅Recommended. Dynamically linked with glibc 2.31 (Ubuntu 20.04). * `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`: Statically linked. Expected to be used on non-glibc Linux distributions. Not position-independent. In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases) and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too. See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.
Hey @mikebrow (cc @MikeZappa87), this seems to cause issues with dual-CNI setups. I am using both Cilium + Multus, and during a node reboot; the node reports Ready and then shortly after goes NotReady. I was not seeing this issue with Kubelet just spams these logs when it happens:
|
@MikeZappa87 Yep. |
I thought multus rewrote the cni conf file? The provided doesn’t show that. In the cilium config map what is the value of cni-exclusive this might be more of a cilium issue |
The config.toml is probably messed up for your 2.0.1 containerd.. I'm not seeing the CRI plugin start.. It should look more like this:
|
@mikebrow It is controlled via Talos. /etc/cri/containerd.toml:
/etc/cri/conf.d/cri.toml:
cri logs:
|
well now that you have containerd up you should be able to get kubelet to reconnect |
@mikebrow Once containerd gets in the bugged state, kubelet can never reconnect to it. It spams |
siderolabs/talos#9496 related perhaps |
Just to note: This is only happening when I reboot + have multus enabled. If I disable multus and reboot, the cluster comes up fine. Once the cluster settles... I can enable multus and nothing breaks. So it's a race somewhere on boot with cilium/multus/containerd. |
is possible you have multiple problems.. as the sandbox store is also reporting errors in your cri log... note: containerd/containerd#10848 (comment) |
They happen on every Kubernetes cluster I have ever touched. Just look scary. Any time you reboot a node, it takes a minute or two to settle things. |
we have a config field in cni setup .. setup_serially set that to true ... |
^ this guy |
@mikebrow On it, moment. Update: still broke :( |
anyhow that would eliminate if there is a timing conflict setting them up in parallel |
If adding multus is causing the issue I would try to confirm that it’s a multus issue. One way you can do this is run cri-o, cilium and Multus. And if it doesn’t work I would try and use cri-o and cilium. If that works it’s a Multus issue. If that’s the case I would reach out to them as well |
@MikeZappa87 @mikebrow Testing this code with https://github.com/sasha-s/go-deadlock, I can confirm there is a potential deadlock that was not present in v1.1.10.
|
@mikebrow its probably the double lock in ready? I think one of my original commits had the ready without a lock after the function locks. So ready ... Do something |
Fixes potential deadlock in containerd#123 Signed-off-by: Steven Kreitzer <[email protected]>
@MikeZappa87 The problem is the additional locks added. I'm not entirely sure why you added them, as they were not present before. You pretty much renamed the I removed the additional locks and confirm this removes the potential deadlock here: #125 |
I didn’t add them willingly :-) go ahead and fix the problem. I’m not going to touch code until next year. |
Please see #126 for another attempt to fix this issue, with deadlock tests |
Fixes siderolabs/talos#9984 Patch with containerd/go-cni#126 See also: * containerd/go-cni#125 * containerd/containerd#11186 * containerd/go-cni#123 Signed-off-by: Andrey Smirnov <[email protected]>
Fixes siderolabs/talos#9984 Patch with containerd/go-cni#126 See also: * containerd/go-cni#125 * containerd/containerd#11186 * containerd/go-cni#123 Signed-off-by: Andrey Smirnov <[email protected]> (cherry picked from commit 0b00e86)
We are implementing the CNI Status verb. The Status verb is to provide the container runtime the ability to determine if the runtime should call CNI ADD.
Please merge #122 first
Reference:
ocicni (cri-o/ocicni#196)