
Calico API Server Crashing on EKS Cluster version 1.30 Node running with Ubuntu 22.04 #9722

Open
evyatar-toledano-cobwebs opened this issue Jan 15, 2025 · 2 comments


evyatar-toledano-cobwebs commented Jan 15, 2025

Expected Behavior

The Calico API server version 3.29.1, installed with the default network-policy-only configuration, works on EKS 1.29 with Ubuntu 20.04 and is expected to also work on EKS 1.30 with Ubuntu 22.04.

Current Behavior

The Calico API server (3.29.1) runs on EKS 1.30 with Ubuntu 22.04 for several hours but eventually encounters a permission error:
[webhook.go:253] Failed to make webhook authorizer request.

Calico was installed with Helm (`helm install calico projectcalico/tigera-operator --namespace tigera-operator`), using the default Calico configuration values for the network-policy-only setup, as sketched below.
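
For reference, a sketch of the install steps (the repo URL and chart version follow the standard Calico Helm install; adjust if your setup differs):

```sh
# Add the Calico Helm repo and install the tigera-operator chart with default values
# (network-policy-only setup); the --version pin is assumed from the reported 3.29.1
helm repo add projectcalico https://docs.tigera.io/calico/charts
helm repo update
helm install calico projectcalico/tigera-operator \
  --version v3.29.1 \
  --namespace tigera-operator --create-namespace
```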

Context

The application loses external internet connectivity when the Calico API server crashes with this error. The issue persists until all pods in the cluster are redeployed to a new node; a sketch of one way to force that redeploy follows below.

Notably, this issue does not occur on AWS EKS 1.29 with Ubuntu 20.04 nodes; it appears to be specific to nodes running Ubuntu 22.04.
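
One way to force the redeploy described above (a sketch; the node name is a placeholder):

```sh
# Cordon and drain the affected node so its pods are rescheduled onto a healthy node
kubectl cordon <affected-node>
kubectl drain <affected-node> --ignore-daemonsets --delete-emptydir-data
```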

Your Environment

  • Calico version: 3.29.1
  • Orchestrator version: AWS EKS 1.30 (affected) / 1.29 (unaffected)
  • Operating System and version: Ubuntu 22.04

Additional Notes

The issue seems to be related to the operating system version (Ubuntu 22.04) when running EKS 1.30. It does not manifest with Ubuntu 20.04 on EKS 1.29. Based on testing and research, the problem consistently occurs on nodes using Ubuntu 22.04.

Got the following errors:

[webhook.go:253] Failed to make webhook authorizer request: Post "https://172.20.0.1:443/apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s": dial tcp 172.20.0.1:443: socket: operation not permitted
[errors.go:77] Post "https://172.20.0.1:443/apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s": dial tcp 172.20.0.1:443: socket: operation not permitted
[reflector.go:147] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.ValidatingWebhookConfiguration: Get "https://172.20.0.1:443/apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations?allowWatchBookmarks=true&resourceVersion=67881749&timeout=7m3s&timeoutSeconds=423&watch=true": dial tcp 172.20.0.1:443: socket: operation not permitted
[reflector.go:539] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.ValidatingWebhookConfiguration: Get "https://172.20.0.1:443/apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations?resourceVersion=67881749": dial tcp 172.20.0.1:443: socket: operation not permitted
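
One way to pull these logs and probe connectivity to the service VIP (deployment and namespace names assume the default tigera-operator install):

```sh
# Fetch the Calico API server logs (default operator install names assumed)
kubectl logs -n calico-apiserver deployment/calico-apiserver | grep "operation not permitted"

# Check whether the kubernetes service VIP (172.20.0.1 in the logs above) is reachable from a test pod
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -k -m 5 https://172.20.0.1:443/version
```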
@caseydavenport (Member) commented:

I can't really see how this could be related to the node image version being used, at least so far as the authorizationreview errors are concerned!

I believe @coutinhop is looking into a similar symptom at the moment - wonder if they are related.

@caseydavenport (Member) commented:

I suspect this has to do with EKS v1.29 vs. EKS v1.30 - has something changed with the authorizationreview API in that upgrade?
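
One quick way to check would be to exercise that API path directly on each EKS version; a minimal sketch (the service account name assumes the default operator install):

```sh
# Post a SubjectAccessReview for the Calico API server's service account and inspect status.allowed
kubectl create -o yaml -f - <<'EOF'
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:calico-apiserver:calico-apiserver
  resourceAttributes:
    verb: get
    resource: pods
    namespace: default
EOF
```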
