
[BUG] azure-cns daemonset has bad port value for readinessProbe in 1.27-LTS/AKSUbuntu-2204gen2containerd-202502.09.0 #4834

Open
junkiebev opened this issue Mar 4, 2025 · 1 comment


junkiebev commented Mar 4, 2025

Describe the bug
We run Kubernetes 1.27.102 (LTS) with Azure CNI Overlay and Cilium network policy.
The azure-cns daemonset in kube-system is deployed and lifecycled automatically by AKS, and it fails its readiness checks.

spec:
  template:
    spec:
      containers:
      - image: mcr.microsoft.com/containernetworking/azure-cns:v1.4.58
        name: cns-container
        ports:
        - containerPort: 10090
          hostPort: 10090
          name: api
          protocol: TCP
        - containerPort: 10092
          hostPort: 10092
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 1
          httpGet:
            path: /readyz
            port: metrics
            scheme: HTTP
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1

The reason for this is that the /readyz HTTP endpoint answers on port 10090 (api), but the readinessProbe in the manifest targets the metrics port (10092). We validated that changing the probe port fixes it; however, the addon controller reconciles the manifest back to the bad configuration it holds.

  • Web requests to http://localhost:10092/readyz return JSON with the content {"message":"Not Found"} and an HTTP status of 404 (as expected, since /readyz is not served on port 10092), which is causing the pod to never become ready.
  • Web requests to http://localhost:10090/readyz return OK and an HTTP status of 200, but that is not the port the deployed manifest probes.
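
A minimal sketch of what the corrected probe would look like, assuming the fix is simply to point the probe at the named api port (10090) that actually serves /readyz; this is the change we validated locally, not a manifest AKS ships:

```yaml
# Hypothetical corrected readinessProbe for the cns-container:
# target the named "api" containerPort (10090), where /readyz is
# actually served, instead of "metrics" (10092).
readinessProbe:
  failureThreshold: 1
  httpGet:
    path: /readyz
    port: api        # was: metrics
    scheme: HTTP
  periodSeconds: 1
  successThreshold: 1
  timeoutSeconds: 1
```

Note that applying this edit to the daemonset does not stick: the addon controller reconciles it back, so the fix has to land in the manifest AKS deploys.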

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a 1.27-lts AKS cluster using AKSUbuntu-2204gen2containerd-202502.09.0 as the nodepool image and Cilium as the network policy
  2. kubectl get pods -l k8s-app=azure-cns -n kube-system
  3. Observe that the pods never become ready

Expected behavior
The azure-cns pods pass readinessProbe checks

Environment (please complete the following information):

  • Kubernetes version: 1.27.102-LTS
  • Node Image: AKSUbuntu-2204gen2containerd-202502.09.0

Additional context
This started happening with the 2025-02-20 release, so I believe it is a regression.


@chasewilson, @paulgmiller, @wedaly, @quantumn-a5, @tamilmani1989 would you be able to assist?
