
[BUG] Cilium NetworkPolicy causes cilium agent crash in AKS #4691

Open
ArkShocer opened this issue Dec 8, 2024 · 1 comment

ArkShocer commented Dec 8, 2024

Describe the bug

When running an AKS cluster with Cilium and deploying a Cilium NetworkPolicy, the Cilium agent pods enter a crashing state (CrashLoopBackOff).

To Reproduce

  1. Create an AKS cluster configured with the following network profile (a sketch of a matching az command follows this list):
"networkProfile": {
    "advancedNetworking": null,
    "networkDataplane": "cilium",
    "networkPlugin": "azure",
    "networkPluginMode": null,
    "networkPolicy": "cilium",
    "outboundType": "userDefinedRouting"
  }
  2. Run the following command to test Cilium:
cilium connectivity test
  3. Run kubectl get pods -n kube-system and see that the cilium pods are in a crashing state:
kube-system     azure-cns-bwtlz                       1/1     Running            0                42m
kube-system     azure-cns-crbkf                       1/1     Running            0                42m
kube-system     cilium-bkcgr                          0/1     CrashLoopBackOff   10 (2m36s ago)   41m
kube-system     cilium-operator-5d587985c8-k5968      1/1     Running            0                41m
kube-system     cilium-operator-5d587985c8-lh5v2      1/1     Running            0                41m
kube-system     cilium-x4z78                          0/1     CrashLoopBackOff   5 (2m30s ago)    16m
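
For step 1, the cluster can be created with something like the following az command; the resource group, cluster name, and subnet ID are placeholders, and userDefinedRouting additionally assumes a pre-existing subnet with a route table attached:

az aks create \
  --resource-group <resource-group> \
  --name <cluster-name> \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-policy cilium \
  --outbound-type userDefinedRouting \
  --vnet-subnet-id <subnet-resource-id>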

Environment:

Kubernetes version: 1.30.6

Cilium Test:

📋 Test Report [cilium-test-1]
❌ 4/41 tests failed (3/350 actions), 51 tests skipped, 0 scenarios skipped:
Test [client-egress-to-cidrgroup-deny]:
Test [client-egress-to-cidr-deny-default]:
Test [node-to-node-encryption]:
  ❌ node-to-node-encryption/node-to-node-encryption/ping-ipv4: cilium-test-1/client-7b7776c86b-79xbc (10.100.204.5) -> cilium-test-1/host-netns-mv4m6 (10.100.200.4:0)
Test [check-log-errors]:
  ❌ check-log-errors/no-errors-in-logs/aksname1234567/kube-system/cilium-bkcgr (cilium-agent)
  ❌ check-log-errors/no-errors-in-logs/aksname1234567/kube-system/cilium-x4z78 (cilium-agent)
[cilium-test-1] 4 tests failed
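
For reference, a single failing test can be re-run in isolation with the CLI's --test filter, e.g.:

cilium connectivity test --test node-to-node-encryption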

Additional:

The pods started successfully again once I deleted the Cilium network policies left over from the connectivity test (cleanup commands follow the two policies below):
client-egress-to-cidrgroup-deny:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  creationTimestamp: '2024-12-08T23:10:06Z'
  generation: 1
  name: client-egress-to-cidrgroup-deny
  namespace: cilium-test-1
  resourceVersion: '19863'
  uid: f364a8dc-f900-44d4-a925-5fc09f33b80d
  selfLink: >-
    /apis/cilium.io/v2/namespaces/cilium-test-1/ciliumnetworkpolicies/client-egress-to-cidrgroup-deny
spec:
  egressDeny:
    - toCIDRSet:
        - cidrGroupRef: cilium-test-external-cidr
          except:
            - 1.1.1.1/32
  endpointSelector:
    matchLabels:
      any:kind: client

allow-all-egress:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  creationTimestamp: '2024-12-08T23:10:05Z'
  generation: 1
  name: allow-all-egress
  namespace: cilium-test-1
  resourceVersion: '19860'
  uid: dc48bd08-f3c0-47ef-ac20-0c6f9289527a
  selfLink: >-
    /apis/cilium.io/v2/namespaces/cilium-test-1/ciliumnetworkpolicies/allow-all-egress
spec:
  egress:
    - toEndpoints:
        - {}
    - toCIDR:
        - 0.0.0.0/0
  endpointSelector: {}
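
A minimal cleanup sketch, assuming only these two leftover test policies (CiliumNetworkPolicy is namespaced, so -n is required):

kubectl get ciliumnetworkpolicies -n cilium-test-1
kubectl delete ciliumnetworkpolicies -n cilium-test-1 client-egress-to-cidrgroup-deny allow-all-egress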

Pod Log:

time="2024-12-08T23:11:39Z" level=info msg="k8s mode: Allowing localhost to reach local endpoints" subsys=daemon
time="2024-12-08T23:11:39Z" level=info msg="Creating or updating CiliumNode resource" node=aks-workers-21353114-vmss000001 subsys=nodediscovery
time="2024-12-08T23:11:39Z" level=info msg="Direct routing device detected" direct-routing-device=eth0 subsys=linux-datapath
time="2024-12-08T23:11:39Z" level=info msg="Detected devices" devices="[eth0]" subsys=linux-datapath
time="2024-12-08T23:11:39Z" level=info msg="Enabling k8s event listener" subsys=k8s-watcher
time="2024-12-08T23:11:39Z" level=info msg="Removing stale endpoint interfaces" subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Skipping kvstore configuration" subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Restored router address from node_config" file=/var/run/cilium/state/globals/node_config.h ipv4=169.254.23.0 ipv6="<nil>" subsys=node
time="2024-12-08T23:11:40Z" level=info msg="Initializing node addressing" subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Initializing no-op IPAM since we're using a CNI delegated plugin" subsys=ipam
time="2024-12-08T23:11:40Z" level=info msg="The router IP (169.254.23.0) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidrs="[10.5.0.0/16]" subsys=node
time="2024-12-08T23:11:40Z" level=info msg="Waiting until local node addressing before starting watchers depending on it" subsys=k8s-watcher
time="2024-12-08T23:11:40Z" level=info msg="Restoring endpoints..." subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Policy Add Request" ciliumNetworkPolicy="[&{EndpointSelector:{\"matchLabels\":{\"k8s:app\":\"konnectivity-agent\",\"k8s:io.kubernetes.pod.namespace\":\"kube-system\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[{}] ToRequires:[] ToCIDR: ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[]} ToPorts:[] ToFQDNs:[] ICMPs:[] Authentication:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=NetworkPolicy k8s:io.cilium.k8s.policy.name=konnectivity-agent k8s:io.cilium.k8s.policy.namespace=kube-system k8s:io.cilium.k8s.policy.uid=adf59d0d-510e-4b85-9c81-86065c6c8b4f] Description:}]" policyAddRequest=d0c89fdd-a97f-4448-98fb-a0a40f0fcdc4 subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Policy imported via API, recalculating..." policyAddRequest=d0c89fdd-a97f-4448-98fb-a0a40f0fcdc4 policyRevision=2 subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="NetworkPolicy successfully added" k8sApiVersion= k8sNetworkPolicyName=konnectivity-agent subsys=k8s-watcher
time="2024-12-08T23:11:40Z" level=info msg="Node updated" clusterName=default nodeName=aks-workers-21353114-vmss000000 subsys=nodemanager
time="2024-12-08T23:11:40Z" level=info msg="Policy Add Request" ciliumNetworkPolicy="[&{EndpointSelector:{\"matchLabels\":{\"k8s:io.kubernetes.pod.namespace\":\"cilium-test-1\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[{\"matchLabels\":{\"k8s:io.kubernetes.pod.namespace\":\"cilium-test-1\"}}] ToRequires:[] ToCIDR: ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[]} ToPorts:[] ToFQDNs:[] ICMPs:[] Authentication:<nil>} {EgressCommonRule:{ToEndpoints:[] ToRequires:[] ToCIDR:[0.0.0.0/0] ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[{LabelSelector:0xc0008b9980 requirements:0xc0005fb350 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{reserved.world: ,},MatchExpressions:[]LabelSelectorRequirement{},}} {LabelSelector:0xc0008b9d60 requirements:0xc0011427f8 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{cidr.0.0.0.0/0: ,},MatchExpressions:[]LabelSelectorRequirement{},}}]} ToPorts:[] ToFQDNs:[] ICMPs:[] Authentication:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=allow-all-egress k8s:io.cilium.k8s.policy.namespace=cilium-test-1 k8s:io.cilium.k8s.policy.uid=dc48bd08-f3c0-47ef-ac20-0c6f9289527a] Description:}]" policyAddRequest=ab5e803d-1c63-4ad2-ab26-b2aa806a6c71 subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Policy imported via API, recalculating..." policyAddRequest=ab5e803d-1c63-4ad2-ab26-b2aa806a6c71 policyRevision=3 subsys=daemon
time="2024-12-08T23:11:40Z" level=info msg="Imported CiliumNetworkPolicy" ciliumNetworkPolicyName=allow-all-egress k8sApiVersion=cilium.io/v2 k8sNamespace=cilium-test-1 subsys=k8s-watcher
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x541120]

goroutine 369 [running]:
net.networkNumberAndMask(0xc00040639c?)
/usr/local/go/src/net/ip.go:425
net.(*IPNet).Contains(0xc00040639c?, {0xc0004063b0, 0x4, 0xc000406398?})
/usr/local/go/src/net/ip.go:449 +0x25
github.com/cilium/cilium/pkg/ip.RemoveCIDRs({0xc001f86630, 0x1, 0x1}, {0xc000dccdf0, 0x1, 0x1})
/go/src/github.com/cilium/cilium/pkg/ip/ip.go:172 +0x16c
github.com/cilium/cilium/pkg/policy/api.ComputeResultantCIDRSet({0xc0012b2480?, 0x369acc0?, 0x1?})
/go/src/github.com/cilium/cilium/pkg/policy/api/cidr.go:157 +0x10e
github.com/cilium/cilium/pkg/policy/api.CIDRRuleSlice.GetAsEndpointSelectors({0xc0012b2480?, 0xc0012b2480?, 0x1?})
/go/src/github.com/cilium/cilium/pkg/policy/api/cidr.go:128 +0x18
github.com/cilium/cilium/pkg/policy/api.(*EgressCommonRule).getAggregatedSelectors(0xc0019ee4b0)
/go/src/github.com/cilium/cilium/pkg/policy/api/egress.go:225 +0x2a9
github.com/cilium/cilium/pkg/policy/api.(*EgressCommonRule).SetAggregatedSelectors(...)
/go/src/github.com/cilium/cilium/pkg/policy/api/egress.go:258
github.com/cilium/cilium/pkg/k8s/apis/cilium.io/utils.parseToCiliumEgressDenyRule({0xc000237db0, 0xd}, {0xc0008b9ee0?, 0xc001142ba0?, {0xc001151ce0?, 0xc001f86b48?}}, {0xc0019ee3c0, 0x1, 0x376ca0e?})
/go/src/github.com/cilium/cilium/pkg/k8s/apis/cilium.io/utils/utils.go:273 +0x3c5
github.com/cilium/cilium/pkg/k8s/apis/cilium.io/utils.ParseToCiliumRule({0xc000237db0, 0xd}, {0xc000e15560, 0x1f}, {0xc0012ef710, 0x24}, 0xc001f995f0)
/go/src/github.com/cilium/cilium/pkg/k8s/apis/cilium.io/utils/utils.go:329 +0x6b0
github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2.(*CiliumNetworkPolicy).Parse(0xc000464b40)
/go/src/github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2/cnp_types.go:222 +0x1d0
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).addCiliumNetworkPolicyV2(0xc00027e008, {0x7f007d592658, 0xc000ac49a0}, 0xc000dccdc8, {0x35cd7c0?, 0x41a438?, 0x5dba740?}, {0xc0019739c0, 0x31})
/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_network_policy.go:309 +0x2b5
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).onUpsert(0xc00027e008, 0xc001f87ae8, 0xc001f87eb8, {{0xc000e15560, 0x1f}, {0xc000237db0, 0xd}}, 0xc001f87d38, {0x3d8f848, 0xc000ac49a0}, ...)
/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_network_policy.go:269 +0x491
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNetworkPoliciesInit.func1()
/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_network_policy.go:138 +0xca5
created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNetworkPoliciesInit in goroutine 1
/go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_network_policy.go:89 +0x109
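
The panic above was emitted by the crashed container, so after a restart it can be retrieved with kubectl's --previous flag (pod name taken from the kubectl get pods output above):

kubectl -n kube-system logs cilium-bkcgr --previous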

Also, the same issue occurs if I enable ACNS (Advanced Container Networking Services).

jpayne3506 commented

Hi @ArkShocer, this has been reported as a bug in cilium/cilium#36494. If you are running connectivity tests, I recommend using Cilium CLI version v0.15.22, as there are potentially breaking changes when using the latest CLI.

Below is a link to cilium/cilium-cli, along with a modified installation script from that page. The only difference is that it pins CILIUM_CLI_VERSION:
https://github.com/cilium/cilium-cli?tab=readme-ov-file#cilium-cli

CILIUM_CLI_VERSION=v0.15.22
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-${GOOS}-${GOARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-${GOOS}-${GOARCH}.tar.gz.sha256sum
sudo tar -C /usr/local/bin -xzvf cilium-${GOOS}-${GOARCH}.tar.gz
rm cilium-${GOOS}-${GOARCH}.tar.gz{,.sha256sum}
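
After installing, a quick check that the pinned binary is the one on PATH:

cilium version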
