rke2 cis profile 1.23, api server can't contact gatekeeper (TLS ERROR) #4910
Comments
So the apiserver is calling the webhook with a 3 second timeout, which is being exceeded.
This pod sees incoming connections being closed; the handshake errors would appear to be because the apiserver is terminating the request when it does not complete after 3 seconds. You said this is the "gatekeeper controller"; is this the pod that backs the webhook service?
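For context, the 3-second limit comes from the timeoutSeconds field on the webhook registration itself. A minimal sketch of the relevant fields of a ValidatingWebhookConfiguration (values are illustrative, not dumped from this cluster):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
  - name: validation.gatekeeper.sh
    timeoutSeconds: 3           # the apiserver gives up on the call after 3 seconds
    failurePolicy: Ignore       # "failing open", as reported in the apiserver logs
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: gatekeeper-webhook-service
        namespace: cattle-gatekeeper-system
        path: /v1/admit
    rules:
      - apiGroups: ["*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["*"]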
Hello brandond, we are behind a proxy. The files rke2-server and rke2-agent look the same:
There is no mention of .svc or .cluster.local in the CONTAINERD_NO_PROXY env. Logs for the controller with DEBUG verbosity:
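For reference, the layout suggested by the RKE2 proxy docs for /etc/default/rke2-server (and rke2-agent) includes .svc and .cluster.local in NO_PROXY so that in-cluster traffic such as webhook calls bypasses the proxy. An illustrative sketch (addresses and CIDRs are assumptions, not this cluster's values):

# /etc/default/rke2-server (same layout for /etc/default/rke2-agent)
HTTP_PROXY=http://proxy.example.com:8888
HTTPS_PROXY=http://proxy.example.com:8888
NO_PROXY=127.0.0.0/8,10.42.0.0/16,10.43.0.0/16,.svc,.cluster.local
# The CONTAINERD_HTTP_PROXY / CONTAINERD_HTTPS_PROXY / CONTAINERD_NO_PROXY variants
# follow the same format when the proxy should only apply to containerd.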
I see that you have kube-proxy disabled?
I have disabled kube-proxy to benefit from Cilium DSR (direct server return). We have been using that feature for a while; nothing to report about Cilium and DSR so far, it works great. Here is the HelmChartConfig:
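A minimal sketch of such a HelmChartConfig, assuming the bundled rke2-cilium chart with kube-proxy replacement and DSR mode enabled (chart values are assumptions, not the exact manifest from this cluster):

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    # value names follow recent Cilium charts; older charts use kubeProxyReplacement: "strict" and tunnel: disabled
    kubeProxyReplacement: true
    k8sServiceHost: rke2-api.example.internal   # hypothetical placeholder: API server address reachable without kube-proxy
    k8sServicePort: 6443
    routingMode: native                         # DSR requires native routing
    loadBalancer:
      mode: dsr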
Env from the API server:
Can you confirm that you can curl the endpoint from within the apiserver pod?
If you can't, then I suspect your problems are still related to network policy.
I come back to you several days later, having figured out what was going wrong. I was still having the same issues between the api-server and the gatekeeper controller. The error from the apiserver was due to a mutating webhook that was still present after I removed OPA Gatekeeper; I figured out later that the error disappeared once the controller was reachable. What made me lose a lot of time is that I wrote a bad Assign resource:

apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: no-privileged-container
spec:
  applyTo:
    - groups: ["apps"]
      kinds:
        - Deployment
        - StatefulSet
        - DaemonSet
      versions: ["v1"]
  match:
    scope: Cluster
    excludedNamespaces:
      - kube-system
      - cattle-system
      - cis-operator-system
      - cattle-gatekeeper-system
      - metallb-system
  location: "spec.template.spec.containers[name:*].securityContext.allowPrivilegeEscalation"
  parameters:
    assign:
      value: false

Changing the scope to Namespaced fixed it. If I had written the Assign resource correctly, I wouldn't have questioned the CIS profile, the proxy env, or the network policies.
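For completeness, the corrected part is just the match scope; a sketch based on the manifest above:

  match:
    scope: Namespaced
    excludedNamespaces:
      - kube-system
      - cattle-system
      - cis-operator-system
      - cattle-gatekeeper-system
      - metallb-system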
Thanks for the analysis! Since there doesn't appear to be anything wrong with RKE2, I'll close this out.
Environmental Info:
RKE2 Version:
rke2 version v1.26.9+rke2r1
go version go1.20.8 X: boringcrypto
Node(s) CPU architecture, OS, and Version:
Linux 5.15.0-86-generic #96-Ubuntu SMP x86_64
Cluster Configuration:
7 nodes: 3 servers (masters), 4 workers (agents)
Describe the bug:
I'm running rke2 with profile: cis-1.23. The kube-apiserver can't contact the gatekeeper controller due to a TLS handshake error.
Steps To Reproduce:
I installed rke2 from the tarball. Here is the config file:
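The file itself is not reproduced above; a minimal sketch of an /etc/rancher/rke2/config.yaml consistent with this setup (CIS profile, Cilium CNI, kube-proxy disabled), where the values are assumptions:

profile: cis-1.23          # CIS hardening profile
cni: cilium                # Cilium as the CNI (see the HelmChartConfig earlier in the thread)
disable-kube-proxy: true   # kube-proxy disabled in favour of Cilium's replacement
token: <redacted>          # hypothetical placeholder
tls-san:
  - rke2.example.internal  # hypothetical placeholder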
Expected behavior:
The kube-apiserver should contact the gatekeeper webhook without error.
Actual behavior:
The API server logs "failed calling webhook, failing open validation.gatekeeper.sh".
I tried writing ingress and egress NetworkPolicies to allow everything coming into cattle-gatekeeper-system and everything going out of kube-system.
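A sketch of what such allow-all policies look like (names are illustrative, not the exact manifests that were applied):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
  namespace: cattle-gatekeeper-system
spec:
  podSelector: {}       # all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - {}                # allow traffic from any source
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
  namespace: kube-system
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - {}                # allow traffic to any destination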
Additional context / logs:
If I perform a curl manually in the kube-apiserver, there is no TLS error:
kube-m01:/ # curl --resolve gatekeeper-webhook-service.cattle-gatekeeper-system.svc:443:10.43.195.131 --cacert /tmp/ca.crt https://gatekeeper-webhook-service.cattle-gatekeeper-system.svc
404 page not found
Logs from the API server:
Logs from the gatekeeper controller: