Propagation SSL Check Failed Only From inside RKE2 VM and Cluster #7433

Closed
andisugandi opened this issue Dec 15, 2024 · 3 comments

andisugandi commented Dec 15, 2024

Environmental Info:
RKE2 Version:
v1.31.3+rke2r1

Node(s) CPU architecture, OS, and Version:

  • Node 01:
    Linux tw-one 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux
  • Node 02:
    Linux tw-two 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux
  • Node 03:
    Linux tw-three 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 servers, and 0 agents

Describe the bug:
I tried to deploy Rancher with Let's Encrypt enabled on an RKE2 cluster (bare-metal server managed by Harvester HCI). The self-check GET request fails, but only when it is made from the cert-manager pod or from inside the VMs (tw-one, tw-two, and tw-three).

The same GET request succeeds (the cluster responds) when it is made from the Internet.

Steps To Reproduce:

  • Installed RKE2:
    NAME               STATUS   ROLES                       AGE   VERSION
    tw-one           Ready    control-plane,etcd,master   25d   v1.31.3+rke2r1
    tw-three         Ready    control-plane,etcd,master   25d   v1.31.3+rke2r1
    tw-two           Ready    control-plane,etcd,master   25d   v1.31.3+rke2r1
    
  • Installed helm:
    version.BuildInfo{Version:"v3.16.3", GitCommit:"cfd07493f46efc9debd9cc1b02a0961186df7fdf", GitTreeState:"clean", GoVersion:"go1.22.7"}
    
  • Active Required Repos:
    NAME             URL
    jetstack         https://charts.jetstack.io
    rancher-latest   https://releases.rancher.com/server-charts/latest
  • Installed cert-manager Resources
    NAME                                          READY   STATUS    RESTARTS   AGE
    pod/cert-manager-b6fd485d9-x9zt6              1/1     Running   0          3h44m
    pod/cert-manager-cainjector-dcc5966bc-kpmw8   1/1     Running   0          3h44m
    pod/cert-manager-webhook-dfb76c7bd-2vc59      1/1     Running   0          3h44m
    
    NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)            AGE
    service/cert-manager              ClusterIP   10.43.210.60    <none>        9402/TCP           3h44m
    service/cert-manager-cainjector   ClusterIP   10.43.203.185   <none>        9402/TCP           3h44m
    service/cert-manager-webhook      ClusterIP   10.43.204.212   <none>        443/TCP,9402/TCP   3h44m
    
    NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/cert-manager              1/1     1            1           3h44m
    deployment.apps/cert-manager-cainjector   1/1     1            1           3h44m
    deployment.apps/cert-manager-webhook      1/1     1            1           3h44m
    
    NAME                                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/cert-manager-b6fd485d9              1         1         1       3h44m
    replicaset.apps/cert-manager-cainjector-dcc5966bc   1         1         1       3h44m
    replicaset.apps/cert-manager-webhook-dfb76c7bd      1         1         1       3h44m
    
  • Installed Rancher with the following commands:
    kubectl create ns cattle-system
    
    helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=rancher.awesome.com --set bootstrapPassword=pleas-change-me --set ingress.tls.source=letsEncrypt --set letsEncrypt.email=[email protected] --set letsEncrypt.ingress.class=nginx
    
  • Running Rancher Resources
    NAME                                             READY   STATUS    RESTARTS        AGE
    pod/cm-acme-http-solver-lx5pd                    1/1     Running   0               3h40m
    pod/rancher-799c68dffd-tz84l                     1/1     Running   1 (3h38m ago)   3h40m
    pod/rancher-799c68dffd-x22ll                     1/1     Running   1 (3h38m ago)   3h40m
    pod/rancher-799c68dffd-zzwxc                     1/1     Running   1 (3h38m ago)   3h40m
    pod/rancher-webhook-c5c58f554-42lpt              1/1     Running   0               3h23m
    pod/system-upgrade-controller-5fb67f585d-5x5ms   1/1     Running   0               3h22m
    
    NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/cm-acme-http-solver-rqz9w   NodePort    10.43.192.159   <none>        8089:32287/TCP   3h40m
    service/rancher                     ClusterIP   10.43.126.189   <none>        80/TCP,443/TCP   3h40m
    service/rancher-webhook             ClusterIP   10.43.19.196    <none>        443/TCP          3h23m
    
    NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/rancher                     3/3     3            3           3h40m
    deployment.apps/rancher-webhook             1/1     1            1           3h23m
    deployment.apps/system-upgrade-controller   1/1     1            1           3h22m
    
    NAME                                                   DESIRED   CURRENT   READY   AGE
    replicaset.apps/rancher-799c68dffd                     3         3         3       3h40m
    replicaset.apps/rancher-webhook-c5c58f554              1         1         1       3h23m
    replicaset.apps/system-upgrade-controller-5fb67f585d   1         1         1       3h22m
    

Expected behavior:
The propagation check succeeds: the cert-manager pod gets the expected response (HTTP 200) from the corresponding service.

Actual behavior:
The propagation check fails: the cert-manager pod does not get the expected response from the corresponding service.

Additional context / logs (for security reasons, the real domain name and IP addresses have been replaced in this report):

  • Log messages from the cert-manager pod (output of kubectl -n cert-manager logs pod/cert-manager-b6fd485d9-x9zt6 | tail -n 8):
    E1214 10:51:46.995977       1 sync.go:208] "propagation check failed" err="failed to perform self check GET request 'http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY': Get \"http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" logger="cert-manager.controller" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01"
    I1214 10:51:56.999673       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-lx5pd" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
    I1214 10:51:56.999755       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-rqz9w" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
    I1214 10:51:56.999928       1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.controller.http01.selfCheck.http01.ensureIngress" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-xdql8" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
    E1214 10:52:07.000379       1 sync.go:208] "propagation check failed" err="failed to perform self check GET request 'http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY': Get \"http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" logger="cert-manager.controller" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01"
    I1214 10:52:17.000800       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-lx5pd" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
    I1214 10:52:17.000863       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-rqz9w" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
    I1214 10:52:17.000902       1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.controller.http01.selfCheck.http01.ensureIngress" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-xdql8" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
    
  • Testing the GET request from the Internet using cURL:
    curl -o /dev/null -s -w "%{http_code}\n" http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
    
    The result:
    200
    
  • Testing the GET request from the Internet using wget:
    wget -O- http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
    
    The result:
    --2024-12-14 18:07:27--  http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
    Resolving rancher.awesome.com (rancher.awesome.com)... 30.60.220.214
    Connecting to rancher.awesome.com (rancher.awesome.com)|30.60.220.214|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 87 [text/plain]
    Saving to: ‘STDOUT’
    
    -                          100%[======================================>]      87  --.-KB/s    in 0s
    
    2024-12-14 18:07:27 (13.8 MB/s) - written to stdout [87/87]
    
  • Testing the GET request from inside a VM using cURL:
    curl -v http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
    
    The result:
    *   Trying 30.60.220.214...
    * TCP_NODELAY set
    * connect to 30.60.220.214 port 80 failed: Connection timed out
    * Failed to connect to rancher.awesome.com port 80: Connection timed out
    * Closing connection 0
    curl: (7) Failed to connect to rancher.awesome.com port 80: Connection timed out
    
  • Testing the GET request from inside a VM using wget:
    wget -O- http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
    
    The result:
    --2024-12-14 18:36:36--  http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
    Resolving rancher.awesome.com (rancher.awesome.com)... 30.60.220.214
    Connecting to rancher.awesome.com (rancher.awesome.com)|30.60.220.214|:80... failed: Connection timed out.
    Retrying.
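
For longer logs, the failing entries can be pulled out programmatically. The following is a minimal Python sketch, not part of cert-manager or RKE2. It assumes the key="value" layout seen in the log lines above, and the helper names parse_fields and failed_checks are hypothetical.

```python
import re

# Matches key="value" pairs; a value may contain backslash-escaped quotes,
# as the err field does in the cert-manager log lines above.
PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_fields(line):
    """Return the key="value" fields of one structured log line as a dict."""
    return {key: value.replace('\\"', '"') for key, value in PAIR.findall(line)}


def failed_checks(lines):
    """Yield (dnsName, err) for every 'propagation check failed' entry."""
    for line in lines:
        if '"propagation check failed"' in line:
            fields = parse_fields(line)
            yield fields.get("dnsName"), fields.get("err")
```

One possible use is to save the output of the kubectl logs command to a file and pass the opened file to failed_checks, which skips the informational "found one existing ..." lines and returns only the failures.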
    

I don't know exactly where to report this issue (the cert-manager or the Rancher project). I think it is related to the internal RKE2 cluster itself (correct me if I'm wrong).

[Image: Kubernetes-Ingress-Controller-Fake-Certificate-Rancher-RKE2]

What did I miss?

Please help, and thank you in advance.

brandond (Member) commented Dec 15, 2024

    *   Trying 30.60.220.214...
    * TCP_NODELAY set
    * connect to 30.60.220.214 port 80 failed: Connection timed out

I have no idea what this IP is or how you're hosting it. I don't see it listed anywhere in any of your services, so I am assuming it is external to this cluster? You should figure out why you can't hit this IP from the cluster member VMs. Answering this is not something we can help with here.

If you're using Harvester to host this LB, you may find more help at https://github.com/harvester/harvester
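
As a starting point for that investigation, a small probe can separate a timeout (packets are dropped somewhere on the path, as in the curl output quoted above) from a refused connection (the host answers, but nothing listens on the port). This is a minimal sketch using only the Python standard library. The function name probe and the 5-second default timeout are assumptions for illustration, not something taken from RKE2 or cert-manager.

```python
import socket


def probe(host, port=80, timeout=5.0):
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: traffic is likely dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host replied, but no service accepts connections on this port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

Calling probe("rancher.awesome.com") once from a cluster member VM and once from a machine on the Internet gives two results that can be compared directly. If the VM reports a timeout while the outside machine reports open, the problem lies on the network path between the VMs and the public IP rather than in the workloads themselves.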

andisugandi (Author) commented

Hi @brandond,

Thank you for the helpful insight.

30.60.220.214 is the public IP address (altered for this report) of rancher.awesome.com. Yes, it is external to the cluster.

So do I need to set up another service or resource so that the cert-manager pod (and the cluster member VMs) can reach that public IP?

brandond (Member) commented

As far as I know cert-manager does expect to be able to hit its own challenge URL, yes. You've still not provided any info on how that IP is hosted in your environment so I can't really say much else other than yes, you need to make this work.
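
To make that expectation concrete, here is a rough Python approximation of such a self check. It is a sketch only: cert-manager itself is written in Go, and this is not its implementation. The function name self_check, the injectable opener parameter, and the comparison against an expected key are assumptions for illustration. The 10-second default is also an assumption, chosen because the log above shows roughly 10 seconds between each attempt and its "context deadline exceeded" error.

```python
import urllib.error
import urllib.request


def self_check(url, expected_body, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch the challenge URL once and return (ok, reason)."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace").strip()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        return False, f"unexpected status {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, and refused or unreachable connections.
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    if body != expected_body:
        return False, "response body does not match the expected key"
    return True, "ok"
```

Run from inside the cluster against the challenge URL, a check like this would end in the "request failed" branch as long as the public IP cannot be reached from the VMs, which matches the timeout that cert-manager logs.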
