Propagation SSL Check Failed Only From inside RKE2 VM and Cluster #7433

andisugandi · 2024-12-15T02:21:42Z

Environmental Info:
RKE2 Version:
v1.31.3+rke2r1

Node(s) CPU architecture, OS, and Version:

Node 01:
Linux tw-one 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux
Node 02:
Linux tw-two 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux
Node 03:
Linux tw-three 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 servers, 0 agents

Describe the bug:
I tried to deploy Rancher with Let's Encrypt enabled on an RKE2 cluster (on a bare-metal server managed by Harvester HCI). The self-check GET request fails, but only when it is made from the cert-manager pod or from inside the VMs (tw-one, tw-two, and tw-three).

The same GET request succeeds (the cluster returns a response) when it is sent from the Internet.
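For reference, the self check that fails here can be approximated from a shell on any machine. This is a minimal sketch, assuming cert-manager's HTTP-01 self check passes only when the challenge URL answers with HTTP 200 and a body equal to the expected key. The function names evaluate_response and self_check are hypothetical, and cert-manager's own timeout and retry loop are not reproduced.

```bash
#!/usr/bin/env bash
# Sketch of an HTTP-01 self check (assumption: pass = HTTP 200 + body equal to key).
# cert-manager's real timeouts and retry loop are not reproduced here.

# Classify a response without touching the network.
evaluate_response() {
  local code="$1" body="$2" key="$3"
  if [ "$code" = "000" ]; then
    echo "unreachable"          # curl writes 000 when no HTTP response arrived
    return 1
  elif [ "$code" != "200" ]; then
    echo "wrong-status:$code"
    return 1
  elif [ "$body" != "$key" ]; then
    echo "wrong-body"
    return 1
  fi
  echo "ok"
}

# Fetch the challenge URL from wherever this runs (node, pod, or a machine outside).
self_check() {
  local url="$1" key="$2" timeout="${3:-10}"
  local tmp code body
  tmp="$(mktemp)"
  # -w prints only the status code; the body goes to the temp file.
  code="$(curl -s -o "$tmp" -w '%{http_code}' --max-time "$timeout" "$url" || true)"
  body="$(cat "$tmp")"
  rm -f "$tmp"
  evaluate_response "$code" "$body" "$key"
}
```

Running self_check "http://rancher.awesome.com/.well-known/acme-challenge/<token>" "<expected key>" 10 from the Internet and then from tw-one would be expected to print "ok" and "unreachable" respectively, given the curl results in this report.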

Steps To Reproduce:

Installed RKE2:

NAME               STATUS   ROLES                       AGE   VERSION
tw-one           Ready    control-plane,etcd,master   25d   v1.31.3+rke2r1
tw-three         Ready    control-plane,etcd,master   25d   v1.31.3+rke2r1
tw-two           Ready    control-plane,etcd,master   25d   v1.31.3+rke2r1

Installed helm:

version.BuildInfo{Version:"v3.16.3", GitCommit:"cfd07493f46efc9debd9cc1b02a0961186df7fdf", GitTreeState:"clean", GoVersion:"go1.22.7"}

Active Required Repos:

NAME          	URL                                              
jetstack      	https://charts.jetstack.io                       
rancher-latest	https://releases.rancher.com/server-charts/latest

Installed cert-manager resources:

NAME                                          READY   STATUS    RESTARTS   AGE
pod/cert-manager-b6fd485d9-x9zt6              1/1     Running   0          3h44m
pod/cert-manager-cainjector-dcc5966bc-kpmw8   1/1     Running   0          3h44m
pod/cert-manager-webhook-dfb76c7bd-2vc59      1/1     Running   0          3h44m

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)            AGE
service/cert-manager              ClusterIP   10.43.210.60    <none>        9402/TCP           3h44m
service/cert-manager-cainjector   ClusterIP   10.43.203.185   <none>        9402/TCP           3h44m
service/cert-manager-webhook      ClusterIP   10.43.204.212   <none>        443/TCP,9402/TCP   3h44m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           3h44m
deployment.apps/cert-manager-cainjector   1/1     1            1           3h44m
deployment.apps/cert-manager-webhook      1/1     1            1           3h44m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-b6fd485d9              1         1         1       3h44m
replicaset.apps/cert-manager-cainjector-dcc5966bc   1         1         1       3h44m
replicaset.apps/cert-manager-webhook-dfb76c7bd      1         1         1       3h44m

Install Rancher with the following command:

kubectl create ns cattle-system

helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=rancher.awesome.com --set bootstrapPassword=pleas-change-me --set ingress.tls.source=letsEncrypt --set letsEncrypt.email=<email> --set letsEncrypt.ingress.class=nginx

Running Rancher resources:

NAME                                             READY   STATUS    RESTARTS        AGE
pod/cm-acme-http-solver-lx5pd                    1/1     Running   0               3h40m
pod/rancher-799c68dffd-tz84l                     1/1     Running   1 (3h38m ago)   3h40m
pod/rancher-799c68dffd-x22ll                     1/1     Running   1 (3h38m ago)   3h40m
pod/rancher-799c68dffd-zzwxc                     1/1     Running   1 (3h38m ago)   3h40m
pod/rancher-webhook-c5c58f554-42lpt              1/1     Running   0               3h23m
pod/system-upgrade-controller-5fb67f585d-5x5ms   1/1     Running   0               3h22m

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/cm-acme-http-solver-rqz9w   NodePort    10.43.192.159   <none>        8089:32287/TCP   3h40m
service/rancher                     ClusterIP   10.43.126.189   <none>        80/TCP,443/TCP   3h40m
service/rancher-webhook             ClusterIP   10.43.19.196    <none>        443/TCP          3h23m

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rancher                     3/3     3            3           3h40m
deployment.apps/rancher-webhook             1/1     1            1           3h23m
deployment.apps/system-upgrade-controller   1/1     1            1           3h22m

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/rancher-799c68dffd                     3         3         3       3h40m
replicaset.apps/rancher-webhook-c5c58f554              1         1         1       3h23m
replicaset.apps/system-upgrade-controller-5fb67f585d   1         1         1       3h22m

Expected behavior:
The propagation check succeeds: the cert-manager pod gets the expected response (HTTP 200) from the corresponding service.

Actual behavior:
The propagation check fails: the cert-manager pod does not get the expected response from the corresponding service, and the request times out (context deadline exceeded).

Additional context / logs (for security reasons, the real domain name and IP addresses have been changed in this report):

Log messages from the cert-manager pod (output of kubectl -n cert-manager logs pod/cert-manager-b6fd485d9-x9zt6 | tail -n 8):

E1214 10:51:46.995977       1 sync.go:208] "propagation check failed" err="failed to perform self check GET request 'http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY': Get \"http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" logger="cert-manager.controller" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01"
I1214 10:51:56.999673       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-lx5pd" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
I1214 10:51:56.999755       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-rqz9w" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
I1214 10:51:56.999928       1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.controller.http01.selfCheck.http01.ensureIngress" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-xdql8" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
E1214 10:52:07.000379       1 sync.go:208] "propagation check failed" err="failed to perform self check GET request 'http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY': Get \"http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" logger="cert-manager.controller" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01"
I1214 10:52:17.000800       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.controller.http01.selfCheck.http01.ensurePod" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-lx5pd" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
I1214 10:52:17.000863       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.controller.http01.selfCheck.http01.ensureService" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-rqz9w" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
I1214 10:52:17.000902       1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.controller.http01.selfCheck.http01.ensureIngress" resource_name="tls-rancher-ingress-1-328897933-2147024768" resource_namespace="cattle-system" resource_kind="Challenge" resource_version="v1" dnsName="rancher.awesome.com" type="HTTP-01" related_resource_name="cm-acme-http-solver-xdql8" related_resource_namespace="cattle-system" related_resource_kind="" related_resource_version=""
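To reproduce the exact request cert-manager is making, the failing URL can be pulled out of the controller log and replayed with curl. This is a hypothetical helper (failed_self_check_urls), not part of cert-manager or kubectl; it relies only on the "propagation check failed" and "self check GET request '<url>'" text that appears in the log lines above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read cert-manager controller logs on stdin and print each
# distinct URL whose HTTP-01 self check failed. It relies only on the
# "propagation check failed" / "self check GET request '<url>'" text seen in this log.
failed_self_check_urls() {
  grep '"propagation check failed"' \
    | grep -o "self check GET request '[^']*'" \
    | sed -e "s/^self check GET request '//" -e "s/'\$//" \
    | sort -u
}
```

Usage: kubectl -n cert-manager logs deploy/cert-manager | failed_self_check_urls, then request each printed URL from a node and from outside the network.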

Testing the GET request from the Internet using cURL:

curl -o /dev/null -s -w "%{http_code}\n" http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY

The result:

Testing the GET request from the Internet using wget:

wget -O- http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY

The result:

--2024-12-14 18:07:27--  http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
Resolving rancher.awesome.com (rancher.awesome.com)... 30.60.220.214
Connecting to rancher.awesome.com (rancher.awesome.com)|30.60.220.214|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87 [text/plain]
Saving to: ‘STDOUT’

-                            0%[                                       ]       0  --.-KB/s
-                          100%[======================================>]      87  --.-KB/s    in 0s

2024-12-14 18:07:27 (13.8 MB/s) - written to stdout [87/87]

Testing the GET request from inside a cluster VM using cURL:

curl -v http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY

The result:

*   Trying 30.60.220.214...
* TCP_NODELAY set
* connect to 30.60.220.214 port 80 failed: Connection timed out
* Failed to connect to rancher.awesome.com port 80: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to rancher.awesome.com port 80: Connection timed out

Testing the GET request from inside a cluster VM using wget:

wget -O- http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY

The result:

--2024-12-14 18:36:36--  http://rancher.awesome.com/.well-known/acme-challenge/tsbcERZIDzBDdZijsx83BimfgGIihO1OyxPS-vI2MiY
Resolving rancher.awesome.com (rancher.awesome.com)... 30.60.220.214
Connecting to rancher.awesome.com (rancher.awesome.com)|30.60.220.214|:80... failed: Connection timed out.
Retrying.
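Because the request from inside the VMs times out while connecting to 30.60.220.214, one way to narrow this down is to send the same request while bypassing the public IP, pinning rancher.awesome.com to a node's own address with curl's --resolve option. This is a sketch under the assumption that the ingress controller answers on port 80 of each node (RKE2's bundled ingress-nginx typically does this via hostPort, but this environment may differ). resolve_entry and probe_via are hypothetical helper names, and IPv6 literals are not handled.

```bash
#!/usr/bin/env bash
# Build a curl --resolve entry ("host:port:address") for an http(s) URL.
# Hypothetical helper; IPv6 literals and user:pass@ URLs are not handled.
resolve_entry() {
  local url="$1" addr="$2" scheme hostport host port
  scheme="${url%%://*}"
  hostport="${url#*://}"       # drop the scheme
  hostport="${hostport%%/*}"   # drop the path
  host="${hostport%%:*}"
  if [ "$hostport" != "$host" ]; then
    port="${hostport##*:}"     # explicit port in the URL
  elif [ "$scheme" = "https" ]; then
    port=443
  else
    port=80
  fi
  printf '%s:%s:%s\n' "$host" "$port" "$addr"
}

# Request URL but connect to ADDR instead of the address DNS returns.
probe_via() {
  local url="$1" addr="$2"
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 \
    --resolve "$(resolve_entry "$url" "$addr")" "$url"
}
```

Usage: probe_via "http://rancher.awesome.com/.well-known/acme-challenge/<token>" "<node IP of tw-one>". If this prints 200 while the plain request times out, the ingress path inside the cluster works and the problem is the route to the public IP from inside the network. A router or firewall without NAT loopback (hairpinning) is one common cause, though the thread does not say how that IP is hosted.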

I don't know exactly where to report this issue (cert-manager or the Rancher project); I think it is related to the RKE2 cluster itself (correct me if I'm wrong).

[Screenshot: Kubernetes Ingress Controller Fake Certificate, Rancher on RKE2]

What did I miss?

Please help, and thank you in advance.


brandond · 2024-12-15T08:08:56Z

*   Trying 30.60.220.214...
* TCP_NODELAY set
* connect to 30.60.220.214 port 80 failed: Connection timed out

I have no idea what this IP is or how you're hosting it. I don't see it listed anywhere in any of your services, so I am assuming it is external to this cluster? You should figure out why you can't hit this IP from the cluster member VMs. Answering this is not something we can help with here.

If you're using Harvester to host this LB, you may find more help at https://github.com/harvester/harvester
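Since the failure inside the VMs is a TCP connect timeout rather than an HTTP error, a quick port check run on each of tw-one, tw-two and tw-three shows whether 30.60.220.214 is reachable at all from inside on ports 80 and 443. This sketch uses bash's /dev/tcp redirection and coreutils timeout; tcp_reachable is a hypothetical name, not an existing tool.

```bash
#!/usr/bin/env bash
# Print "open" if a TCP connection to HOST:PORT succeeds within SECS seconds.
# Uses bash's /dev/tcp redirection, so it needs bash (not sh) and coreutils timeout.
tcp_reachable() {
  local host="$1" port="$2" secs="${3:-5}"
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "unreachable"   # refused, filtered (timed out) or name not resolvable
    return 1
  fi
}
```

Usage on each node: for p in 80 443; do echo "$p: $(tcp_reachable 30.60.220.214 "$p")"; done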

andisugandi · 2024-12-15T23:09:15Z

Hi @brandond ,

Thank you for the helpful insight.

30.60.220.214 is the public IP address of rancher.awesome.com (changed for this report). Yes, it is external to the cluster.

So I need to set up another service or resource to make sure that the cert-manager pod (and the cluster member VMs) can reach that public IP?

brandond · 2024-12-15T23:52:52Z

As far as I know, cert-manager does expect to be able to hit its own challenge URL, yes. You still haven't provided any info on how that IP is hosted in your environment, so I can't really say much more than: yes, you need to make this work.

brandond closed this as completed Dec 15, 2024
