Windows pods with services cannot reach the network outside the cluster (most of the time), Calico #2362

Closed
MattAxel opened this issue Jan 17, 2022 · 5 comments
@MattAxel

MattAxel commented Jan 17, 2022

Pods that have a service cannot reach networks outside the cluster. Standalone pods work fine. This happens on Windows nodes running Calico.

Environmental Info:
RKE2 Version:
rke2.exe version v1.22.5+rke2r1 (ce3e572)
go version go1.16.10b7

Node(s) CPU architecture, OS, and Version:
Caption: Microsoft Windows Server 2022 Datacenter
CSName: SAFEPERFKUBW1
Version: 10.0.20348
BuildType: Multiprocessor Free
OSArchitecture: 64-bit

Cluster Configuration:
3 Ubuntu 20.04 servers and 2 Windows agents, using the Calico CNI plugin.
Running on VMware vSphere.

Describe the bug:
Pods on the Windows nodes intermittently cannot reach external IP addresses. This only happens when a service is created for the deployment; a pod without any service works fine. The service type does not seem to matter. Most of the time, one instance of each deployment can still reach external addresses. For example, when 3 pods of the same deployment run on the same node, only the most recently created one can reach external addresses. This is not always the case, but it is most of the time. When a new pod starts up, external access always works until its status changes to Ready. My guess is that this is when kube-proxy updates its rules.
I am not sure what to look for in the kube-proxy logs, and I cannot find any errors there.
On the Linux nodes everything works fine.

Steps To Reproduce:
Installed RKE2 using the quick start guide, with the Calico CNI because Windows agents are used.
Exec a curl command in a pod.
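If curl is not available in a container image, a minimal Python sketch like the one below could stand in for the curl step, assuming Python is installed in the image. It only attempts a plain TCP connection with a timeout and reports whether the target answered, so it checks reachability rather than HTTP. The function name `can_reach` is hypothetical and not part of RKE2, Calico, or kube-proxy.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration: a rough stand-in for the curl
    check that tests TCP reachability only, not an HTTP response.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and name-resolution errors
        # are all OSError subclasses in Python 3.
        return False
```

Calling, for example, `can_reach("example.com", 443)` from each replica of a deployment (the target here is only an illustrative assumption) would show which pods have lost external connectivity, without depending on curl being present in every image.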

Expected behavior:
Pods should always be able to reach external networks.

Actual behavior:
Pods cannot access networks outside the cluster when a service is connected to the deployment (most of the time; see the description).

Additional context / logs:

@rosskirkpat
Contributor

Would you be able to provide the rke2 server args/config file that you used?

Are you using one of the pre-configured RKE2 CIS profiles?

Are you expecting external connectivity to be available for the Windows services?

Do you have your internal DNS servers (assuming you have at least one due to VMware vSphere) configured in the CoreDNS ConfigMap?

@MattAxel
Author

MattAxel commented Jan 25, 2022

Thanks for your reply.

From RKE2 server one (/etc/rancher/rke2/config.yaml):

tls-san:
- safeperfkubl1
- safeperfkubl1.infra.local
- safeperfkubcl.infra.local
- 172.17.93.211
disable: rke2-ingress-nginx
cni:
- calico

No, I have not specified any CIS profile.

Yes, I am expecting external connectivity for the Windows services, and it does work in a few pods. I cannot see any pattern, except that it always seems to work until a pod is set to Ready. After that it works in at most one instance of each deployment.

Name resolution does not seem to be the problem; it works even in pods without external connectivity.

{
	"Corefile": ".:53 {
		    errors 
		    health  {
		        lameduck 5s
		    }
		    ready 
		    kubernetes   cluster.local  cluster.local in-addr.arpa ip6.arpa {
		        pods insecure
		        fallthrough in-addr.arpa ip6.arpa
		        ttl 30
		    }
		    prometheus   0.0.0.0:9153
		    forward   . /etc/resolv.conf
		    cache   30
		    loop 
		    reload 
		    loadbalance 
		}"
}

I guess it forwards to /etc/resolv.conf, which points to 127.0.0.53 (systemd-resolved).
So I changed it to:

{
...
		    forward   . 172.17.93.2
...
}

(Unfortunately, this did not make any difference.)
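Because the default Corefile forwards to /etc/resolv.conf, it can help to check which nameservers that file actually lists. The sketch below uses a hypothetical helper, `loopback_nameservers`, built only on the Python standard library: it parses resolv.conf text and returns loopback entries such as the systemd-resolved stub 127.0.0.53, an address that is local to whichever network namespace reads the file rather than a reachable upstream. Since name resolution already works in the affected pods, this would only help rule DNS in or out.

```python
from __future__ import annotations

import ipaddress


def loopback_nameservers(resolv_conf: str) -> list[str]:
    """Return the loopback nameserver addresses listed in resolv.conf text.

    Hypothetical helper for illustration. Comments and non-nameserver
    directives are ignored; entries that are not plain IP literals are skipped.
    """
    found = []
    for line in resolv_conf.splitlines():
        # resolv.conf comments start with '#' or ';'; strip them and whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) < 2 or parts[0] != "nameserver":
            continue
        try:
            addr = ipaddress.ip_address(parts[1])
        except ValueError:
            continue  # not a plain IPv4/IPv6 literal
        if addr.is_loopback:
            found.append(parts[1])
    return found
```

On an Ubuntu host whose /etc/resolv.conf is the systemd-resolved stub, this check would report `127.0.0.53`, while a file listing only the upstream server tried above (172.17.93.2) would return an empty list.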

@MattAxel
Author

I created a new cluster with one control plane node and two Windows workers: one running Windows Server 2019 and one running Windows Server 2022.
It worked perfectly fine on the Windows Server 2019 worker, but the Windows Server 2022 worker showed the same issue described above.

@phillipsj
Contributor

@MattAxel thanks for the update and the additional information.

@caroline-suse-rancher
Contributor

Closing this due to age and inactivity.
