[Release-1.26] - Race condition when rke2-windows-calico removes and creates an HNS network #5382

Closed
manuelbuil opened this issue Feb 8, 2024 · 1 comment
@manuelbuil
Contributor

Backport the fix for the race condition that occurs when rke2-windows-calico removes and creates an HNS network.

@ShylajaDevadiga
Contributor

Validated using rke2 version v1.26.14-rc1+rke2r1

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04
Windows Server 2022

Steps to reproduce and validate (following the steps in the PR)

1 - Deploy rke2 server with calico
2 - Deploy rke2-agent on Windows
3 - Once everything is up, run Stop-Service rke2 and then C:\usr\local\bin\rke2.exe agent service --delete
4 - Verify that there is at least one HNS network: Get-HnsNetwork
5 - Start the rke2-agent on Windows again with debug: true (remember to delete the node from the cluster first, or the agent will complain that the node password already exists)

You should see at least these messages:

Deleting network: XXXXXX before starting calico

and

Calico is waiting for the interface with ip: XXXXXX to come back
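
For anyone repeating this validation, checking the agent's debug output for both messages can be scripted. The following is a minimal sketch in Go, not part of rke2: the function name `hasExpectedMessages` and the regular expressions are assumptions built only from the two message formats quoted above, with the network name and IP treated as wildcards.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Patterns for the two debug messages quoted in the validation steps.
// The network name and the IP differ per node, so both are wildcards.
var (
	deletingNetwork = regexp.MustCompile(`Deleting network: .+ before starting calico`)
	waitingForIface = regexp.MustCompile(`Calico is waiting for the interface with ip: \S+ to come back`)
)

// hasExpectedMessages (hypothetical helper) reports whether both messages
// appear somewhere in the given agent log text.
func hasExpectedMessages(log string) bool {
	var sawDelete, sawWait bool
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if deletingNetwork.MatchString(line) {
			sawDelete = true
		}
		if waitingForIface.MatchString(line) {
			sawWait = true
		}
	}
	return sawDelete && sawWait
}

func main() {
	// Two lines in the same format as the agent debug log.
	sample := `level=debug msg="Deleting network: External before starting calico"
level=debug msg="Calico is waiting for the interface with ip: 172.31.7.125 to come back"`
	fmt.Println("both messages present:", hasExpectedMessages(sample))
}
```

Feeding it the captured agent output (for example, a saved copy of the console log) gives a quick pass/fail signal, but it only confirms the messages were logged, not that the node became Ready.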

Created a cluster with 1 server, 1 Linux agent, and 1 Windows agent.

Reproduction results on rke2 version v1.26.13+rke2r1

ubuntu@ip-172-31-0-64:~$ rke2 -v
rke2 version v1.26.13+rke2r1 (637e8a38334f603b60650b30547252a5c461fa0d)
go version go1.20.13 X:boringcrypto
ubuntu@ip-172-31-0-64:~$ kubectl get nodes
NAME                                         STATUS   ROLES                       AGE     VERSION
ip-172-31-0-64.us-east-2.compute.internal    Ready    control-plane,etcd,master   6h16m   v1.26.13+rke2r1
ip-172-31-9-212.us-east-2.compute.internal   Ready    <none>                      6h14m   v1.26.13+rke2r1
ip-ac1f2610                                  Ready    <none>                      6h11m   v1.26.13
ubuntu@ip-172-31-0-64:~$ kubectl get nodes
NAME                                         STATUS     ROLES                       AGE     VERSION
ip-172-31-0-64.us-east-2.compute.internal    Ready      control-plane,etcd,master   6h24m   v1.26.13+rke2r1
ip-172-31-9-212.us-east-2.compute.internal   Ready      <none>                      6h22m   v1.26.13+rke2r1
ip-ac1f2610                                  NotReady   <none>                      6h19m   v1.26.13
ubuntu@ip-172-31-0-64:~$ kubectl delete node ip-ac1f2610
node "ip-ac1f2610" deleted
ubuntu@ip-172-31-0-64:~$ 
ubuntu@ip-172-31-0-64:~$ kubectl get nodes
NAME                                         STATUS   ROLES                       AGE     VERSION
ip-172-31-0-64.us-east-2.compute.internal    Ready    control-plane,etcd,master   6h30m   v1.26.13+rke2r1
ip-172-31-9-212.us-east-2.compute.internal   Ready    <none>                      6h28m   v1.26.13+rke2r1
ubuntu@ip-172-31-0-64:~$

Logs:

time="2024-02-20T23:20:10Z" level=debug msg="hcsshim::HNSNetwork::Delete id=F6E78733-A75A-4F99-8832-1526659F4173"
time="2024-02-20T23:20:10Z" level=debug msg="[DELETE]=>[/networks/F6E78733-A75A-4F99-8832-1526659F4173] Request : "
time="2024-02-20T23:20:10Z" level=debug msg="hcsshim::HNSNetwork::Delete id=5734632C-8C77-4F5F-ACEC-6B8A07AA208C"
time="2024-02-20T23:20:10Z" level=debug msg="[DELETE]=>[/networks/5734632C-8C77-4F5F-ACEC-6B8A07AA208C] Request : "
time="2024-02-20T23:20:11Z" level=debug msg="evaluating if the interface: Ethernet with addresses [fe80::95b3:fdf2:7573:379d/64], contains ip: 172.31.3.9"
time="2024-02-20T23:20:11Z" level=debug msg="evaluating if the interface: Loopback Pseudo-Interface 1 with addresses [::1/128 127.0.0.1/8], contains ip: 172.31.3.9"
time="2024-02-20T23:20:11Z" level=debug msg="evaluating if the interface: vEthernet (nat) with addresses [fe80::81fb:6ad0:a3d3:4889/64 172.31.208.1/20], contains ip: 172.31.3.9"
time="2024-02-20T23:20:11Z" level=fatal msg="no interface has the ip: 172.31.3.9"
PS C:\Users\Administrator>

Validation results on rke2 version v1.26.14-rc1+rke2r1

ubuntu@ip-172-31-8-180:~$ rke2 -v
rke2 version v1.26.14-rc1+rke2r1 (84264b99c14cf8626ee34120fac158e64b58b7e8)
go version go1.21.7 X:boringcrypto
ubuntu@ip-172-31-8-180:~$ kubectl get nodes
NAME                                         STATUS   ROLES                       AGE     VERSION
ip-172-31-7-58.us-east-2.compute.internal    Ready    <none>                      6h10m   v1.26.14+rke2r1
ip-172-31-8-180.us-east-2.compute.internal   Ready    control-plane,etcd,master   6h13m   v1.26.14+rke2r1
ip-ac1f2610                                  Ready    <none>                      6h8m    v1.26.14
ubuntu@ip-172-31-8-180:~$ kubectl get nodes
NAME                                         STATUS     ROLES                       AGE     VERSION
ip-172-31-7-58.us-east-2.compute.internal    Ready      <none>                      6h17m   v1.26.14+rke2r1
ip-172-31-8-180.us-east-2.compute.internal   Ready      control-plane,etcd,master   6h21m   v1.26.14+rke2r1
ip-ac1f2610                                  NotReady   <none>                      6h15m   v1.26.14
ubuntu@ip-172-31-8-180:~$ kubectl delete node ip-ac1f2610
node "ip-ac1f2610" deleted
ubuntu@ip-172-31-8-180:~$ kubectl get nodes
NAME                                         STATUS   ROLES                       AGE     VERSION
ip-172-31-7-58.us-east-2.compute.internal    Ready    <none>                      6h22m   v1.26.14+rke2r1
ip-172-31-8-180.us-east-2.compute.internal   Ready    control-plane,etcd,master   6h26m   v1.26.14+rke2r1
ip-ac1f2610                                  Ready    <none>                      69s     v1.26.14
ubuntu@ip-172-31-8-180:~$ 

Logs:

time="2024-02-20T23:19:25Z" level=debug msg="Deleting network: External before starting calico"
time="2024-02-20T23:19:25Z" level=debug msg="hcsshim::HNSNetwork::Delete id=500A6E86-90BF-4CB5-B430-B43AE6489AB0"
time="2024-02-20T23:19:25Z" level=debug msg="[DELETE]=>[/networks/500A6E86-90BF-4CB5-B430-B43AE6489AB0] Request : "
time="2024-02-20T23:19:25Z" level=debug msg="Deleting network: Calico before starting calico"
time="2024-02-20T23:19:25Z" level=debug msg="hcsshim::HNSNetwork::Delete id=28B56261-AD0A-4F3E-9658-3147D5AAD182"
time="2024-02-20T23:19:25Z" level=debug msg="[DELETE]=>[/networks/28B56261-AD0A-4F3E-9658-3147D5AAD182] Request : "
time="2024-02-20T23:19:28Z" level=debug msg="Calico is waiting for the interface with ip: 172.31.7.125 to come back"
time="2024-02-20T23:19:28Z" level=debug msg="evaluating if the interface: Ethernet with addresses [2600:1f16:1d38:1c00:ed24:1870:112f:e59/128 fe80::ecd4:35b7:66b:7df6/64 172.31.7.125/20], contains ip: 172.31.7.125"
time="2024-02-20T23:19:28Z" level=debug msg="Calico is waiting for the interface with ip: 172.31.7.125 to come back"
