Remote Pod getting deleted when Virtual Kubelet Pod restarts #2748

Open
Sharathmk99 opened this issue Sep 30, 2024 · 2 comments
Labels
kind/bug Something isn't working

Comments

@Sharathmk99
Contributor

What happened:

The Virtual Kubelet restarts because of an exception (I will open a separate issue for that), and during the Virtual Kubelet startup we see that some pods in some namespaces get deleted in the remote cluster. It looks like a race condition.
Please note: not all pods in the affected namespaces are deleted.

For example, below are the Virtual Kubelet logs, grepped for the pod name (pod-123) in the namespace (namespace-123):

I0930 08:18:19.923187       1 reflector.go:317] Pod fallback reflection not yet completely initialized (item: "namespace-123/pod-123")
I0930 08:18:19.924729       1 reflector.go:317] ServiceAccount fallback reflection not yet completely initialized (item: "namespace-123/pod-123")
I0930 08:18:23.308783       1 pod.go:341] Pod "namespace-123/pod-123" successfully marked as Failed (OffloadingAborted)
I0930 08:18:23.700397       1 reflector.go:327] ServiceAccount reflection not yet completely initialized for local namespace "namespace-123" (item: "pod-123")
I0930 08:18:24.013970       1 reflector.go:327] ServiceAccount reflection not yet completely initialized for local namespace "namespace-123" (item: "pod-123")
I0930 08:18:25.502650       1 reflector.go:327] Pod reflection not yet completely initialized for local namespace "namespace-123" (item: "pod-123")
I0930 08:18:42.109607       1 secret.go:102] Skipping reflection of remote Secret "namespace-123/pod-123-token" as containing service account tokens
I0930 08:18:44.904709       1 podns.go:195] Deleting remote shadowpod "namespace-123/pod-123", since local pod "namespace-123/pod-123" has been previously rejected
I0930 08:18:44.913469       1 namespaced.go:97] Remote ShadowPod "namespace-123/pod-123" successfully deleted
I0930 08:18:44.949221       1 podns.go:199] Skipping reflection of local pod "namespace-123/pod-123" as previously rejected
I0930 08:18:47.825376       1 podns.go:199] Skipping reflection of local pod "namespace-123/pod-123" as previously rejected
I0930 08:18:48.713175       1 podns.go:199] Skipping reflection of local pod "namespace-123/pod-123" as previously rejected
I0930 08:18:48.727150       1 podns.go:199] Skipping reflection of local pod "namespace-123/pod-123" as previously rejected
I0930 08:18:48.738589       1 podns.go:199] Skipping reflection of local pod "namespace-123/pod-123" as previously rejected

In reflector.go, for some reason, it was not able to find the namespace namespace-123 and started printing "Failed to retrieve":
https://github.com/liqotech/liqo/blob/v0.10.1/pkg/virtualKubelet/reflection/generic/reflector.go#L307
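
To illustrate what those "not yet completely initialized" log lines mean, here is a minimal sketch of that kind of initialization guard. This is purely my own reconstruction of the pattern, not the actual Liqo code; the type and field names (namespacedReflector, ready) are invented.

```go
// Sketch only: a simplified reconstruction of the "not yet initialized" guard,
// not the actual Liqo reflector code. Type and method names are assumptions.
package main

import "log"

// namespacedReflector mimics a per-namespace reflector that must be
// initialized before items can be handled.
type namespacedReflector struct {
	namespace string
	ready     bool
}

// handle skips (and requeues) items while the reflector for the target
// namespace is not yet initialized. My hypothesis is that, right after a
// virtual-kubelet restart, an item can instead fall through to the "rejected"
// path if the namespace lookup fails during this window.
func (r *namespacedReflector) handle(item string) {
	if !r.ready {
		log.Printf("Pod reflection not yet completely initialized for local namespace %q (item: %q)",
			r.namespace, item)
		return // requeue and retry later
	}
	// ... normal reflection logic would run here ...
}

func main() {
	r := &namespacedReflector{namespace: "namespace-123"}
	r.handle("pod-123") // logs the "not yet completely initialized" message
}
```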

Because of that, pod.go at https://github.com/liqotech/liqo/blob/v0.10.1/pkg/virtualKubelet/reflection/workload/pod.go#L334 marked the local pod as Failed (OffloadingAborted).
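
For reference, a rough sketch of what marking a local pod as Failed with reason OffloadingAborted could look like with client-go. This is my illustration of the effect described above, not the Liqo implementation; the function name and the status message are invented.

```go
// Sketch only: an illustration, using client-go, of marking a local pod as
// Failed with reason OffloadingAborted; not the actual Liqo code.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func markPodFailed(ctx context.Context, c kubernetes.Interface, pod *corev1.Pod) error {
	pod = pod.DeepCopy()
	pod.Status.Phase = corev1.PodFailed
	pod.Status.Reason = "OffloadingAborted" // reason seen in the logs above
	pod.Status.Message = "pod offloading aborted" // hypothetical message

	// Once the phase is Failed, the pod is treated as "rejected" and, per the
	// flow described in this issue, the corresponding ShadowPod gets deleted.
	_, err := c.CoreV1().Pods(pod.Namespace).UpdateStatus(ctx, pod, metav1.UpdateOptions{})
	return err
}
```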

Since the local pod was marked as Failed, podns.go at https://github.com/liqotech/liqo/blob/v0.10.1/pkg/virtualKubelet/reflection/workload/podns.go#L191 deleted the ShadowPod, which in turn deleted the pod in the remote cluster.
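
Below is my reconstruction of that deletion step, written against a controller-runtime client with an unstructured object. The ShadowPod group/version/kind, the namespace mapping, and the helper name are assumptions on my part, not taken from the Liqo sources.

```go
// Sketch only: my reconstruction of "delete the ShadowPod when the local pod
// has been rejected". The ShadowPod GVK and the 1:1 name mapping are
// assumptions, not verified against the Liqo code base.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureShadowPodAbsent deletes the ShadowPod corresponding to a local pod
// previously marked as Failed/rejected. Deleting the ShadowPod is what
// ultimately removes the pod running in the remote cluster.
func ensureShadowPodAbsent(ctx context.Context, cl client.Client, localPod *corev1.Pod, remoteNamespace string) error {
	shadow := &unstructured.Unstructured{}
	shadow.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "virtualkubelet.liqo.io", // assumed GVK for the ShadowPod CRD
		Version: "v1alpha1",
		Kind:    "ShadowPod",
	})
	shadow.SetName(localPod.Name)
	shadow.SetNamespace(remoteNamespace)

	if err := cl.Delete(ctx, shadow); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```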

The above flow happens only when the Virtual Kubelet pod restarts. We confirmed this by comparing the pod restart time with the pod deletion time (see the sketch below).
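
As a small aid for that correlation, here is a sketch that prints the Virtual Kubelet container's last restart window so it can be matched against the deletion timestamps in the logs. The namespace ("liqo") and the label selector are assumptions; adjust them to your deployment.

```go
// Sketch only: print the virtual-kubelet container's last restart window to
// correlate it with the remote-pod deletion timestamps. Namespace and label
// selector below are assumptions, not Liqo defaults I have verified.
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func printVirtualKubeletRestarts(ctx context.Context, c kubernetes.Interface) error {
	pods, err := c.CoreV1().Pods("liqo").List(ctx, metav1.ListOptions{
		LabelSelector: "app.kubernetes.io/name=virtual-kubelet", // assumed label
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		for _, cs := range pod.Status.ContainerStatuses {
			if term := cs.LastTerminationState.Terminated; term != nil {
				fmt.Printf("%s/%s: previous run ended at %s (restarts: %d)\n",
					pod.Name, cs.Name, term.FinishedAt, cs.RestartCount)
			}
			if run := cs.State.Running; run != nil {
				fmt.Printf("%s/%s: current run started at %s\n",
					pod.Name, cs.Name, run.StartedAt)
			}
		}
	}
	return nil
}
```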

What you expected to happen:

No remote pod managed by Liqo should get deleted.

How to reproduce it (as minimally and precisely as possible):

It's difficult to reproduce.

Anything else we need to know?:

Note: we have ~200 namespaces offloaded to the remote cluster and ~1500 pods reflected to the remote cluster. I'm not sure whether the scale of namespaces and pods we have is contributing to the problem.

Environment:

  • Liqo version: v0.10.1
  • Liqoctl version: v0.10.1
  • Kubernetes version (use kubectl version): 1.27
  • Cloud provider or hardware configuration: Kubeadm
  • Node image:
  • Network plugin and version:
  • Install tools:
  • Others:
@Sharathmk99 added the kind/bug label on Sep 30, 2024
@aleoli
Member

aleoli commented Oct 7, 2024

Hi @Sharathmk99!

Thanks for reporting it! We will try to reproduce it; if you have a way or scenario that makes it easy to reproduce, please share it with us.

@Sharathmk99
Contributor Author

Thank you @aleoli. I'm not able to reproduce it in a smaller cluster.

In our production cluster, it's happening very frequently. If required, we can quickly connect over a call and debug on the cluster directly.

Thanks.
