Jaeger-all-in-one pod crashes and never gets re-created #4786

iol5045 · 2023-09-27T21:16:27Z


Versions: Kubernetes v1.21, Istio v1.13, jaeger-all-in-one v1.29

I've been facing an issue where Jaeger does not come back up properly after it crashes.
(Based on the metrics I checked, it is not a resource issue.)

Here is the timeline:

First, the Pilot log shows:
Incremental push, service jaeger-query.istio-system.svc.cluster.local at shard <VPC_NAME>/Kubernetes has no endpoints
Incremental push, service jaeger-collector.istio-system.svc.cluster.local at shard <VPC_NAME>/Kubernetes has no endpoints
Incremental push, service jaeger-query.istio-system.svc.cluster.local at shard <VPC_NAME>/Kubernetes has no endpoints
After those entries appear, the kubelet log shows:
I0920 21:27:55.875531 4220 scope.go:110] "RemoveContainer" containerID="71e7d0511b5505bb5ced901fcdc8db3e5f99465d7cbe58ccbd48906596c06261"
E0920 21:27:55.875968 4220 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"jaeger\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=jaeger pod=jaeger-xxxxx-xxxxx_istio-system(xxxx)\"" pod="istio-system/jaeger-xxxx-xxxxx" podUID=xxxxxx
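For context on the `back-off 5m0s` in that kubelet message: according to the Kubernetes pod lifecycle documentation, the kubelet restarts a failing container in place with an exponential back-off (10s, 20s, 40s, ...) capped at five minutes, rather than deleting and re-creating the pod. A 5m0s value therefore suggests the container has already failed enough times to reach the cap. The sketch below only illustrates that documented schedule; the function name `crashloop_delays` and its defaults are hypothetical, and the kubelet's internal implementation may differ.

```python
# Illustrative sketch (not kubelet code): exponential restart back-off,
# assuming the documented defaults of a 10 s start, doubling, and a 300 s (5m0s) cap.

def crashloop_delays(restarts: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Return the assumed wait, in seconds, before each of the first `restarts` restarts."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double the wait, but never exceed the cap
    return delays


if __name__ == "__main__":
    for i, d in enumerate(crashloop_delays(8), start=1):
        print(f"restart {i}: wait {d}s")
```

With these assumptions the waits are 10, 20, 40, 80 and 160 seconds, after which every further restart waits 300 seconds, matching the `back-off 5m0s` shown in the log.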
Until we manually deleted the crashing Jaeger pod, it kept emitting Badger logs like the following:
badger 2023/09/20 21:44:51 INFO: All 2788 tables opened in 431ms
badger 2023/09/20 21:44:51 INFO: Discard stats nextEmptySlot: 0
badger 2023/09/20 21:44:51 INFO: Deleting empty file: /badger/data/000011.vlog
badger 2023/09/20 21:44:51 INFO: Set nextTxnTs to 200993686

Any ideas would be appreciated; let me know if further information is needed.
Thank you.