
Crash when deleting CRD #2205

Closed
cinder-fish opened this issue Sep 22, 2023 · 10 comments
Labels
kind/bug, triage/accepted


@cinder-fish

cinder-fish commented Sep 22, 2023

What happened:
kube-state-metrics crashed after a CRD was deleted

What you expected to happen:
kube-state-metrics to handle the deletion gracefully instead of crashing

How to reproduce it (as minimally and precisely as possible):
Delete a CRD in a way that causes the informer to deliver a cache.DeletedFinalStateUnknown tombstone to the delete handler

possibly between intervals?

Anything else we need to know?:
Here's a link to the client-go doc for this type:
https://pkg.go.dev/k8s.io/client-go/tools/cache#DeletedFinalStateUnknown
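
For reference, a delete handler has to unwrap this tombstone before type-asserting; a bare obj.(*unstructured.Unstructured) panics exactly as in the log below. Here is a minimal sketch of the handling client-go expects (illustrative only; onDelete and the demo main are stand-ins, not KSM's actual code):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
)

// onDelete mimics a ResourceEventHandlerFuncs.DeleteFunc. When a watch
// misses the final delete event, client-go hands the handler a
// cache.DeletedFinalStateUnknown tombstone instead of the object itself.
func onDelete(obj interface{}) {
	if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
		obj = tombstone.Obj // recover the last known state of the object
	}
	u, ok := obj.(*unstructured.Unstructured)
	if !ok {
		// Log and bail out instead of crashing the informer goroutine.
		fmt.Printf("unexpected object type on delete: %T\n", obj)
		return
	}
	fmt.Printf("CRD deleted: %s\n", u.GetName())
}

func main() {
	// Simulate the informer delivering a tombstone rather than the object.
	onDelete(cache.DeletedFinalStateUnknown{Key: "example", Obj: &unstructured.Unstructured{}})
}

Presumably the same unwrap would need to happen in the delete handler at discovery.go:78, which the trace below points at.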

Environment:

  • kube-state-metrics version: 2.10.0
  • Kubernetes version (use kubectl version): v1.27.4
  • Cloud provider or hardware configuration: GKE
  • Other info:

Error log:

E0922 08:07:20.392009       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x188e6c0), concrete:(*runtime._type)(0x19602a0), asserted:(*runtime._type)(0x1af2fe0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *unstructured.Unstructured)
goroutine 25 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x18ef280?, 0xc0018dc000})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
panic({0x18ef280, 0xc0018dc000})
	/usr/local/go-1.20.7/src/runtime/panic.go:884 +0x213
k8s.io/kube-state-metrics/v2/internal/discovery.(*CRDiscoverer).StartDiscovery.func2({0x19602a0?, 0xc001cc0700?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/src/k8s.io/kube-state-metrics/internal/discovery/discovery.go:78 +0x495
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:257
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:978 +0xaf
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000230738?, {0x1d7cbc0, 0xc000732000}, 0x1, 0xc000730000)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0000df4d0)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:967 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:70 +0x85
panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *unstructured.Unstructured [recovered]
	panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *unstructured.Unstructured

goroutine 25 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x18ef280, 0xc0018dc000})
	/usr/local/go-1.20.7/src/runtime/panic.go:884 +0x213
k8s.io/kube-state-metrics/v2/internal/discovery.(*CRDiscoverer).StartDiscovery.func2({0x19602a0?, 0xc001cc0700?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/src/k8s.io/kube-state-metrics/internal/discovery/discovery.go:78 +0x495
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:257
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:978 +0xaf
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000230738?, {0x1d7cbc0, 0xc000732000}, 0x1, 0xc000730000)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0000df4d0)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:967 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:70 +0x85


@cinder-fish added the kind/bug label Sep 22, 2023
@k8s-ci-robot added the needs-triage label Sep 22, 2023
@CatherineF-dev
Contributor

QQ: have you enabled the Custom Resource State Metrics feature?

@cinder-fish
Author

cinder-fish commented Sep 27, 2023

Ah, we have; however, it's only with the following settings:

kind: CustomResourceStateMetrics
spec: {}

As nothing's been templated for it here

Also, apologies, I didn't check far enough back in the logs before. With this config it also regularly logs the following:

E0927 13:47:06.709384       1 reflector.go:148] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch apiextensions.k8s.io/v1, Resource=customresourcedefinitions: unknown

which I'm assuming is related.

So I guess either this is invalid config, or an empty spec could be treated as a no-op?
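
If an empty spec is meant to be a no-op, a guard along these lines would avoid wiring up the CRD machinery at all. This is a hypothetical sketch: MetricsSpec, Resources, and shouldStartDiscovery are stand-in names, not KSM's actual types:

package main

import "fmt"

// MetricsSpec is a stand-in for the shape of the YAML above; the real
// config type in KSM differs.
type MetricsSpec struct {
	Resources []string // placeholder for the per-resource metric entries
}

// shouldStartDiscovery treats a spec that selects no resources as
// "feature disabled", so no CRD informers get started for spec: {}.
func shouldStartDiscovery(spec MetricsSpec) bool {
	return len(spec.Resources) > 0
}

func main() {
	fmt.Println(shouldStartDiscovery(MetricsSpec{})) // false: empty spec is a no-op
}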

@CatherineF-dev
Contributor

CatherineF-dev commented Sep 27, 2023

Since you are not actually using the CustomResourceStateMetrics feature (spec: {}), could you disable it and try again?

@cinder-fish
Author

Yes, I've updated our templating to omit the --custom-resource-state-config flag when the config is empty.

I can confirm this resolves the log messages, and KSM is once again unaffected by CRD changes.

@CatherineF-dev
Contributor

CatherineF-dev commented Sep 27, 2023

Okay, I guess the crash might be related to #2202.

I discussed with @logicalhan yesterday moving the CustomResourceStateMetrics feature out of the KSM repo. This issue is one data point supporting that idea.

@CatherineF-dev
Contributor

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Sep 27, 2023
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot added the needs-triage label and removed the triage/accepted label Sep 26, 2024
@dgrisonnet
Member

@CatherineF-dev @rexagod do you perhaps know if this is still relevant? IIRC we fixed something around that.

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Oct 3, 2024
@rexagod
Member

rexagod commented Oct 17, 2024

/close

I tried to reproduce it with the config mentioned above, to no effect.

@k8s-ci-robot
Contributor

@rexagod: Closing this issue.

In response to this:

/close

I tried to reproduce it with the config mentioned above, to no effect.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
