Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List snapshots is an expensive call. We should try to avoid listing all the snapshots.
The purpose for this call here is to ensure the CRD is installed. We don't really need to list the snapshots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your question here: https://kubernetes.slack.com/archives/C0EG7JC6T/p1733776266305359
It's odd that {Limit: 0} would return 10,000 entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, at the very least can we wait for an informer sync of volumesnapshots instead of making a list call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to #504 which added this ensureCustomResourceDefinitionsExist waiting for informer sync was not sufficient bc the controller showed as Ready even before informer had synced, so ensureCustomResourceDefinitionsExist was added to force the controller to immediately crash. So I guess it's not as simple as removing this function , I think the most correct solution would be to implement readyz probe which waited for informer sync. But that is a lot of effort so I think this PR is the best short term fix.
(As for the limit=0 behavior, indeed it is counterintuitive & I could not find where it's documented in apiserver code if at all It seems that since resourceversion is not set in this list call then the lImit=0 option gets plumbed to etcd list call here and it does document that "// If WithLimit is given a 0 limit, it is treated as no limit." https://github.com/etcd-io/etcd/blob/854bdd646c8ce50b879f79f403726c7ab0dc726c/client/v3/op.go#L345. Anyway, the details are unimportant, the point is we observe that limit=0 does the opposite of intended!!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw that external-provisioner has a healt hz check kubernetes-csi/external-provisioner@52c3575#diff-db1b4e664fc5c6203530702df4e7c6eade0fd481ef5badffbc0bb4a92648d36e. So in short mb snapshot-controller should have also a healthz check which starts serving OK only after informer synced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in the CSI Implementation meeting today and decided we should not go with the "health check" fix. If the CRDs are not deployed, informer will returns errors. It's not going to be synced. So the current fix is good.