feat: new pod pending counter metric #13273

Merged 1 commit into main on Aug 19, 2024

Conversation

@Joibel Joibel commented Jun 28, 2024

The workflow controller is a Kubernetes controller that creates pods. Sometimes those pods do not start and remain in the Pending phase.

This metric counts the pods that have been observed as pending, by namespace and truncated reason. The reason is the first part of the Kubernetes pod pending `Reason`, up to the first `:` if present.

It ignores all pods in the `PodInitializing` state, as I consider this state unremarkable and temporary.
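
As a rough illustration of the truncation and filtering described above, here is a minimal Go sketch. It is not the code from this PR: `truncateReason`, `pendingKey`, and `pendingCounter` are hypothetical names, and a plain map stands in for the real metric. The example reason strings are assumptions as well.

```go
package main

import (
	"fmt"
	"strings"
)

// pendingKey identifies one counter series: namespace plus truncated reason.
// Hypothetical type, used only for this sketch.
type pendingKey struct {
	Namespace string
	Reason    string
}

// truncateReason keeps the text before the first ':' if one is present,
// and returns the reason unchanged otherwise.
func truncateReason(reason string) string {
	before, _, _ := strings.Cut(reason, ":")
	return before
}

// pendingCounter is an in-memory stand-in for the real counter metric.
type pendingCounter map[pendingKey]int

// observe records one pending pod, skipping the PodInitializing state.
func (c pendingCounter) observe(namespace, reason string) {
	r := truncateReason(reason)
	if r == "PodInitializing" {
		return // treated as unremarkable and temporary
	}
	c[pendingKey{Namespace: namespace, Reason: r}]++
}

func main() {
	c := pendingCounter{}
	c.observe("argo", "ImagePullBackOff: Back-off pulling image")
	c.observe("argo", "PodInitializing")
	c.observe("argo", "Unschedulable")
	fmt.Println(c)
}
```

Truncating at the first `:` keeps the set of label values small, since the text after the colon usually carries pod-specific detail.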

It is intended to let users create alerts on particular reasons, or on this metric climbing unusually rapidly.

Note to reviewers: this is now a standalone commit

@Joibel Joibel force-pushed the opentelemetry-podphase branch from 6729538 to 432d8fc Compare July 1, 2024 07:42
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 4daa3a0 to fe4d153 Compare July 1, 2024 07:42
@Joibel Joibel added the area/controller (Controller issues, panics) and area/metrics labels Jul 1, 2024
@agilgur5 agilgur5 changed the title feat: new pod pending counter metric 🪧 [DNM] feat: new pod pending counter metric Jul 1, 2024
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 432d8fc to 4096782 Compare July 5, 2024 10:28
@Joibel Joibel force-pushed the opentelemetry-podpending branch from fe4d153 to 0401646 Compare July 5, 2024 10:28
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 4096782 to 14a762e Compare July 8, 2024 10:06
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 0401646 to 34739c8 Compare July 8, 2024 10:06
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 14a762e to 9cf9291 Compare July 9, 2024 14:27
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 34739c8 to c2aa5eb Compare July 9, 2024 14:27
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 9cf9291 to 55742dd Compare July 12, 2024 08:49
@Joibel Joibel force-pushed the opentelemetry-podpending branch from c2aa5eb to a269114 Compare July 12, 2024 08:49
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 55742dd to bbd57a4 Compare August 12, 2024 11:13
@Joibel Joibel force-pushed the opentelemetry-podpending branch from a269114 to 7980eec Compare August 12, 2024 11:13
@Joibel Joibel force-pushed the opentelemetry-podphase branch from bbd57a4 to 62717f1 Compare August 12, 2024 11:20
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 7980eec to 406bb3a Compare August 12, 2024 11:20
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 62717f1 to 914bffe Compare August 14, 2024 10:41
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 406bb3a to 214ee64 Compare August 14, 2024 10:41
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 914bffe to ed07b58 Compare August 15, 2024 08:04
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 214ee64 to e5ce65c Compare August 15, 2024 08:04
@Joibel Joibel force-pushed the opentelemetry-podphase branch from ed07b58 to b09c0d2 Compare August 15, 2024 14:06
@Joibel Joibel force-pushed the opentelemetry-podpending branch from e5ce65c to 69dfcca Compare August 15, 2024 14:06
@Joibel Joibel force-pushed the opentelemetry-podphase branch from b09c0d2 to 4765408 Compare August 16, 2024 09:09
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 69dfcca to d1a14c8 Compare August 16, 2024 09:09
@Joibel Joibel force-pushed the opentelemetry-podphase branch from 4765408 to e90ca47 Compare August 16, 2024 09:23
@Joibel Joibel force-pushed the opentelemetry-podpending branch from d1a14c8 to f6fa6d5 Compare August 16, 2024 09:24
@Joibel Joibel changed the title 🪧 [DNM] feat: new pod pending counter metric feat: new pod pending counter metric Aug 16, 2024
@Joibel Joibel changed the base branch from opentelemetry-podphase to main August 16, 2024 09:30
@isubasinghe isubasinghe left a comment

LGTM

@Joibel Joibel enabled auto-merge (squash) August 19, 2024 07:12
@Joibel Joibel force-pushed the opentelemetry-podpending branch from 3bd0249 to fe9fd68 Compare August 19, 2024 08:34
@Joibel Joibel disabled auto-merge August 19, 2024 09:16
The workflow controller is a Kubernetes controller that creates
pods. Sometimes those pods do not start and remain in the Pending phase.

This metric counts the pods that have been observed as pending, by
namespace and truncated reason. The reason is the first part of the
Kubernetes pod pending `Reason`, up to the first `:` if present.

It ignores all pods in the `PodInitializing` state, as I consider this
state unremarkable and temporary.

It is intended to let users create alerts on particular `reasons`, or
on this metric climbing unusually rapidly.

Note to reviewers: this is part of a stack of reviews for metrics
changes. Please don't merge until the rest of the stack is also ready.

Signed-off-by: Alan Clucas <[email protected]>
@Joibel Joibel force-pushed the opentelemetry-podpending branch from fe9fd68 to e44a2b9 Compare August 19, 2024 10:02
@Joibel Joibel enabled auto-merge (squash) August 19, 2024 10:03
@Joibel Joibel merged commit dcd9436 into main Aug 19, 2024
28 checks passed
@Joibel Joibel deleted the opentelemetry-podpending branch August 19, 2024 10:45
@agilgur5
> It ignores all pods in the PodInitializing state as this I consider unremarkable and temporary.

IIRC there are some cases where users had Pods stuck in this state, so it might be useful for detecting those.
