fix: compute and emit realtime metrics while node not fulfilled #13441
Signed-off-by: Yuping Fan <[email protected]>
@Joibel Could you take a look at this PR? Thanks!
I am unconvinced that `when` clauses should apply to realtime metrics. It seems bizarre to have a gauge which disappears and reappears. If my car did that I'd think it was broken and want to fix it. My preference would be to warn on `when` clauses being used in conjunction with `realtime`, and just ignore the `when` clause.
Given that `when` clauses do apply, they shouldn't only be evaluated on `workflowResyncPeriod`. This PR as is doesn't fix that.
My suggestion for a better fix would be to change the interface to `RealTimeValueFunc` to add a boolean return based on the `when` clause:
type RealTimeValueFunc func() (bool, float64)
and then fix the fallout from this such that the `when` clause for realtime is not evaluated in `ComputeMetrics` and is checked in the func instead.
@Joibel Thanks for review, the point I want to improve is not the calculation method of "when", but if the node will not call If it is a workflow-level metrics, this
Why do you want to call it again during operation for realtime metrics? The only reason I can see to call it a second time is because the first time the when clause evaluated to false so the realtime metric didn't get created.
This should be unnecessary, I'm not sure what this does that is useful. It should probably only happen once under the guard of
Why? What is special about a template being scheduled that means realtime metrics should be updated? It's an arbitrary point in time.
Yes, that's the reason.
workflow/controller/operator.go
- // Check if the node was just created, if it was emit realtime metrics.
+ // Check if the node was just created or not fulfilled, if it was emit realtime metrics.
  // If the node did not previously exist, we can infer that it was created during the current operation, emit real time metrics.
- if _, ok := woc.preExecutionNodePhases[node.ID]; !ok {
+ if _, ok := woc.preExecutionNodePhases[node.ID]; !ok || !node.Fulfilled() {
@Joibel What I want to change is at the template level. My reason is that I specified a `when` of `duration > 100`, meaning the realtime metric should be emitted once the duration exceeds 100 seconds. In practice the metric is never collected after 100 seconds, because the duration is 0 when the template is first created. Once `when` evaluates to false at that point, I permanently lose this metric.
I think this does not conform to the definition of "when".
My expectation is that I can collect this metric once the `when` condition is met.
Could you read my suggestion again. I believe it does what you're looking for here.
> Could you read my suggestion again. I believe it does what you're looking for here.
I understand, thank you.
This reverts commit 2ac3e5a.
This reverts commit 05a19da.
This reverts commit d741add.
Signed-off-by: Yuping Fan <[email protected]>
Fixes #13440
Motivation
When I use realtime metrics at the template level, they are not emitted on every template execution.
Explanation:
Before this change, a node's realtime metrics were computed only when the node was first created. If the conditions in the `when` clause were not met at that time, the clause was not re-evaluated until the node finished running. This does not align with the intended meaning of realtime.
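The difference can be shown with a toy model. This is a sketch only and not controller code: `emittedAt` and its parameters are hypothetical names, and each element of `durations` stands for the node's duration at one reconcile pass while the node is still running. With `recheck` set to false the condition is checked on the first pass only (node creation); with `recheck` set to true it is checked on every pass.

```go
package main

import "fmt"

// emittedAt is a hypothetical helper. It returns the durations at which a
// metric gated on `duration > threshold` would be emitted. When recheck is
// false, only the first pass (node creation) evaluates the condition.
func emittedAt(durations []float64, threshold float64, recheck bool) []float64 {
	var emitted []float64
	for i, d := range durations {
		if i > 0 && !recheck {
			break // condition is never looked at again while the node runs
		}
		if d > threshold {
			emitted = append(emitted, d)
		}
	}
	return emitted
}

func main() {
	passes := []float64{0, 60, 120} // seconds elapsed at each reconcile pass

	fmt.Println(len(emittedAt(passes, 100, false))) // checked at creation only
	fmt.Println(emittedAt(passes, 100, true))       // checked on every pass
}
```

In the first call nothing is emitted because the duration is 0 at creation. In the second call the metric appears at the pass where the duration has passed the threshold.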
Modifications
Compute and emit realtime metrics while the node is not fulfilled.
How to test?
Take this template as an example: I add a Prometheus metric at the template level that should be emitted when `{{duration}} > 100`. Suppose the workflow keeps running and never stops. When `workflowResyncPeriod` elapses, the workflow is scheduled again, but because the node is not fulfilled, the metric is not computed again, so it cannot be emitted once the condition is met.
My expectation
Compute and emit realtime metrics while the node is not fulfilled.