fix: DAG with continueOn in error after retry. Fixes: #11395 #12817
Signed-off-by: shuangkun <[email protected]>
Could we add something in util_test.go?
I feel we rely too much on e2e; would this be an issue?
Remember to test that the correct node is retried. The current test does not cover that.
I added a unit test in util_test.go that checks the correct node is retried. Thanks!
0cde322 to 5ca8b69
Hi @tczhao, can you take another look? Thanks!
👍
5ca8b69 to 05025d2
05025d2 to ca25e6a
47ff6c6 to 028fb1b
810d0c5 to 31b2889
Thanks! Modified it!
31b2889 to af31b8a
95f5c32 to 55049ca
55049ca to 95f5c32
lgtm
Nice tests. Thanks!
just looked this over, small comment.
@shuangkun you might want to look at #12156. I feel like there's potentially a deeper root cause here with the retry logic being buggy -- see also #12553 (comment)
also wanted to say thanks to @tczhao for the initial review -- you might've noticed earlier but I thumbs-up'd pretty much all of your comments 🙂
OK, I will have a look.
Signed-off-by: shuangkun <[email protected]> (cherry picked from commit 2eb2415)
This caused the regression in #13003, so I suggest not backporting until that is resolved; I feel that is a bigger regression than the bug this fixes.
I really think the root cause I mentioned above in #12817 (review) needs a deep dive. The manual retry logic probably needs a refactor to correct all of the issues.
There is an issue with this modification: a failed/error node that has successful child nodes is retained, so that failed/error node cannot be retried.
…rgoproj#12817) Signed-off-by: shuangkun <[email protected]>
fix: DAG with continueOn in error after retry
Fixes: #11395
Motivation
Modifications
Verification
Local test and e2e test.
After retry:
Before fix: Error, and some nodes are lost.
After fix: Failed, and no nodes are lost.