Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: provide fallback for 3.4 to 3.5 transition with absent NodeFlag. Fixes #12162 #13504

Merged
merged 3 commits into from
Sep 10, 2024

Conversation

isubasinghe
Copy link
Member

@isubasinghe isubasinghe commented Aug 26, 2024

Fixes #12162

Motivation

Modifications

Verification

workflow/common/util.go Outdated Show resolved Hide resolved
@isubasinghe isubasinghe changed the title fix: handle transition case from 3.4 by providing fallback code fix: handle transition case from 3.4 by providing fallback code. Fixess #12162 Aug 27, 2024
@isubasinghe isubasinghe changed the title fix: handle transition case from 3.4 by providing fallback code. Fixess #12162 fix: handle transition case from 3.4 by providing fallback code. Fixes #12162 Aug 27, 2024
@isubasinghe isubasinghe changed the base branch from main to release-3.5 August 27, 2024 23:14
@isubasinghe isubasinghe changed the base branch from release-3.5 to main August 27, 2024 23:14
@isubasinghe isubasinghe marked this pull request as ready for review August 27, 2024 23:15
@agilgur5 agilgur5 changed the title fix: handle transition case from 3.4 by providing fallback code. Fixes #12162 fix: provide fallback for 3.4-3.5 transition with recursion check. Fixes #12162 Aug 28, 2024
@agilgur5 agilgur5 changed the title fix: provide fallback for 3.4-3.5 transition with recursion check. Fixes #12162 fix: provide fallback for 3.4 to 3.5 transition with recursion check. Fixes #12162 Aug 28, 2024
Copy link

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this would simplify the code, but if we detect a 3.4 Workflow, could we just skip the recursion check on it entirely? Since it did not exist in 3.4

@agilgur5 agilgur5 added this to the v3.5.x patches milestone Aug 28, 2024
@isubasinghe isubasinghe force-pushed the fix-infinite-recursion branch from eab519a to 156d36a Compare August 30, 2024 04:20
@isubasinghe isubasinghe changed the base branch from main to release-3.5 August 30, 2024 04:20
@isubasinghe
Copy link
Member Author

I'm not sure if this would simplify the code, but if we detect a 3.4 Workflow, could we just skip the recursion check on it entirely? Since it did not exist in 3.4

Not really this isn't a recursion check, the PR title is incorrect.
3.4 workflows missed a NodeFlag, 3.5 introduced one.

This fix mandates a NodeFlag to be created now, it takes the lack of a NodeFlag to imply older controller to have created the workflow, then it tries to mimic 3.4 logic.

@agilgur5 agilgur5 changed the title fix: provide fallback for 3.4 to 3.5 transition with recursion check. Fixes #12162 fix: provide fallback for 3.4 to 3.5 transition with absent NodeFlag. Fixes #12162 Aug 30, 2024
@agilgur5
Copy link

Not really this isn't a recursion check, the PR title is incorrect.
3.4 workflows missed a NodeFlag, 3.5 introduced one.

I changed the title again. I'm not entirely sure I follow the root cause -- your comments in #12162 don't really outline it either.
That issue is about the recursion check added in #11646. NodeFlag was added in #11839 (i'm not entirely certain it's necessary either, since it seems like it could be derived). Are you saying that #11839 caused an infinite loop somewhere that #11646 then detected?

@isubasinghe isubasinghe force-pushed the fix-infinite-recursion branch from 156d36a to 88f3276 Compare September 4, 2024 10:35
Copy link
Member

@Joibel Joibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about 3.5 and 3.6 diverging in behaviour with this.

As a less invasive change did you consider scanning the workflow at the beginning of operate to upgrade it?

  • If you find a nodeFlags field, abort scan (early dropout for all updated workflows to avoid the cost
  • Scan each node and calling CheckHookNode generate and attach a nodeFlags to the nodes as appropriate

This scan would then not be needed in 3.6, but it would be the only difference.

workflow/common/util.go Outdated Show resolved Hide resolved
workflow/controller/operator.go Outdated Show resolved Hide resolved
workflow/controller/operator.go Outdated Show resolved Hide resolved
@isubasinghe
Copy link
Member Author

I changed the title again. I'm not entirely sure I follow the root cause -- your comments in #12162 don't really outline it either. That issue is about the recursion check added in #11646. NodeFlag was added in #11839 (i'm not entirely certain it's necessary either, since it seems like it could be derived). Are you saying that #11839 caused an infinite loop somewhere that #11646 then detected?

No worries.

Basically the NodeFlag was added in #11839 along with a function to get all retry nodes by using the NodeFlag field.
The problem is that 3.4 workflows do not have this. This leads to the controller thinking that the retry node was never created. It then throws the controller into an infinite recursion where it tries to executeTemplate recursively.

@isubasinghe
Copy link
Member Author

As a less invasive change did you consider scanning the workflow at the beginning of operate to upgrade it?

No, mostly because the mandating of the NodeFlag effectively provides a blast radius. But I don't think this is needed.
I've re-read the retry logic.

I think I can effectively create a map containing Nodes mapped into their NodeFlags.

Thanks for the suggestion, that is definitely less invasive than this fix.

@isubasinghe isubasinghe force-pushed the fix-infinite-recursion branch from 88f3276 to fd6f03b Compare September 6, 2024 04:31
@isubasinghe
Copy link
Member Author

I am pretty happy with this now actually :)

workflow/controller/operator.go Outdated Show resolved Hide resolved
workflow/common/util.go Show resolved Hide resolved
workflow/common/util.go Outdated Show resolved Hide resolved
workflow/controller/operator.go Outdated Show resolved Hide resolved
workflow/controller/operator.go Show resolved Hide resolved
@isubasinghe isubasinghe requested a review from Joibel September 9, 2024 00:10
workflow/controller/operator.go Show resolved Hide resolved
workflow/controller/operator.go Outdated Show resolved Hide resolved
Signed-off-by: isubasinghe <[email protected]>
@isubasinghe isubasinghe requested a review from Joibel September 9, 2024 22:24
@Joibel Joibel merged commit 930887d into argoproj:release-3.5 Sep 10, 2024
4 checks passed
Joibel pushed a commit that referenced this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

During upgrade: Maximum recursion depth exceeded error when not using recursion
3 participants