-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Skip execution control for agent pod during pod reconciliation. Fixes #12726 #12732
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Looks like the test failures are the common ones which have been happening recently.
Related to: #12730 (comment) |
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this has a use-case for Agent pods -- the Workflow's deadline should still apply to them.
woc.wf.NodeId("agent")
seems to work for them in agent.go
's getAgentPodName()
func
I fixed it, but there are some new failed tests that appear frequently recently.
I've also thought about it. The workflow's deadline can not apply to agent pods since they do not support being terminated now. This is somewhat related to another issue I raised, see #12692 (comment). |
This comment was marked as resolved.
This comment was marked as resolved.
Ah I see, because it doesn't support the Also when you've thought through some of these situations, it's useful to write them down in the PR description, both for historical context and so that reviewers know without even asking. I use a "### Notes to Reviewers" section in my PRs where there are some changes that require an explanation Also just wanted to say, I do appreciate all the investigation and work you've done for the Agent and Plugins! |
Yea these seem to have been really recent, I only noticed them earlier today. I'm not sure which PR caused them though from a look through of the commit history on |
The agent pod will be deleted once the workflow is completed now, so the pod termination may be unnecessary. This PR #12441 has support plugin nodes shutdown, but there are a few unresolved issues. argo-workflows/workflow/controller/operator.go Lines 1265 to 1282 in ccb71bd
I will submit a new PR to fix it. EDIT: See #13016 |
This comment was marked as resolved.
This comment was marked as resolved.
For now I suppose. It's a little bit different timing-wise, but close enough. A comment would still be helpful as it is relevant to #12692 / must change if that is implemented
Agreed
How is TTL related there? Since it's only deadline related |
Sorry, I made a mistake and I actually meant the workflow's deadline. |
Signed-off-by: jswxstw <[email protected]>
@agilgur5 Can this PR be merged? |
This was the only thing I was waiting for. Has actually been on my to-do list to add this myself since I have that permission these days |
I updated the PR information, execution control will apply for nodes of the agent pod here now: argo-workflows/workflow/controller/operator.go Lines 312 to 315 in 735739f
|
This comment was marked as spam.
This comment was marked as spam.
I guess this should be merged in, I haven't reviewed it personally and will hold off on merging it myself until I have done so. Other approvers are free to merge it. |
Fixes #12726
Motivation
Obtain nonexistent node with agent pod in
applyExecutionControl
.Modifications
Skip execution control for agent pod during pod reconciliation, it will be done here now:
argo-workflows/workflow/controller/operator.go
Lines 312 to 315 in 735739f
Verification
local test