chore!: remove legacy patch pods fallback #13100
@agilgur5 my WIP takeover. I'll finish this at some point this week.
@agilgur5 Some of these tests are failing because some of the test workflows use the argo service account and the argo service account doesn't have |
Oh. That's surprising, that means these tests have been using the wrong RBAC for some time and relying on the fallback 😬
Yes, especially in the official install manifests as you have them now. The Controller does not need those privileges and so shouldn't have them.
@agilgur5 yes, the tests use the various quick start manifests depending on the profile you choose. There is an executor role (same as in the docs) that is bound to the default service account. Are you suggesting that I bind the executor role to the argo service account in the e2e manifest mixins? Below is one of the test cases where the argo service account needs the executor permissions. If I'm not mistaken, the template service account (not just the pod service account) needs the executor permissions. Since it is creating a pod resource, it also needs the create pod permissions, which are bound to the argo service account.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k8s-resource-tmpl-with-pod-
spec:
  serviceAccount: argo
  entrypoint: main
  templates:
    - name: main
      serviceAccountName: argo
      resource:
        action: create
        setOwnerReference: true
        successCondition: status.phase == Succeeded
        failureCondition: status.phase == Failed
        manifest: |
          apiVersion: v1
          kind: Pod
          metadata:
            generateName: k8s-pod-resource-
          spec:
            serviceAccountName: argo
            containers:
            - name: argosay-container
              image: argoproj/argosay:v2
              command: ["/argosay"]
            restartPolicy: Never

See https://github.com/argoproj/argo-workflows/actions/runs/9424092511/job/25963802079

I could also create a dedicated service account for these tests with the required permissions, I suppose.
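To make the permission question above concrete, here is a minimal sketch in Go of the check being discussed: whether the rules bound to a template's service account cover both the executor permissions and pod creation. The `Rule` type, the `allows` helper, and `podCreateRole` are hypothetical names for illustration only (they are not the Kubernetes API types or Argo code). The executor rule mirrors the one in the Workflow RBAC docs; the pod-create rule is an assumption standing in for whatever role the quick start binds.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for a Kubernetes RBAC
// PolicyRule (one API group and one resource per rule, for brevity).
type Rule struct {
	APIGroup string
	Resource string
	Verbs    []string
}

// allows reports whether any rule grants verb on apiGroup/resource.
func allows(rules []Rule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if r.APIGroup != apiGroup || r.Resource != resource {
			continue
		}
		for _, v := range r.Verbs {
			if v == verb {
				return true
			}
		}
	}
	return false
}

// executorRole mirrors the executor role from the Workflow RBAC docs.
var executorRole = []Rule{
	{APIGroup: "argoproj.io", Resource: "workflowtaskresults", Verbs: []string{"create", "patch"}},
}

// podCreateRole is an assumption: the extra permission a resource
// template needs when its manifest creates a Pod.
var podCreateRole = []Rule{
	{APIGroup: "", Resource: "pods", Verbs: []string{"create"}},
}

// templateServiceAccountRules combines both roles, as if both were
// bound to the service account used by the template.
func templateServiceAccountRules() []Rule {
	return append(append([]Rule{}, executorRole...), podCreateRole...)
}

func main() {
	sa := templateServiceAccountRules()
	fmt.Println(allows(sa, "argoproj.io", "workflowtaskresults", "create")) // executor permission
	fmt.Println(allows(sa, "", "pods", "create"))                           // needed by the resource template
	fmt.Println(allows(sa, "", "pods", "patch"))                            // the legacy fallback permission
}
```

In this sketch the service account never holds `patch` on `pods`, which is the permission the legacy fallback relied on.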
Ah right of course since quick start.
The latter would be more correct and least privilege (i.e. less possible future bugs with regard to the tests)
The |
I just realized the pod create permissions are bound to the default service account with the pod-manager role. Switching the tests that used the argo service account to use the default service account instead.
@agilgur5 I think this test is/was wrong? The container doesn't inherit permissions from the executor, right? I changed

executor:
  serviceAccountName: argo

to

executor:
  serviceAccountName: default

The full test

func (s *WorkflowSuite) TestContainerTemplateAutomountServiceAccountTokenDisabled() {
	s.Given().Workflow(`
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: get-resources-via-container-template-
  namespace: argo
spec:
  serviceAccountName: argo
  automountServiceAccountToken: false
  executor:
    serviceAccountName: default
  entrypoint: main
  templates:
    - name: main
      container:
        name: main
        image: bitnami/kubectl
        command:
          - sh
        args:
          - -c
          - |
            kubectl get cm
`).
		When().
		SubmitWorkflow().
		WaitForWorkflow(fixtures.ToBeSucceeded, time.Minute*11).
		Then().
		ExpectWorkflow(func(t *testing.T, metadata *metav1.ObjectMeta, status *wfv1.WorkflowStatus) {
			assert.Equal(t, wfv1.WorkflowSucceeded, status.Phase)
		})
}

The failure

workflow_test.go:53: condition never and cannot be met because the workflow is done
Checking expectation get-resources-via-container-template-slp6q
get-resources-via-container-template-slp6q : Failed Error (exit code 1)
workflow_test.go:56:
    Error Trace: /home/runner/work/argo-workflows/argo-workflows/test/e2e/workflow_test.go:56
                 /home/runner/work/argo-workflows/argo-workflows/test/e2e/fixtures/then.go:69
                 /home/runner/work/argo-workflows/argo-workflows/test/e2e/fixtures/then.go:44
                 /home/runner/work/argo-workflows/argo-workflows/test/e2e/workflow_test.go:55
    Error:       Not equal:
                 expected: "Succeeded"
                 actual  : "Failed"
                 Diff:
                 --- Expected
                 +++ Actual
                 @@ -1,2 +1,2 @@
                 -(v1alpha1.WorkflowPhase) (len=9) "Succeeded"
                 +(v1alpha1.WorkflowPhase) (len=6) "Failed"
    Test:        TestWorkflowSuite/TestContainerTemplateAutomountServiceAccountTokenDisabled

Shouldn't we be expecting the failure? Edit:
@agilgur5 Ready for review
Oh I totally forgot to add a release note in my initial variant of the PR -- we should add a small one to the 3.6 upgrading.md and just link to the Workflow RBAC page (where I removed the old RBAC in #12975)
a7e2653 to e46a675
This looks good to me, thanks for the work on it @Garett-MacGowan. I was wondering if we could modify the error message you see in the GUI to include a link to the documentation for fixing it as it's really common to see people with incorrect RBAC still. Currently you see
which is probably obvious to you and I, but for most people who hope it'd just work, they'll be confused. It doesn't have to be a part of this PR though. I'll leave it to @agilgur5 to have a look over it. |
@Joibel Thanks. I could add a link to the GUI. Will put it in another PR.
@Garett-MacGowan, any chance you could bring this up to date so it can be merged?
Sure, I'll work on it today
https://argo-workflows.readthedocs.io/en/stable/progress/#self-reporting-progress
This functionality will be affected: we will never be able to set the initial progress of the workflow's pod. So these lines should be kept: argo-workflows/workflow/controller/workflowpod.go, lines 261 to 271 in 7173a27
- the fallback is old and insecure, and the error confuses users as it's not mentioned in the docs (as it's legacy and a fallback)
- it's also tech debt that we have to write code around specifically right now
- it's no longer needed and hasn't been the main RBAC in a few versions, so remove it in the next minor
- remove the Executor code that patches pods
- remove the operator code that reads the patched annotations

Signed-off-by: Anton Gilgur <[email protected]>
Signed-off-by: Garett MacGowan <[email protected]>
…ixes argoproj#13100 Signed-off-by: Garett MacGowan <[email protected]>
Could you point me to where this is displayed in the UI? It has been a while. Thanks.
e46a675 to ac3d68f
It looks good to me now. By the way, does this document need to be updated since its content is outdated?
What in particular is out of date? The pod patches still happen, they just need the RBAC now. See |
Aren’t pod patches removed by the current PR and progress can only be reported to |
Right, sorry. The updates to progress are done via

newNode.Progress = result.Progress
...
woc.wf.Status.Nodes.Set(nodeID, *newNode)

in How about the following change? Instead of:
We can say:
Wait... I just checked the file on my branch. It has already been updated...
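The progress update quoted earlier in this exchange (`newNode.Progress = result.Progress` followed by `Nodes.Set`) can be sketched in isolation as follows. This is a simplified model, not the controller's code: `TaskResult`, `NodeStatus`, `Nodes`, and `applyTaskResult` are hypothetical stand-ins, and the rules for skipping an update (unknown node, empty or unchanged progress) are assumptions made for the example.

```go
package main

import "fmt"

// Progress uses the "N/M" string form seen in the thread, e.g. "50/100".
type Progress string

// TaskResult is a hypothetical stand-in for what the executor reports
// through a task result rather than through a pod annotation.
type TaskResult struct {
	Progress Progress
}

// NodeStatus is a hypothetical, trimmed-down node status.
type NodeStatus struct {
	ID       string
	Progress Progress
}

// Nodes is a simplified stand-in for the workflow's node map.
type Nodes map[string]NodeStatus

// Set stores the status for the given node ID.
func (n Nodes) Set(id string, s NodeStatus) { n[id] = s }

// applyTaskResult copies progress from a task result onto the node.
// It returns true only when the stored node actually changed.
// Assumption: unknown nodes and empty or unchanged progress are skipped.
func applyTaskResult(nodes Nodes, nodeID string, result TaskResult) bool {
	old, ok := nodes[nodeID]
	if !ok || result.Progress == "" || result.Progress == old.Progress {
		return false
	}
	newNode := old // copy, so the stored value changes only via Set
	newNode.Progress = result.Progress
	nodes.Set(nodeID, newNode)
	return true
}

func main() {
	nodes := Nodes{"node-1": {ID: "node-1", Progress: "0/100"}}
	changed := applyTaskResult(nodes, "node-1", TaskResult{Progress: "50/100"})
	fmt.Println(changed, nodes["node-1"].Progress)
}
```

The point of the sketch is only that progress flows from a reported result into the node status, with no pod patch involved.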
@agilgur5 I think this is ready for review.
One suggestion, but LGTM
Co-authored-by: Alan Clucas <[email protected]> Signed-off-by: Garett MacGowan <[email protected]>
Thanks @Joibel
Sorry I forgot to respond to this at the time Alan wrote it and then had not been feeling well for much of the past week. This does not appear in the UI per se, it appears in logs, which can appear in the UI. I did specifically address this with an FAQ entry in #13041 (as listed in the PR description). Alan specifically meant to catch that error and instead return a link to the docs. This line logs a link to the docs, but does not change the end error |
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow
did this role not exist in the quick start?
Looks like it was supposed to be removed / renamed in #7999 (see also this removed file) but got replaced by other files instead if I'm reading correctly
Wait... I just checked the file on my branch. It has already been updated...
Re: progress, #12089 modified this and was recently merged (the PR is a year old otherwise). I actually mentioned @jswxstw's #13260 there too; I'm honestly not entirely sure how self-reported progress is supposed to work now and which versions it changed in -- I'm guessing in 3.4, same as WorkflowTaskResults in general. We should probably backport some of the documentation changes based on that, for instance.
I'm also not sure if the test helper withProgress and its usage should have been removed here then; it sounds like it perhaps should be modified similarly to use WorkflowTaskResults and that tested?
	}
}
func withProgress(v string) with { return withAnnotation(common.AnnotationKeyProgress, v) }
as in this line
assert.Equal(t, wfv1.Progress("50/100"), pod.Progress)

// mock workflow uses legacy/insecure pod patch
makePodsPhase(ctx, woc, apiv1.PodSucceeded, withAnnotation(common.AnnotationKeyReportOutputsCompleted, "true"), withProgress("100/100"))
and these assertions above
PR Takeover #12976. Description is mostly verbatim
Motivation
User "system:serviceaccount:argo:default" cannot patch resource "pods" in API group "" in the namespace "argo"
#12783 and also frequently on Slack. See docs(faq): add note about pods patch error #13041 for more such examples of confusion
Modifications
Verification
Tests pass