-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add SSO with SubjectAccessReviews #7193
feat: add SSO with SubjectAccessReviews #7193
Conversation
@alexec sorry for secretly working on a semi-large PR, but I think you will like this feature! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I understand this correctly:
If enabled, we use your SSO info to perform a self-subject access review before any API request. This review is based on verb/resource found by parsing the URL. The actual service account executing the request, this appears to be the argo-server account?
The goal, I assume, is to make it easier to configure RBAC, but delegating the set-up to the Kubernetes API server?
@alexec yep, the goal is to delegate access control to K8S RoleBinding/ClusterRoleBinding. Since argo-workflows is just a set of CRD's, it makes sense to give users the same access they would have had thought This streamlines many use-cases, ranging from enterprise to Kubeflow, as pretty much everyone gives PS: I am in Melbourne, Australia timezone if it looks like I am not responding |
Did you take a look at Kubernetes impersonation? This is supported in the Golang client, I think you could amend the existing clients and it should just work 🤞: https://github.com/kubernetes/client-go/blob/master/transport/config.go#L49 I'd like to see if we can find a way to avoid having too much code related to authentication, instead offload as much decision making to Kubernetes. This gives us a stronger security posture. |
734fb15
to
9222ce4
Compare
@alexec I have updated the PR description with lots more information for reviewers, and have disabled the |
@alexec impersonation is a very dangerous feature in Kubernetes, its purpose is for super-admins to literally act as specific users, this is problematic for a few reasons:
In reality, Kubernetes PS: I know the approach of using |
So it does not "just work". Shame. |
I realized I forgot to put an example of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included some commentary, but the big things are:
- This needs unit tests.
- Can you add some information about how you tested it? Can it be tested automatically or does it need manual testing?
- Does the user need to send-up RBAC for argo-server to be able to create
selfsubjectaccessreviews
? If so, I think they need to be added. - Is there another term other than "impersonate" we could call this? Users will confuse this with Kubernetes impersonation and get the wrong expectations about it.
- This is probably the most impressive first time contribution I've ever seen.
@@ -187,23 +210,71 @@ func (s gatekeeper) getClients(ctx context.Context, req interface{}) (*servertyp | |||
if err != nil { | |||
return nil, nil, status.Error(codes.Unauthenticated, err.Error()) | |||
} | |||
if s.ssoIf.IsImpersonateEnabled() && s.ssoIf.IsRBACEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not needed? should be prevented during start up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While startup will prevent this for end-users, it's still possible to accidentally enable both in unit tests (due to mocks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if we enforce here, I don't think we need to enforce higher up in the stack, as the error would be caught here. I don't feel strongly about which is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexec would you prefer that argo-server
fails at startup, or an error that is visible in the UI when you try and access something?
return nil, nil, status.Error(codes.PermissionDenied, "not allowed") | ||
} | ||
return clients, claims, nil | ||
} | ||
if s.ssoIf.IsRBACEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use else if, if you are worried
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is same as #7193 (comment)
}, | ||
) | ||
|
||
dynamicClient, err := dynamic.NewForConfig(restConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like code duplication, perhaps an existing method could be used, but we pass in a roundTripper as an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but I would also have to refactor ClientForAuthorization
, which I don't want to do (trying to reduce the amount of things this PR changes, as its already hard to review).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexec I really think this is unnecessary and would make this PR much larger than it needs to be (with a high likelihood of breaking something else, as many things depend on the current signature of the gatekeeper.clientForAuthorization
method)
If you really want, I will do it, but I think it will make reviewing this PR harder.
"Subresource": subresource, | ||
}).Debug(fmt.Printf("SubjectAccessReview - %s", c.username)) | ||
|
||
review, err := c.kubeClient.AuthorizationV1().SubjectAccessReviews().Create(ctx, &auth.SubjectAccessReview{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should note that if you use this feature, then it will execute 2x the number of API requests it was previously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect this will significantly impact users, as it's only for the argo-server?
Where would you like to include the warning, if you want to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something in the documentation would be fine, you could point out that, as a human must be using it, and humans are slower that automated, it is not expect to impact anyone
Thanks for reviewing @alexec, see my replies below!
I agree in general, but I am not sure where to implement them (feel free to annotate places you think are important to test, but are currently not). FYI: I think most of the "SSO+RBAC" tests are in
I tested the argo-server in "local mode" (with In all my tests I get 403 from the UI/CLI when I don't have the required RBAC, and it works properly when I do.
I assume you mean Also, do you think we should make
I am not sure it matters that much, especially as I have tried to be very clear in the updated docs that this feature uses However, I guess we could call it "subjectReview, so something like
Thanks? |
@alexec while I have been testing, I have uncovered a more general issue with argo-workflows, which is that the argo-server (specifically the UI), makes an absolutely unreasonable number of For example, opening the All of the
You can see for yourself by enabling I don't think this issue should block this feature, but it needs to be a top priority to significantly reduce the number of API calls created by argo-server. |
Yes. In hindsight, should not have called that SSO+RBAC. Having a good name helps users understand a feature and differentiate from similar features. In my mind, we're not impersonating here, nor are we using Kubernetes impersonation. It's own name would give it it's own personality. This is a closed-door decision, so I'd like to get it right. Here are some ideas:
This does not sound correct. The API endpoints only ever do O(1) Kubernetes API requests. If users saw this often, I think that we'd know about it. So perhaps there is a bug in the UI under certain circumstances? Perhaps, when an error occurs, we start making API requests repeatedly, but we do not back off? Are you able to shed more light on this? |
@alexec what do you need from me to get this merged? Also, do you think we should try to implement a solution to #6490 (comment) and add the user's |
Do you think you could reply to my comment from 12th Nov? |
Signed-off-by: Mathew Wicks <[email protected]>
Signed-off-by: Mathew Wicks <[email protected]>
Signed-off-by: Mathew Wicks <[email protected]>
Signed-off-by: Mathew Wicks <[email protected]>
Signed-off-by: Mathew Wicks <[email protected]>
Signed-off-by: Mathew Wicks <[email protected]>
6cfc691
to
91e72f3
Compare
Signed-off-by: Mathew Wicks <[email protected]>
b8d6ef6
to
1795a1f
Compare
@alexec I propose we name it Regarding the large number of API calls, it's possible that there is some kind of retry "amplification" going on, but there are certainly more requests made to the K8S API that is needed for the UI to function. Clone the branch, and update |
I think the new name is great. Can you set-up some time in my calendar to we can go over the requests issue? I'd like to understand more |
@alexec is there any chance we could do a meeting time which is 1-2h later than your current latest one (which is currently 6am in Melbourne, Australia time). So for example |
Sure. I'm in PT, so propose some times and I'll let you know if I'm free? |
I am still planning on finishing this, but have not had the time to allocate. The first step to merging after discussions with @alexec is:
@bygui86 even if the initial implementation for this |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
go away bot |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@thesuperzapper any progress on this? |
@bygui86 haven't had a chance to work on it (but I still believe this is an important feature), the remaining tasks are listed in #7193 (comment). |
@thesuperzapper we very much want this feature. I'd love to see it in v3.4 as it'd be a noteworthy feature we'd promote. Would you like us to try and find someone to complete it on your behalf? |
@alexec I can probably get it finished (depending on how quickly you are wanting to get I am probably best positioned to get it done quickly, the remaining tasks are listed in #7193 (comment). The only thing I worry about is the number of API calls that I saw when I was testing this feature (see #7193 (comment)), which seems to be a problem with argo making more requests than it needs, not related to this PR. |
But If you want to assign someone I won't feel bad! I am busy working on lots of things, and this getting added to Argo (whether or not I am the author) will help those other projects! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@alexec sorry but what's the status here? |
Issue abandoned (has been stale for 6mo). |
@alexec oh no :( even after all hard work provided by @thesuperzapper ? |
Did this feature get abandoned? SAR would be a great feature to introduce as it is native to k8s, we also use this for other services on our clusters. |
Closes:
Overview
This PR adds a feature to have argo-server use Kubernetes
SubjectAccessReviews
for each User request (based on their OIDC claims, likeemail
orsub
) and only provide the access defined inRoleBinding
andClusterRoleBinding
for that User.This allows argo server to run with higher permissions than any specific User, and to effectively "impersonate" the required access.
Here is an extract from
workflow-controller-configmap.yaml
that enables this feature, specifying theemail
claim for the username:Standard Kubernetes RBAC is then used to give access based on a user's OIDC
email
.The following RBAC gives
[email protected]
full access to create/delete/etc. workflows in thexyz
Namespace (note, we use aRoleBinding
, rather thanClusterRoleBinding
).Changes:
sso.impersonate.enabled
andsso.impersonate.usernameClaim
configsSubjectAccessReviews
based of the User'semail
orsub
claim before EVERY K8S API callargo-sever
, andargo
CLISubjectAccessReviews
before accessing workflow archives (to verify that the user would have had access to the workflow)How it works:
The most complex part is that we inject an
http.RoundTripper
(calledimpersonateRoundTripper
) into all Kubernetes clients (dynamic
,workflow
,eventsource
,sensor
,kubernetes
), this internally calls a new client calledimpersonate.Client
which knows the Username of its user, and preformsSubjectAccessReviews
that validate if a user is authorized to use a specific Kuberntes API.The
http.RoundTripper
intercepts all K8S API calls (like GET/api/v1/pods
), and extracts what the user was trying to do based off the URL path and method, and then passes this information toimpersonate.Client.AccessReview()
.Limitations
SubjectAcessReivew
require you to explicitly specifygroups
that the user is in, otherwise it just checks for directRoleBinding
/ClusterRoleBinding
against theuser
.RoleBindings
/ClusterRoleBindings
are more commonSubjectAcessReivew
by extracting them from an OIDC claimTo Do: