feat: add SSO with SubjectAccessReviews #7193

Closed


thesuperzapper
Contributor

@thesuperzapper thesuperzapper commented Nov 9, 2021

Closes:

Overview

This PR adds a feature that has argo-server use Kubernetes SubjectAccessReviews for each user request (based on their OIDC claims, like email or sub) and only grant the access defined in RoleBindings and ClusterRoleBindings for that user.
This allows argo-server to run with higher permissions than any specific user, and to effectively "impersonate" the required access.

Here is an extract from workflow-controller-configmap.yaml that enables this feature, specifying the email claim for the username:

sso:
  # ...
  impersonate:
    enabled: true
    # one of: {"email", "sub"}
    usernameClaim: "email"
  rbac:
    # this MUST be false
    enabled: false
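Because sso.rbac.enabled must be false whenever this feature is on, argo-server can reject a bad combination at startup. A minimal sketch of such a check, assuming a hypothetical SSOConfig struct that mirrors the YAML above (the Go type and field names are illustrative, not the ones in this PR):

```go
package main

import (
	"errors"
	"fmt"
)

// ImpersonateConfig mirrors the `sso.impersonate` block (hypothetical names).
type ImpersonateConfig struct {
	Enabled       bool
	UsernameClaim string // "email" or "sub"
}

// RBACConfig mirrors the `sso.rbac` block (hypothetical names).
type RBACConfig struct {
	Enabled bool
}

// SSOConfig is a hypothetical subset of the `sso` section.
type SSOConfig struct {
	Impersonate ImpersonateConfig
	RBAC        RBACConfig
}

// Validate returns an error for combinations argo-server should refuse to
// start with: both authorization modes enabled, or an unsupported claim.
func (c SSOConfig) Validate() error {
	if !c.Impersonate.Enabled {
		return nil
	}
	if c.RBAC.Enabled {
		return errors.New("sso.impersonate.enabled and sso.rbac.enabled must not both be true")
	}
	switch c.Impersonate.UsernameClaim {
	case "email", "sub":
		return nil
	default:
		return fmt.Errorf("sso.impersonate.usernameClaim must be \"email\" or \"sub\", got %q", c.Impersonate.UsernameClaim)
	}
}

func main() {
	good := SSOConfig{Impersonate: ImpersonateConfig{Enabled: true, UsernameClaim: "email"}}
	both := SSOConfig{Impersonate: ImpersonateConfig{Enabled: true, UsernameClaim: "email"}, RBAC: RBACConfig{Enabled: true}}
	fmt.Println(good.Validate()) // <nil>
	fmt.Println(both.Validate()) // non-nil: both modes enabled
}
```

Failing at startup surfaces the misconfiguration once, to the operator, rather than as a permission error in the UI on every request.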

Standard Kubernetes RBAC is then used to give access based on a user's OIDC email.
The following RBAC gives [email protected] full access to create/delete/etc. workflows in the xyz Namespace (note: we use a RoleBinding rather than a ClusterRoleBinding).

apiVersion: rbac.authorization.k8s.io/v1
# could also be a namespaced `Role`, if needed
kind: ClusterRole
metadata:
  name: argo-superadmin
rules:
  - apiGroups:
      - ""
    resources:
      - events
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - argoproj.io
    resources:
      - cronworkflows
      - cronworkflows/finalizers
      - workfloweventbindings
      - workfloweventbindings/finalizers
      - workflows
      - workflows/finalizers
      - workflowtasksets
      - workflowtasksets/finalizers
      - workflowtemplates
      - workflowtemplates/finalizers
      # TIP: `ClusterWorkflowTemplates` are cluster-scoped resources, so only a `ClusterRoleBinding` can give access to them
      # (that means our `RoleBinding/argo-example-binding` effectively ignores the following lines)
      - clusterworkflowtemplates
      - clusterworkflowtemplates/finalizers
    verbs:
      - create
      - delete
      - deletecollection
      - get
      - list
      - patch
      - update
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
# could also be a `ClusterRoleBinding`, if needed
kind: RoleBinding
metadata:
  name: argo-example-binding
  # note, this only allows "[email protected]" to create/delete/etc. workflows in the "xyz" namespace
  namespace: xyz
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argo-superadmin
subjects:
  - kind: User
    # this is the user's `email` claim
    name: "[email protected]"
    apiGroup: rbac.authorization.k8s.io
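The subject name in the RoleBinding has to match the value of the configured claim exactly. A minimal sketch of reading that value from already-verified OIDC claims, assuming they are available as a map[string]interface{} (the helper usernameFromClaims is hypothetical, not part of this PR):

```go
package main

import "fmt"

// usernameFromClaims returns the value of the configured claim ("email" or
// "sub") from verified OIDC claims. It never falls back to another claim, so
// a user cannot accidentally match a RoleBinding meant for someone else.
func usernameFromClaims(claims map[string]interface{}, usernameClaim string) (string, error) {
	if usernameClaim != "email" && usernameClaim != "sub" {
		return "", fmt.Errorf("unsupported username claim %q", usernameClaim)
	}
	v, ok := claims[usernameClaim]
	if !ok {
		return "", fmt.Errorf("claim %q not present", usernameClaim)
	}
	s, ok := v.(string)
	if !ok || s == "" {
		return "", fmt.Errorf("claim %q is not a non-empty string", usernameClaim)
	}
	return s, nil
}

func main() {
	claims := map[string]interface{}{"email": "alice@example.com", "sub": "1234"}
	name, err := usernameFromClaims(claims, "email")
	fmt.Println(name, err) // alice@example.com <nil>
}
```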

Changes:

  • Update docs to describe "SSO Impersonate" feature (along with other cleanups of the SSO docs)
  • Add sso.impersonate.enabled and sso.impersonate.usernameClaim configs
  • Perform SubjectAccessReviews based on the user's email or sub claim before EVERY K8S API call
    • NOTE: This affects both the argo-server and the argo CLI
  • Perform SubjectAccessReviews before accessing workflow archives (to verify that the user would have had access to the workflow)

How it works:

The most complex part is that we inject an http.RoundTripper (called impersonateRoundTripper) into all Kubernetes clients (dynamic, workflow, eventsource, sensor, kubernetes). Internally, it calls a new client called impersonate.Client, which knows the username of its user and performs SubjectAccessReviews to validate whether that user is authorized to use a specific Kubernetes API.

The http.RoundTripper intercepts all K8S API calls (like GET /api/v1/pods), extracts what the user was trying to do based on the URL path and method, and then passes this information to impersonate.Client.AccessReview().

Limitations

  • SubjectAccessReviews require you to explicitly specify the groups that the user is in; otherwise they only check for RoleBindings/ClusterRoleBindings that reference the user directly.
    • NOTE: This shouldn't be a large issue for most clusters, as direct User RoleBindings/ClusterRoleBindings are more common
    • NOTE: In the future, we can add a feature that extracts a list of group names for the SubjectAccessReview from an OIDC claim

To Do:

  • I have added some tests, but would like some suggestions as to how to test this end-to-end in the CI/CD

@thesuperzapper
Contributor Author

@alexec sorry for secretly working on a semi-large PR, but I think you will like this feature!

Contributor

@alexec alexec left a comment

Let me see if I understand this correctly:

If enabled, we use your SSO info to perform a self-subject access review before any API request. This review is based on the verb/resource found by parsing the URL. The actual service account executing the request appears to be the argo-server account?

The goal, I assume, is to make it easier to configure RBAC by delegating the set-up to the Kubernetes API server?

@thesuperzapper
Contributor Author

@alexec yep, the goal is to delegate access control to K8S RoleBinding/ClusterRoleBinding. Since argo-workflows is just a set of CRDs, it makes sense to give users the same access through the argo UI and CLI that they would have had through kubectl.

This streamlines many use-cases, ranging from enterprise to Kubeflow, as pretty much everyone grants kubectl access through RoleBindings/ClusterRoleBindings against users' emails.

PS: I am in the Melbourne, Australia timezone, in case it looks like I am not responding

@alexec
Contributor

alexec commented Nov 9, 2021

Did you take a look at Kubernetes impersonation? This is supported in the Golang client, I think you could amend the existing clients and it should just work 🤞:

https://github.com/kubernetes/client-go/blob/master/transport/config.go#L49

I'd like to see if we can find a way to avoid having too much code related to authentication, and instead offload as much decision-making as possible to Kubernetes. This gives us a stronger security posture.

@thesuperzapper thesuperzapper force-pushed the add-sso-impersonate-new branch from 734fb15 to 9222ce4 November 9, 2021 22:58
@thesuperzapper
Contributor Author

@alexec I have updated the PR description with lots more information for reviewers, and have disabled sso.impersonate.enabled in manifests/quick-start/sso/overlays/workflow-controller-configmap.yaml (which I think was causing the CI to fail).

@thesuperzapper
Contributor Author

Did you take a look at Kubernetes impersonation? This is supported in the Golang client, I think you could amend the existing clients and it should just work 🤞:

@alexec impersonation is a very dangerous feature in Kubernetes. Its purpose is to let super-admins literally act as specific users, which is problematic for a few reasons:

  • The impersonate RBAC verb is very limited, it is applied to either "a hard-coded set of users" or "literally all users"
  • It allows privilege escalation beyond what the argo-server Pod has. Even with "a hard-coded set of users", if one of those users has the "admin" ClusterRole, all it takes is for someone to find a way to run code in the argo-server Pod to become a cluster-level admin.
  • Most enterprises won't let you use impersonation for the above reasons

In reality, Kubernetes SubjectAccessReviews are perfect for what we are doing, as they are designed for this purpose.
(For example, all of Kubeflow's controllers check RBAC access using SubjectAccessReviews)

PS: I know the approach of using http.RoundTripper may seem a bit hacky, but it's MUCH easier (and less error prone) than updating literally all calls to our Kubernetes clients to include a SubjectAccessReview call, and requires no extra work going forwards as new calls are added.

@alexec
Contributor

alexec commented Nov 10, 2021

So it does not "just work". Shame.

@thesuperzapper
Contributor Author

I realized I forgot to put an example of a RoleBinding / ClusterRoleBinding in the docs, so my latest commit has added one.

Contributor

@alexec alexec left a comment

I've included some commentary, but the big things are:

  • This needs unit tests.
  • Can you add some information about how you tested it? Can it be tested automatically or does it need manual testing?
  • Does the user need to set up RBAC for argo-server to be able to create selfsubjectaccessreviews? If so, I think they need to be added.
  • Is there another term other than "impersonate" we could call this? Users will confuse this with Kubernetes impersonation and get the wrong expectations about it.
  • This is probably the most impressive first time contribution I've ever seen.

server/auth/impersonate/config.go Outdated
server/auth/impersonate/config.go Outdated
server/auth/impersonate/config.go
@@ -187,23 +210,71 @@ func (s gatekeeper) getClients(ctx context.Context, req interface{}) (*servertyp
if err != nil {
return nil, nil, status.Error(codes.Unauthenticated, err.Error())
}
if s.ssoIf.IsImpersonateEnabled() && s.ssoIf.IsRBACEnabled() {
Contributor

Is this needed? It should be prevented during startup.

Contributor Author

While startup will prevent this for end-users, it's still possible to accidentally enable both in unit tests (due to mocks).

Contributor

Ok, if we enforce here, I don't think we need to enforce higher up in the stack, as the error would be caught here. I don't feel strongly about which is changed.

Contributor Author

@alexec would you prefer that argo-server fails at startup, or an error that is visible in the UI when you try and access something?

return nil, nil, status.Error(codes.PermissionDenied, "not allowed")
}
return clients, claims, nil
}
if s.ssoIf.IsRBACEnabled() {
Contributor

could use else if, if you are worried

Contributor Author

My comment is same as #7193 (comment)

},
)

dynamicClient, err := dynamic.NewForConfig(restConfig)
Contributor

This looks like code duplication. Perhaps an existing method could be used, with the roundTripper passed in as an argument?

Contributor Author

Possibly, but I would also have to refactor ClientForAuthorization, which I don't want to do (I'm trying to reduce the number of things this PR changes, as it's already hard to review).

Contributor

please refactor

Contributor Author

@alexec I really think this is unnecessary and would make this PR much larger than it needs to be (with a high likelihood of breaking something else, as many things depend on the current signature of the gatekeeper.clientForAuthorization method)

If you really want, I will do it, but I think it will make reviewing this PR harder.

server/auth/impersonate/client.go
"Subresource": subresource,
}).Debug(fmt.Sprintf("SubjectAccessReview - %s", c.username))

review, err := c.kubeClient.AuthorizationV1().SubjectAccessReviews().Create(ctx, &auth.SubjectAccessReview{
Contributor

We should note that if you use this feature, it will make twice as many API requests as it did previously.

Contributor Author

I don't expect this will significantly impact users, as it's only for the argo-server?

Where would you like to include the warning, if you want to?

Contributor

Something in the documentation would be fine. You could point out that, as a human must be using it, and humans are slower than automated clients, it is not expected to impact anyone.

server/auth/round_trippers.go Outdated
@thesuperzapper
Contributor Author

Thanks for reviewing @alexec, see my replies below!


  • This needs unit tests.

I agree in general, but I am not sure where to implement them (feel free to annotate places you think are important to test, but are currently not).

FYI: I think most of the "SSO+RBAC" tests are in gatekeeper_test.go, but TBH those tests don't really check if it would "actually" work, is there a section that does "integration" tests for "SSO+RBAC" that I can use as a reference?


  • Can you add some information about how you tested it? Can it be tested automatically or does it need manual testing?

I tested the argo-server in "local mode" (with make start PROFILE=sso UI=true API=true and manifests/quick-start/sso/overlays/workflow-controller-configmap.yaml updated to set sso.impersonate.enabled to true), and with compiled docker images in a proper cluster (in the context of Kubeflow).

In all my tests I get 403 from the UI/CLI when I don't have the required RBAC, and it works properly when I do.


  • Does the user need to set up RBAC for argo-server to be able to create selfsubjectaccessreviews? If so, I think they need to be added.

I assume you mean SubjectAccessReviews, and I have fixed this in the manifests/... in 32078ba

Also, do you think we should make sso.impersonate.enabled the default in manifests/quick-start/sso/overlays/workflow-controller-configmap.yaml instead of sso.rbac.enabled as I think the impersonation feature is much more user-friendly?


  • Is there another term other than "impersonate" we could call this? Users will confuse this with Kubernetes impersonation and get the wrong expectations about it.

I am not sure it matters that much, especially as I have tried to be very clear in the updated docs that this feature uses SubjectAccessReviews (it's no more confusing than sso.rbac.enabled, which has nothing to do with K8S RBAC).

However, I guess we could call it "subjectReview", so something like sso.subjectReview.enabled?


  • This is probably the most impressive first time contribution I've ever seen.

Thanks?

@thesuperzapper
Contributor Author

@alexec while I have been testing, I have uncovered a more general issue with argo-workflows: the argo-server (specifically via the UI) makes an absolutely unreasonable number of get/watch/list calls.

For example, opening the http://localhost:8080/workflows?limit=500 page causes potentially hundreds of calls to the Kubernetes API!!!

All of the xxxx_server.go modules are extremely wasteful with their API calls, doing things like repeatedly performing get and watch while processing a single REST request:

You can see for yourself by enabling sso.impersonate.enabled and seeing how many SubjectAccessReview debug logs you see. (Note, I cleaned up the debug/error logging for impersonation in the last commit 02df926)


I don't think this issue should block this feature, but it needs to be a top priority to significantly reduce the number of API calls created by argo-server.

@alexec
Contributor

alexec commented Nov 12, 2021

(it's no more confusing than sso.rbac.enabled, which has nothing to do with K8S RBAC).

Yes. In hindsight, we should not have called that SSO+RBAC. Having a good name helps users understand a feature and differentiate it from similar features. In my mind, we're not impersonating here, nor are we using Kubernetes impersonation. Its own name would give it its own personality.

This is a closed-door decision, so I'd like to get it right.

Here are some ideas:

  • SSO + impersonation
  • SSO + subject access review
  • SSO + user access controllers

page causes potentially hundreds of calls to the Kubernetes API!!!

This does not sound correct. The API endpoints only ever do O(1) Kubernetes API requests. If users saw this often, I think that we'd know about it. So perhaps there is a bug in the UI under certain circumstances? Perhaps, when an error occurs, we start making API requests repeatedly, but we do not back off? Are you able to shed more light on this?

@thesuperzapper
Contributor Author

@alexec what do you need from me to get this merged?

Also, do you think we should try to implement a solution to #6490 (comment) and add the user's groups OIDC claim to the SubjectAccessReview for the first release of this feature, or wait until the next version?

@alexec
Contributor

alexec commented Nov 17, 2021

@alexec what do you need from me to get this merged?

Also, do you think we should try to implement a solution to #6490 (comment) and add the user's groups OIDC claim to the SubjectAccessReview for the first release of this feature, or wait until the next version?

Do you think you could reply to my comment from 12th Nov?

@thesuperzapper thesuperzapper force-pushed the add-sso-impersonate-new branch from 6cfc691 to 91e72f3 November 18, 2021 01:48
@thesuperzapper thesuperzapper force-pushed the add-sso-impersonate-new branch from b8d6ef6 to 1795a1f November 18, 2021 07:05
@thesuperzapper
Contributor Author

Do you think you could reply to my comment from 12th Nov?

@alexec I propose we name it SSO + Access Review, and use the corresponding sso.accessReview.{enabled,usernameClaim} for the configs.

Regarding the large number of API calls, it's possible that there is some kind of retry "amplification" going on, but there are certainly more requests made to the K8S API than are needed for the UI to function.

Clone the branch, set sso.impersonate.enabled to true in ./manifests/quick-start/sso/overlays/workflow-controller-configmap.yaml, and use make start API=true UI=true PROFILE=sso to start the UI at https://localhost:8080; you will see the crazy number of SubjectAccessReviews in the logs. (NOTE: You may want to create some RoleBindings for the [email protected] User, as shown in the updated docs)

@alexec
Contributor

alexec commented Nov 18, 2021

I think the new name is great. Can you set up some time in my calendar so we can go over the requests issue? I'd like to understand more.

https://bit.ly/book-30m-with-argo-team

@thesuperzapper
Contributor Author

@alexec is there any chance we could do a meeting time which is 1-2h later than your current latest one (which is currently 6am in Melbourne, Australia time).

So, for example, Monday 22nd November @ 1pm PT (which is 8am on Tuesday for me)?

@alexec
Contributor

alexec commented Nov 30, 2021

@alexec is there any chance we could do a meeting time which is 1-2h later

Sure. I'm in PT, so propose some times and I'll let you know if I'm free?

@thesuperzapper
Contributor Author

I am still planning on finishing this, but have not had the time to allocate.

The first step to merging after discussions with @alexec is:

  1. rebase for upstream file changes
  2. rename to sso.accessReview (as sso.impersonate is a confusing name)
  3. explain how users can test sso.accessReview locally with a make start ... command
  4. do a sanity check for the number of UI requests (as raised in feat: add SSO with SubjectAccessReviews #7193 (comment))

@bygui86 even if the initial implementation of this sso.accessReview feature uses http.RoundTripper (which is a bit hacky, but works), we can eventually refactor all our K8S calls to be wrapped by an "access review" checker (which is a LOT of work, but cleaner).

@stale stale bot removed the problem/stale This has not had a response in some time label Feb 13, 2022
@stale

stale bot commented Feb 22, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Feb 22, 2022
@thesuperzapper
Contributor Author

go away bot

@stale stale bot removed the problem/stale This has not had a response in some time label Feb 23, 2022
@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Mar 2, 2022
@stale stale bot closed this Mar 12, 2022
@bygui86

bygui86 commented Mar 12, 2022

@thesuperzapper any progress on this?

@alexec alexec reopened this Mar 12, 2022
@stale stale bot removed the problem/stale This has not had a response in some time label Mar 12, 2022
@thesuperzapper
Contributor Author

@bygui86 haven't had a chance to work on it (but I still believe this is an important feature), the remaining tasks are listed in #7193 (comment).

@alexec
Contributor

alexec commented Apr 14, 2022

@thesuperzapper we very much want this feature. I'd love to see it in v3.4 as it'd be a noteworthy feature we'd promote.

Would you like us to try and find someone to complete it on your behalf?

@thesuperzapper
Contributor Author

@alexec I can probably get it finished (depending on how quickly you are wanting to get 3.4 out).

I am probably best positioned to get it done quickly, the remaining tasks are listed in #7193 (comment).


The only thing I worry about is the number of API calls that I saw when I was testing this feature (see #7193 (comment)), which seems to be a problem with argo making more requests than it needs, not related to this PR.

@thesuperzapper
Contributor Author

But if you want to assign someone, I won't feel bad!

I am busy working on lots of things, and this getting added to Argo (whether or not I am the author) will help those other projects!

@stale

stale bot commented Apr 25, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Apr 25, 2022
@alexec alexec added pinned and removed problem/stale This has not had a response in some time labels Apr 25, 2022
@alexec alexec closed this Oct 10, 2022
@bygui86

bygui86 commented Oct 11, 2022

@alexec sorry but what's the status here?
You closed the issue without a comment...

@alexec
Contributor

alexec commented Oct 11, 2022

Issue abandoned (has been stale for 6mo).

@bygui86

bygui86 commented Oct 11, 2022

@alexec oh no :( even after all the hard work provided by @thesuperzapper?

@aaron-arellano

Did this feature get abandoned? SAR would be a great feature to introduce as it is native to k8s, we also use this for other services on our clusters.

@agilgur5 agilgur5 added area/sso-rbac problem/stale This has not had a response in some time and removed pinned labels Sep 23, 2023
@agilgur5 agilgur5 changed the title from feat: add SSO Impersonate with SubjectAccessReviews to feat: add SSO with SubjectAccessReviews Feb 22, 2024
Labels
area/sso-rbac problem/stale This has not had a response in some time
Projects
None yet

5 participants