Fine-grained RBAC for Argo Server #6490
I would like to add to this. In our setup, each "team" is given a namespace, and each "role" within that team (developer, admin, etc.) in our SSO is mapped onto a k8s service account, with a matching role and rolebinding. The issue arises when a user belongs to more than one team: there's no way to give them multiple service accounts, so the only options are to:
I hope you'll agree that none of these are acceptable outcomes. We agree that using k8s-native RBAC makes a lot of sense for a k8s-native app, so here are some potential fixes we've thought up.

First, consider adding an annotation to service accounts that, if present, indicates that all service accounts matching the criteria are selected, instead of just the first according to priority. Each SA would then be tried in order of priority: if a "permission denied"-type error is received, the next would be tried; if the list is exhausted, the request succeeds, or a non-"permission denied"-type error is received, the current behavior would take place. While this would multiply the time each request takes by the number of matched SAs, I think it would allow any arbitrary combination of roles (assuming one role per SA) across namespaces to be assigned to any given user.

Another possibility would be to add an annotation to service accounts indicating that the service account should only be considered if the request is for one of a set of namespaces (see the sketch below). The server would keep a cache of matched service accounts for each user, instead of just one, and select the most appropriate one for each request based on namespace and priority. While this would only allow one role (assuming one role per SA) per namespace, that seems like a much friendlier restriction than one SA per cluster, and the namespace -> SA mapping could be done fairly efficiently.

Finally, the server could dynamically generate SAs on a per-session basis based on OIDC claims, and instead collect a set of (cluster)roles to bind to those generated SAs in the same way that SAs are currently selected (but allowing multiple roles to be selected). This might be more permissions than some admins would wish to give to Argo, but it would allow a clean mapping of SSO roles/groups to k8s roles.
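For illustration, a minimal sketch of the second idea. The rbac-rule annotations are Argo Server's existing SSO RBAC selectors; the rbac-rule-namespaces annotation is a purely hypothetical name for the proposed namespace-scoping annotation:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: team-blue-developer
  namespace: team-blue
  annotations:
    # Existing SSO RBAC selection: this SA is a candidate when the
    # expression matches the user's OIDC claims; precedence breaks ties.
    workflows.argoproj.io/rbac-rule: "'team-blue-developers' in groups"
    workflows.argoproj.io/rbac-rule-precedence: "10"
    # Hypothetical (proposed above): only consider this SA for requests
    # that target one of the listed namespaces.
    workflows.argoproj.io/rbac-rule-namespaces: "team-blue"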
Dear @alexec, first of all, thanks a lot for that great piece of software you and the community are providing. So I've now given credit to the Ferrari under the hood you've built. With respect to segregation of duties and separate ownership for different Workflows/WorkflowTemplates, we need to cover the following requirements:
ALL ROUTES LEAD TO ROME. There are multiple ways to incorporate these capabilities into Argo Workflows. Here are two suggestions:

Option 1. Predefine a permission matrix for UI capabilities

As an example, three predefined permission groups:
Admin: can fully use the current UI and its capabilities, like Submit New Workflow, Edit JSON/YAML, Upload File, Delete, Terminate, and so on. These UI permission groups could be assigned via annotations directly on the service accounts. If none is assigned, read-only could be set as the default. Pros:
Option 2. Use a similar RBAC configuration approach to Argo CD (probably what was intended in your issue description)

Definition of a ConfigMap with additional, fine-grained RBAC roles. A default role:readonly could also be applied.

RBAC permission structure: permission policy definition should be feasible for all resources or namespace-scoped (see next bullet). Namespace-scoped permission policy definition:

RBAC resources and actions: ideally the other resources like events and sensors should also be covered, because they're also handled in the UI even if the CRD is not in argo-workflows itself. Actions: get = get/list resources.

Here is an example of how such a policy could look:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argoworkflow-rbac-cm
  namespace: <yournamespace>
data:
  policy.default: role:readonly
  policy.csv: |
    p, role:workflow-admin, clusterworkflowtemplates, *, *, allow
    p, role:workflow-admin, cronworkflows, *, *, allow
    p, role:workflow-admin, eventbus, *, *, allow
    p, role:workflow-admin, eventsources, *, *, allow
    p, role:workflow-admin, sensors, *, *, allow
    p, role:workflow-admin, workfloweventbindings, *, *, allow
    p, role:workflow-admin, workflows, *, *, allow
    p, role:workflow-admin, workflowtemplates, *, *, allow
    p, role:workflow-ops, workflows, get, *, allow
    p, role:workflow-ops, workflows, delete, *, allow
    p, role:workflow-ops, workflows, submit, *, allow
    p, role:workflow-ops, workflows, terminate, *, allow
    p, role:workflow-team-blue-scoped, workflows, *, targetnamespace-blue/*, allow
    p, role:workflow-team-red-scoped, workflows, *, targetnamespace-red/*, allow
    g, your-admin-group, role:workflow-admin
    g, your-workflow-ops-group, role:workflow-ops
    g, your-team-blue-scoped-group, role:workflow-team-blue-scoped
    g, your-team-red-scoped-group, role:workflow-team-red-scoped

Pros:
Cons:
Option 1 is more to be seen as a "quick fix". Option 2 is definitely the preferred and bulletproof approach (as seen in Argo CD). BTW, one part of the setup I don't completely understand: why is a dedicated namespaced install needed?
@HouseoLogy I don't have much to add to what you said; it is basically correct. Today's solution is intentionally frugal: we have service accounts with annotations, and those annotations select a service account for you to use. The upside is that we lean on Kubernetes RBAC (less effort, especially on critical security code); the downside is that it does not scale well, ultimately requiring one service account for each user, with each user's permissions copied into that account.

There are other solutions I've not mentioned:

If the problem is operational overhead, we could consider using impersonation. Basically, the users must create a service account (typically within their own namespace).

We could look at using Kubernetes SSO. I don't know much about this, but it looks like it is not well supported.

Finally, we could have the option to use Casbin. I envisage this implemented as an HTTP interceptor, so the resources are the API URLs, not Kubernetes resources.

@HouseoLogy this issue is not currently on the core team's roadmap, so it won't get looked at for > 6 months. But we do want to do more collaborative features, where someone from the community does the implementation with guidance from the core team. Would you be interested?
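For context, a minimal sketch of the RBAC the impersonation option would rely on (the name is illustrative): argo-server's own service account is granted the impersonate verb, and each Kubernetes API call is then forwarded as the end user.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argo-server-impersonator
rules:
# Lets argo-server set Impersonate-User / Impersonate-Group headers on its
# Kubernetes API calls, so each call is authorized as the end user (or their
# service account) rather than as argo-server itself.
- apiGroups: [""]
  resources: ["users", "groups", "serviceaccounts"]
  verbs: ["impersonate"]

Bound to the argo-server service account via a ClusterRoleBinding, this shifts all authorization decisions back onto the users' own Roles and RoleBindings.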
I like @HouseoLogy's idea of using Argo CD's approach; we've used it extensively for more than a year with no complaints. If it isn't already, perhaps that behavior could be extracted into https://github.com/argoproj/pkg for shared, generic use. @alexec Because this is such a blocker for us, I'd be happy to offer some time implementing one of these solutions.
Thank you @andrewm-aero. I think we probably need to nail down whether or not "resources" are defined as "API URLs" or as "Kubernetes resources"; I think this is a closed-door decision:
But... it may not be what users want. If we use "API URLs", then we may need to change some of those URLs to include the namespace (for example). @andrewm-aero, have you managed to start the dev set-up?
Hello @alexec & @andrewm-aero, as suggested, I would definitely go with the Casbin / Argo CD approach and not implement it as dedicated Kubernetes resources.
I think I might make a U-turn on saying we should do URLs:
I've created a PoC. I do not plan to complete this. Would anyone like to volunteer to take this over the finish line?
Hey @alexec! I would like to offer some help by taking over your PoC. We would also greatly benefit from this feature at my company 👍
@jeanlouhallee I'm assuming you've done some Golang before? If so, step 1 is to read the CONTRIBUTING.md guide: clone the source code onto your local machine and check out the branch. We'll want to get some community members to test it too. We can figure out the details closer to the time.
Thanks for the pointers @alexec. I have done a bit of Golang, but not much. Will learn a lot by diving into this.
Hi @jeanlouhallee, thank you!
Hello @alexec, do you mind if I ping you directly on Slack for design questions/considerations?
Sure. That's fine.
Hi @jeanlouhallee & @alexec,
We have added a new feature related to SSO. @HouseoLogy @andrewm-aero, would this help in your use case?
For all those watching, I have created PR #7193, which implements "SSO Impersonation" support in argo-server using Kubernetes impersonation. NOTE: these changes apply equally to the Argo UI and CLI, as they affect all K8s API calls made by argo-server on behalf of users.
Amazing stuff! I had a good experience when configuring Kiali's authentication. Usually, OIDC is already configured for authentication in the Kubernetes cluster. Kiali uses the same app as the cluster authentication, so it maps onto the RoleBindings defined in the cluster. In my case, there are hundreds of users in the cluster, so permissions are managed by groups, with RoleBindings mapping those groups to roles. You can set this up with OIDC flags on the kube-apiserver, as sketched below.
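For reference, a sketch of the relevant kube-apiserver OIDC flags (all values are placeholders), shown as a fragment of a static Pod spec; the prefix flags are the ones discussed later in this thread:

spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --oidc-issuer-url=https://idp.example.com
    - --oidc-client-id=kubernetes
    - --oidc-username-claim=email
    - --oidc-username-prefix=oidc:
    - --oidc-groups-claim=groups
    # This prefix is why the group names in RoleBindings differ slightly
    # from the raw group names in the OIDC token.
    - --oidc-groups-prefix=oidc: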
If Argo Workflows can follow this flow, it will be very helpful. I think the method provided by the current Argo Workflows (v3.2) has management limitations, so I plan to operate in client auth-mode.
@DingGGu as described in the "limitations" section of my PR #7193, the initial implementation won't support this. I see two possibilities:
@thesuperzapper Looking at your PR comments and the limitations, my current configuration is as follows.
The JWT token that properly authenticates to Kubernetes has a payload like:

{
// ...
"email": "[email protected]",
"groups": ["ns-data-readonly", "ns-workflow-writer"],
// ...
}

The RoleBinding in the namespace is set as follows:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: oidc:namespace-readonly
  namespace: data
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: oidc:namespace-readonly
subjects:
- kind: Group
  name: oidc:ns-data-readonly
  apiGroup: rbac.authorization.k8s.io

In my use case, the group name in the RoleBinding and the group name provided by OIDC are slightly different: note the oidc: prefix. In Kiali, even when a prefix is set, permissions are granted properly.
Most users will not be able to reconfigure the Kubernetes API server (e.g. on EKS, GKE Autopilot, etc.). How do these ideas work for them?
@alexec This option is supported on EKS and kOps, and I found that GKE supports it too.
There will be many users in enterprise environments who are not allowed to change this option, regardless of whether it is technically available.
The EKS and kOps clusters I'm using have all had this applied with an in-place upgrade. Why this option is useful is also explained in the EKS documentation. My guess is that Kubernetes was also aware of this problem and provided the prefix option, so there doesn't seem to be any need to rule this option out.
The reason a cluster administrator wants to use "impersonate" is that, to use another web console, there is no need to add or change the authorization scheme already configured in Kubernetes. I manage hundreds of developers using dozens of clusters, and if this feature is added correctly, I think management will be very efficient. Other cluster administrators will also be happy to change the kube-apiserver options once they know of these advantages. Usually kube-apiserver consists of 3 Pods, so there is no difficulty in changing it (in my experience).
Since we have our application deployed on AKS under quite strict company control, I would assume that a Casbin-based implementation would be preferable for our use case. Also, we already use Casbin for other services, so having a unified authorization layer (at least in terms of the tech stack being used) would be nice ;) Hopefully this gets implemented sooner rather than later.
Fixed by #8120
It was not my original goal, but that is what happened. It does not support groups yet; that still needs to be determined.
That's amazing anyway!!
Amazing. How much work is there still to be done? It already looks like quite a huge feature set.
Updated version for testing: https://github.com/argoproj/argo-workflows/releases/tag/v0.0.0-dev-mc-8 |
Any plans to continue this development, or is it resolved by something else @alexec? |
It appears to be on the roadmap.
Personally, I think we should offer an option that uses Kubernetes RBAC roles/groups, like what I was doing in #7193 (which was closed due to inactivity). I am happy to reopen that if the project is still interested in that functionality; I have just been a bit busy moving countries!
Any solution that doesn't require us to set up a new Okta tile per namespace is okay with me.
@thesuperzapper SAR would be great to implement, as we use this for our services on our cluster and like the ability to maintain control with Roles, RoleBindings, and SAs.
@HouseoLogy I'm very interested in the "Operator" permission group you mentioned ("can submit, resubmit, terminate, delete workflows but no creation of custom resources like workflows, workflow templates, sensors and so on; also no edit possibility of "full workflow options" --> YAML read-only"). Can you share your role definition for this? I have the Admin and Read-Only roles, but am struggling to allow submitting from existing WorkflowTemplates while disabling submission of arbitrary new workflow YAMLs (as they have not gone through code review).
Discussed this in SIG Security last week, as internal RBAC requires a decent amount of maintenance overhead and new CRDs, as is the case with Argo CD. I was looking for ideas there in particular because any auth system can become a common attack vector, so relying on an existing auth system (like k8s RBAC for authZ) is both simpler and more secure (i.e. "don't reinvent the wheel", "don't invent your own crypto", etc.). Some authZ ideas there included making (potentially optional) CRDs for actions (e.g. retry, resubmit), so that they would use built-in k8s RBAC. Imperative actions as CRDs are a bit of an anti-pattern (k8s is declarative), so that is not ideal either, but they would create or affect declarative resources in the end. Another idea was configurable Lua script actions like CD has (e.g. to restart a Deployment).
IMHO both Argo CD and Argo Workflows have not done this well.
My proposal is that the Casbin model and policy are stored in a ConfigMap within the controller namespace and managed using GitOps. They can be mounted as files; very simple. Casbin policies are standard RBAC, i.e. the rules decide whether the user is allowed to perform an action on a resource. So one big task would be determining what those resources and actions are, and making sure the data model is good enough to describe all the potential rules that users would want. Essentially, this would be Argo CD simplified. This proposal does not work if users need to re-configure security at runtime (yuk).
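A minimal sketch of what that ConfigMap could look like. The ConfigMap name, resources, and actions are illustrative only, not a decided data model; the model is the standard Casbin RBAC model:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-server-rbac   # illustrative name
  namespace: argo
data:
  # Standard Casbin RBAC: g-rules grant subjects roles, p-rules decide
  # whether a role may perform an action on a resource.
  model.conf: |
    [request_definition]
    r = sub, obj, act
    [policy_definition]
    p = sub, obj, act
    [role_definition]
    g = _, _
    [policy_effect]
    e = some(where (p.eft == allow))
    [matchers]
    m = g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act
  policy.csv: |
    p, role:operator, workflows, retry
    p, role:operator, workflows, terminate
    g, oidc:platform-team, role:operator

Both files would be mounted into the server, and the policy file is plain CSV, so GitOps-managed changes are simple diffs.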
Yea, I more or less agree; that's why I was seeking new ideas and trying to avoid the things the CD folks regret in that RBAC model.
This is certainly simpler than CRDs, but it means that people with ConfigMap access will have access to this as well. And folks already want namespace-specific config, so we'd need to support multiple ConfigMaps.
That, and, uh, adding authZ logic everywhere 😅 And since Casbin is not k8s-native, either you have to go through the Server, or you can bypass it by editing resources directly in k8s. Do you have any thoughts on that specifically, @alexec?
I know it's been a while, but I really think using subject access reviews (at least as one of the options) is probably the simplest solution: it delegates everything to Kubernetes RBAC. Since Argo Server is all about Argo Workflows, the permission model in workflows is simply "are you allowed to do this action with the CRD", so it just makes sense to reflect the same permissions in the user interface by taking the email of the user as the "actor" in the subject access review. The PR that I made a number of years ago could pretty easily be rebased to work with the current HEAD, and it implements exactly this.
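A sketch of the check itself, assuming the SSO email is used as the actor; a SubjectAccessReview can even be posted as a plain manifest (all names and values below are placeholders):

apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # The SSO identity argo-server would ask on behalf of.
  user: jane@example.com
  groups: ["oidc:ns-data-readonly"]
  # The k8s RBAC question: may this user delete this Workflow?
  resourceAttributes:
    group: argoproj.io
    resource: workflows
    verb: delete
    namespace: data
    name: my-workflow

The API server answers in .status.allowed, so the server can enforce, and the UI can reflect, exactly what k8s RBAC already says (e.g. kubectl create -f sar.yaml -o yaml shows the result).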
As I wrote above, I was not commenting on authN, and did actually mention past work on that. I have seen your SAR work and would like to take an in-depth look at it at some point in the future (#9325 also has some very relevant notes about that approach).
As Alex, I, and many users have stated, k8s RBAC is not granular enough. Several imperative actions of Workflows do not fall neatly into k8s RBAC, and many users would like to be able to add authZ to those, including retry, resubmit, terminate, allowing full YAML, etc.
@agilgur5 these "extra" permissions just be represented as virtual "subresources", similar to how Kubernetes already exposes a But either way, we don't need to get bogged down trying to make some crazy solution when simply reflecting the same level of access the user has in It's not like Argo Workflows currently allows more granular access than CREATE/DELETE/LIST on each resource kind (e.g. |
This is not accurate. More granular RBAC is a very popular & common request from users.
This issue does try to address that and allow more granular RBAC.
I was looking into this, but there are some problems with subresources:
Reading the comments in more depth here, I think custom CRDs for actions might indeed be the right way to go, as opposed to an internal RBAC. A few examples of this in the ecosystem were given there. The general concept works well: it properly delegates these actions to the Controller (whereas the Argo Server is currently responsible for a few things, c.f. #12538) and makes them all accessible via CRDs, as sketched below:
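A sketch of what such an action CRD could look like; the WorkflowRetry kind and its fields are purely illustrative, not an existing Argo resource:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowRetry   # hypothetical kind
metadata:
  generateName: retry-my-workflow-
  namespace: team-blue
spec:
  # The controller would watch these objects and perform the retry.
  workflowName: my-workflow

Whether a user may retry a workflow then reduces to whether they may create WorkflowRetry objects in that namespace, which is ordinary k8s RBAC, with no internal authorization layer needed.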
I think if you start with the customer personas, list the different use cases, and create the resource model you want for RBAC, then you'll be able to tell whether Casbin or something else fits. This issue should probably not be titled "Casbin RBAC"; perhaps it should be titled "Fine-Grained RBAC" or similar, as the current title puts the solution before the problem?
A technique I've used before is to proxy the k8s API and implement custom RBAC using a policy language that translates an SSO token into a set of fictitious Roles/ClusterRoles that the proxy should act as though the user has. By doing this, you can "invent" whatever custom subresources you want, since it's then your own code checking whether the request is authorized. Direct k8s access would still be subject to whatever roles are on the actual cluster.
Per the above comments, there are options other than Casbin that align better with the k8s RBAC model, do not require the API to be deployed / can be used without the API, and leave security up to k8s RBAC instead of an internal RBAC system (the best way to reduce the surface area for vulnerabilities is to not have it at all).
Summary
The current model of RBAC leans heavily on Kubernetes RBAC, so it is easy to make secure, but it may not scale well.
Consider the situation where you have 1000s of teams and 1000s of namespaces. At the very least you may need an OIDC group for each namespace, and then a service account, role, and role binding.
It may be better to use Casbin (or similar) to provide a more flexible way to configure this.
Use Cases
When would you use this?
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.