diff --git a/design-proposals/policy-attributes.md b/design-proposals/policy-attributes.md
new file mode 100644
index 00000000..a03dcb59
--- /dev/null
+++ b/design-proposals/policy-attributes.md
@@ -0,0 +1,303 @@
+# Design Proposal: Policy Attributes
+
+**Authors**: @zivnevo, @elevran
+
+**Begin Design Discussion**: 2023-11-20
+
+**Status:** draft
+
+## Summary/Abstract
+
+ClusterLink policies apply to communications between clients and services.
+  Both types of workloads can be identified by a strong (e.g., cryptographic) identity.
+  The identity links the workload to a set of attributes, and policies are
+  defined on workload attributes. This design proposal defines the initial
+  set of attributes used in policies.
+
+## Background
+
+### Motivation and problem space
+
+ClusterLink exchanges workload attributes when determining policies governing communications.
+  Policies affect different communication aspects including, for example, access control and
+  load balancing. ClusterLink gateways serve as enforcement points for egress and ingress traffic.
+  The set of attributes is ill-defined. We would like to define the set of attributes used
+  in exchange and policies.
+
+The set of attributes applicable to a communication flow is either determined by the control
+  plane at runtime or derived from the workload's identity document. We can associate two measures
+  with each attribute:
+
+- **trustworthiness**, relating to the level of trust we can place in its derivation (e.g., the permission level,
+  complexity and skill required to affect the attribute's value). Ideally, policies make
+  judicious use of attributes based on the level of trust and sensitivity of the communicating workloads.
+- **usefulness**, relating to the amount of unique context provided by the attribute.
+  For example, attributes that set the workload's application tier are far more useful than
+  arbitrarily-set attributes such as process id or creation timestamp.
+
+### Impact and desired outcome
+
+The current set of policy attributes is incomplete and not well defined.
+  This leaves the implementation to make decisions that are not fully transparent to users.
+  Defining the (initial) set of attributes used in policies would allow ClusterLink users to
+  make informed and stable decisions about policy definition as suited for their use case and
+  requirements.
+
+### Prior discussion and links
+
+Not applicable.
+
+## User/User Story
+
+- **Access control based on cluster geography**: As a network administrator, I would like to enable
+  Service access only from certain locations (e.g., EU only, to comply with GDPR).
+- **Load balancing based on cluster geography**: As a network administrator, I would like to set
+  a policy so that, when a Service is provided by several remote locations, only locations in the
+  same geography as the source are considered.
+- **Access control based on cluster identity**: As a Service owner, I would like to allow
+  access to a specific service only from other clusters I own.
+- **Access control based on workload namespace and labels**: As a Service owner, I would
+  like to enable access to a service based on the source workload namespace and its "role" label
+  value, regardless of the cluster where the workload is running (e.g., assuming clusters are used as the
+  infrastructure and teams are allocated the same namespace across all clusters). Consequently, I would like
+  to enforce an egress policy that allows only workloads from *namespaces I own* on remote clusters and
+  not from namespaces assigned to other users. The labels of workloads running in other namespaces are
+  not trusted.
+- **Access control based on workload verified/validated version**: As a Service owner, I would like to allow
+  access to a specific service only for validated and trusted workloads. This may extend, or work in
+  conjunction with, any or all of the above use cases. Only a specific version of the image (validated by
+  image tag or even SHA) is allowed to access this service.
+
+## Goals
+
+This design document should:
+
+- Define the (initial) set of attributes available for policy definition and enforcement. The set
+  may be extended in the future.
+- Define the source of each attribute (i.e., where it is retrieved from), along with some assessment of its
+  trustworthiness and usefulness.
+- Define how attributes are encoded in policy definition and exchanged in gateway communications
+  to enable policy enforcement.
+
+## Non-Goals
+
+The following aspects are explicitly excluded and are out of scope in the current design:
+
+- Defining policy attributes and their facets in environments other than Kubernetes.
+- Defining the life-cycle management of the attribute set (i.e., how attributes are added, deprecated
+  and modified in a backward compatible manner).
+- Defining the process of formal and provable attestation of attributes and their values. This topic
+  is partially addressed by assigning different trustworthiness measures to different attributes.
+
+## Proposal
+
+Every connection has a source and a destination. While the source is a specific workload instance
+  (e.g., a Pod), the destination is a Service (i.e., a collection of instances). Kubernetes does not
+  have an equivalent grouping concept for "clients" as it does for "servers". Thus, we assign
+  and process attributes at the workload (a specific client instance) and the Service (a collection
+  of potential destinations; actual instance selection is left to Kubernetes mechanisms and is out of
+  scope). Note that, in the future, we may leverage Kubernetes constructs such as `ReplicaSet` or
+  `Deployment` as a convenience grouping mechanism, though this will not replace the attribute
+  set defined herein.
+
+We propose to define attributes at different scopes (layers), with each object implicitly assigned
+  the attributes of its containing layer:
+
+- Site-level attributes (either a fixed set defined by ClusterLink or extended by the user according to
+  the Fabric configuration [1]) that are pertinent to all workloads and Services in the site. Examples
+  may include `geography`, `cloud-provider`, `cloud-region`, `cluster-name`, etc.
+- Service-level attributes (either a fixed set or augmented by the user in the fabric configuration).
+  These may include attributes such as `service-name`, `namespace`, `labels`, etc. Other attributes may
+  be derived from the Kubernetes Service definition, if relevant. Services are assigned the Site
+  attributes as well.
+- Workload attributes are associated with a specific workload instance, and may include, for example,
+  `service-account`, `namespace`, `image-name`, etc. Workloads are assigned the Site attributes as well.
+
+[1] Fabric-level configuration could be used to define the set of attributes that can be defined per Site.
+  A fabric is a "container" for sites that can potentially communicate with each other.
+  The fabric defines the root of trust as well as any global configuration.
+
+### General Properties of Attributes
+
+- All attributes are key-value pairs. Keys are unique within a set (i.e., can't appear more than once).
+- Attributes are scoped. Scope is set in the key prefix (e.g., "cl-site:geo", "k8s:ns", not "geo", "ns").
+  This potentially enables future extension to other environments without having to overload concepts.
+- Attributes are not typed: the value in the key-value pair is always a string. This enables the use
+  of match expressions (e.g., *is*, *is not*, *is one of*, etc.).
+- Attribute trustworthiness varies. The user / policy writer is ultimately responsible for deciding
+  what attributes are relevant in a policy.
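
The properties above can be illustrated with a minimal Python sketch (not the ClusterLink implementation). It models an attribute set as a flat string-to-string map with scoped keys, layers Site attributes under workload attributes, and evaluates the *is*, *is not* and *is one of* match expressions. The helper names `merge_layers` and `matches`, the sample keys and values, and the handling of missing keys are all hypothetical. Rejecting a key that appears in two layers is also an assumption; the proposal only requires keys to be unique within a set.

```python
from typing import Dict

Attributes = Dict[str, str]  # untyped: every value is a string


def merge_layers(*layers: Attributes) -> Attributes:
    """Combine attribute layers (e.g., Site + Workload) into one set.

    Assumption: a key without a scope prefix, or a key defined in two
    layers, is rejected so that keys stay scoped and unique.
    """
    merged: Attributes = {}
    for layer in layers:
        for key, value in layer.items():
            if ":" not in key:
                raise ValueError(f"attribute key {key!r} has no scope prefix")
            if key in merged:
                raise ValueError(f"duplicate attribute key {key!r}")
            merged[key] = value
    return merged


def matches(attrs: Attributes, key: str, op: str, *values: str) -> bool:
    """Evaluate one match expression against an attribute set.

    Assumption: a missing key never satisfies "is" or "is one of",
    and always satisfies "is not".
    """
    actual = attrs.get(key)
    if op == "is":
        return len(values) == 1 and actual == values[0]
    if op == "is not":
        return len(values) == 1 and actual != values[0]
    if op == "is one of":
        return actual in values
    raise ValueError(f"unknown match operator {op!r}")


# Hypothetical sample data: one Site layer and one workload layer.
site = {"cl-site:geo": "eu"}
workload = {"k8s:ns": "payments", "k8s:label:role": "frontend"}
source = merge_layers(site, workload)
```

With this sample data, `matches(source, "cl-site:geo", "is", "eu")` and `matches(source, "k8s:ns", "is one of", "payments", "billing")` both evaluate to true. Because all values are strings, the same three operators cover every attribute regardless of its scope.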
+
+### Workload Attributes
+
+If we assume the following are true:
+
+- replies from the Kubernetes API server can be trusted;
+- authentication/authorization is correctly configured on the Kubernetes API server; and
+- users are isolated in their own namespaces
+
+then the following attributes can be used to identify a workload (K8s Pod) within a Site:
+
+- K8s namespace
+- Other metadata fields, including
+  - Pod labels
+  - Pod name
+  - Owner reference
+- Pod Spec fields, including
+  - Service Account
+  - Image name and SHA/tag (if multiple: concatenate, sort and base64 encode)
+  - Init image name and SHA/tag (if multiple: concatenate, sort and base64 encode)
+
+As users are isolated in their own namespaces, it is not possible for an attacker to provision
+  resources in arbitrary namespaces and impersonate another workload. Labels, then, can be used to
+  differentiate between the different workloads within the namespace. Assuming they are configured
+  correctly by the workload owner, this should be sufficient to uniquely specify workloads safely.
+  However, functional attributes, such as the image name or the Service Account, might be useful as well.
+
+### Service Attributes
+
+Service attributes are set (or retrieved) when a Service is exported. Remote gateways become aware
+  of the Service attributes when a service is first imported. If multiple bindings exist for an Import,
+  all bound Services must have a fully matching attribute set. A later binding is declined when its
+  attributes do not match those of the first binding. Ideally, the management layer will ensure all gateways
+  importing the same service see an identical set of attributes. This also favors having Services
+  and Service attributes set by the user in a central place and distributed via the management layer.
+  The exact definition is out of scope of this design.
+
+### Gateway Attributes
+
+Gateways learn the attributes associated with other gateways when Peers are added.
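
The binding rule described under Service Attributes (a later binding is declined when its attributes differ from the first) can be sketched as follows. This is a hedged illustration rather than the ClusterLink implementation: the `ImportBindings` class, its method names, and the peer names are hypothetical. It also assumes that only Service-level attributes are compared, since Site attributes would naturally differ between the peers exporting the same Service.

```python
from typing import Dict, List, Optional

Attributes = Dict[str, str]


class ImportBindings:
    """Tracks the Services bound to a single Import (hypothetical helper)."""

    def __init__(self) -> None:
        # Attribute set of the first accepted binding; None until one exists.
        self._reference: Optional[Attributes] = None
        self.bound_peers: List[str] = []

    def bind(self, peer: str, service_attrs: Attributes) -> bool:
        """Accept the binding only if its attributes fully match the first one."""
        if self._reference is None:
            # The first binding defines the reference attribute set.
            self._reference = dict(service_attrs)
        elif service_attrs != self._reference:
            # Any missing, extra, or different attribute declines the binding.
            return False
        self.bound_peers.append(peer)
        return True


bindings = ImportBindings()
first = bindings.bind("peer-a", {"service:name": "reviews", "k8s:ns": "shop"})
second = bindings.bind("peer-b", {"k8s:ns": "shop", "service:name": "reviews"})
third = bindings.bind("peer-c", {"service:name": "reviews", "k8s:ns": "other"})
```

Here the first two bindings are accepted (key order does not matter) and the third is declined because its namespace differs. Comparing whole maps keeps the check strict: a binding that merely adds an attribute is declined as well.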
+
+### Attribute Table
+
+| Attribute name | Scope | Source | Description | Comments |
+| ---- | ----- | ------ | ----------- | -------- |
+| `cl:fabric` | Site/Fabric | configuration | fabric the site belongs to | Implicit via CA, might be useful in future for cross fabric communication |
+| `site:name` | Site | configuration | site name | Configured when site is created |
+| `site:location` | Site | configuration | site location | hierarchical (e.g., `aws/us-east/vpc17`) or split into flat attributes (e.g., `site:provider`, `site:region`, similar to `site:name`) |
+| `site:environment` | Site | configuration | site environment (e.g., production, staging) | mandated or recommended? |
+| `cl:site:` | Site | configuration | user defined site attributes | do we want to support these initially? |
+| `cl:service:` | Service | configuration | user defined Service attributes | do we want to support these initially? |
+| `service:name` | Service | k8s API (or Export/Import?) | Service name | Is there a corresponding workload name? For workloads, names are randomized, but the name of the "owner object" might be useful. Is there a service namespace? |
+| `k8s:ns` | Workload, Service | k8s API | Kubernetes namespace | |
+| `k8s:sa` | Workload | k8s API | Kubernetes service account name | |
+| `k8s:label:` | Workload, Service | k8s API | Kubernetes label(s) | the use of [common k8s labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/) is recommended. Labels describing the application structure (e.g., `app`, `role`, `tier`) could be expressive and flexible |
+| `k8s:container-image` | Workload | k8s API | Container image name | includes repo, name, SHA/Tag. Multiple images are concatenated. |
+| `k8s:init-container-image` | Workload | k8s API | Init container image name | includes repo, name, SHA/Tag. Multiple images are concatenated. |
+
+### Exchanging Attributes Between Gateways
+
+For simplicity, let's assume that all gateways keep the attributes of all other gateways.
+Moreover, all gateways keep the attributes of all imported/exported services.
+While this assumption allows optimizing bandwidth and simplifies the description below, it is not
+strictly required. For this assumption to hold, the management layer will probably need a mechanism
+to allow updating the attributes of gateways/services across the fabric.
+
+**Client Side:**
+
+1. The local gateway data plane gets a request from a local workload (client) to connect to a
+   remote service. The client handle (currently its IP address) and destination service are passed
+   to the control plane.
+1. The control plane extracts workload attributes from the cluster's API server. The client's IP address
+   is used as a handle to identify the workload.
+1. The control plane merges these attributes with its own (gateway attributes) to form the set of
+   source attributes.
+1. The control plane forms a collection of destination attribute sets, one set per remote-service binding.
+   Each set of destination attributes contains both the attributes of the remote service and the attributes
+   of the remote gateway exposing this service.
+1. The control plane can now call the policy engine component with the set of source attributes
+   and with the collection of sets of destination attributes.
+1. The access-policy engine will filter down to the set of remote gateways that are allowed to provide the
+   service (if any) based on the access control policies that are set. The load-balancing-policy engine will
+   choose one remote gateway out of this set based on the load balancing policies defined.
+1. The selected destination will be returned to the data plane (potentially along with other configuration
+   if needed), which can then initiate a connection request to the remote gateway.
+
+**Server Side:**
+
+1. The gateway on the cluster of the exported service gets a connection request from the client-side
+   gateway. The connection request includes the attributes of the requesting workload.
+1. The server-side gateway merges these attributes with the attributes of the client-side gateway
+   to form the set of source attributes (note that the source site attributes are not sent, to conserve
+   resources; see the note [here](#exchanging-attributes-between-gateways)).
+1. The server-side gateway then merges the attributes of the requested service with its own set of gateway
+   attributes to form the set of destination attributes.
+1. It can now call the policy engine with the two sets of attributes and get an allow/deny answer.
+
+## Impacts / Key Questions
+
+- How safe is relying on the requesting workload's IP to obtain its attributes from the K8s API server?
+  Which attacks does this expose us to? Can workloads carry their own identity instead (possibly via a
+  sidecar) and present these tokens/certs to the gateway in order to get access to a specific resource?
+- What is the process of establishing the Gateway attributes? Are these attributes encoded in the Gateway certificate?
+
+## Future Milestones
+
+The design will enable the following, which are out of scope for now:
+
+- Support for additional attribute sources in the future
+- Support for additional attributes
+- Adding and enforcing the setting of user defined attributes for services and sites
+
+## Non Functional
+
+### Testing Plan
+
+TODO
+
+### Update/Rollback Compatibility
+
+We don't support backward compatibility. All policies and implementations must be updated to
+  adhere to the specification defined by this design.
+
+### Scalability
+
+TODO: not applicable.
+
+### Security Considerations
+
+The introduction of ClusterLink gateways to a cluster increases the 'surface area' exposed
+  for attack by allowing remote access to Services.
+
+The following security considerations are impacted (though not necessarily directly by this design
+  change, which is more concerned with formalizing the existing implementation):
+
+- ClusterLink gateways are configured to establish mutually authenticated connections only with
+  other gateways in the same Fabric (trust domain, certificate authority). This should limit
+  some of the exposure.
+- ClusterLink requires elevated permissions to read Pod and Service specification and status
+  across multiple namespaces.
+- The "trustworthiness" of attributes is paramount for effective policy enforcement, in particular for
+  access control. The trust depends on (1) secure access to the Kubernetes API server; and (2) effective
+  segregation and confinement of users to specific namespaces. Both of these assumptions are
+  reasonable and expected under normal cluster operation and management.
+- Similarly, the correctness of the policy engine impacts the operation and cross site
+  communication.
+- Users may opt out of ClusterLink access by (1) not importing/exporting Services; (2) ensuring
+  strict, default deny, policies are defined; and (3) potentially further locking down access by
+  setting appropriate k8s NetworkPolicies on their sensitive Pods, disallowing access from the
+  ClusterLink namespace.
+- The use of the client IP address as the client handle for retrieving attributes can be subject
+  to impersonation/spoofing in certain cases.
+
+### Implementation Phases/History
+
+TODO
+
+- gateway attr (encoded in cert?)
+- workload attr, collected by control plane
+- service attr, defined by user, carried over on import