docs: add clusterResourcePlacement and components doc (#598)
zhiying-lin authored Nov 9, 2023
1 parent 67d8a34 commit d6474f7
Showing 5 changed files with 254 additions and 0 deletions.
235 changes: 235 additions & 0 deletions docs/concepts/ClusterResourcePlacement/README.md
@@ -0,0 +1,235 @@
# ClusterResourcePlacement

## Overview

The `ClusterResourcePlacement` concept is used to dynamically select cluster-scoped resources (especially namespaces and all
objects within them) and control how they are propagated to all or a subset of the member clusters.
A `ClusterResourcePlacement` mainly consists of three parts:
- **Resource selection**: select which cluster-scoped Kubernetes
resource objects need to be propagated from the hub cluster to the selected member clusters.

  It supports the following forms of resource selection:
  - Select resources by specifying just the <group, version, kind>. This selection propagates all resources with a matching <group, version, kind>.
  - Select resources by specifying the <group, version, kind> and name. This selection propagates only the one resource that matches the <group, version, kind> and name.
  - Select resources by specifying the <group, version, kind> and a set of labels using the `ClusterResourcePlacement` -> `LabelSelector`.
    This selection propagates all resources that match the <group, version, kind> and the specified labels.

  **Note:** When a namespace is selected, all the namespace-scoped objects under that namespace are propagated to the
  selected member clusters along with the namespace.

- **Placement policy**: limit propagation of the selected resources to a specific subset of the member clusters.
  The following types of target cluster selection are supported:
  - **PickAll (Default)**: select any member clusters with matching cluster `Affinity` scheduling rules. If `Affinity`
    is not specified, it selects all joined and healthy member clusters.
  - **PickFixed**: select a fixed list of member clusters defined in `ClusterNames`.
  - **PickN**: select `NumberOfClusters` member clusters with optional matching cluster `Affinity` scheduling rules or topology spread constraints (`TopologySpreadConstraints`).

- **Rollout strategy**: how to propagate new changes to the selected member clusters.

A simple `ClusterResourcePlacement` looks like this:

```yaml
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-1
spec:
  policy:
    placementType: PickN
    numberOfClusters: 2
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "env"
        whenUnsatisfiable: DoNotSchedule
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-deployment
      version: v1
  revisionHistoryLimit: 100
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
      unavailablePeriodSeconds: 5
    type: RollingUpdate
```
## When To Use `ClusterResourcePlacement`

`ClusterResourcePlacement` is useful when you want a general way of managing and running workloads across multiple clusters.
Some example scenarios include the following:
- As a platform operator, I want to place my cluster-scoped resources (especially namespaces and all objects within them)
on a cluster that resides in the us-east-1 region.
- As a platform operator, I want to spread my cluster-scoped resources (especially namespaces and all objects within them)
evenly across the different regions/zones.
- As a platform operator, I prefer to place my test resources into the staging AKS cluster.
- As a platform operator, I would like to separate the workloads for compliance or policy reasons.
- As a developer, I want to run my cluster-scoped resources (especially namespaces and all objects within them) on 3 clusters.
In addition, each time I update my workloads, the updates take place with zero downtime by rolling them out to these three clusters incrementally.

## Placement Workflow

![](placement-concept-overview.jpg)

The placement workflow is divided into several stages:
* Creating the `ClusterSchedulingPolicySnapshot` and `ClusterResourceSnapshot` snapshots: the clusterResourcePlacement controller
captures the selected resources and the placement policy specified in the `ClusterResourcePlacement`.
* Creating the `ClusterResourceBinding`: the multi-cluster scheduler creates a `clusterResourceBinding` based on the scheduling
decision for each target cluster, and the rollout controller binds the selected resources according to the rollout strategy.
* Creating `Work` objects in the member cluster's reserved namespace on the hub cluster: the work generator creates the work in the
corresponding member cluster namespace. Each work contains the manifest workload to be deployed on the member cluster.
* Applying `Work` on the member clusters: the apply work controller applies the manifest workload on the member clusters.

## Resource Selection

Resource selectors identify cluster-scoped objects to include based on standard Kubernetes identifiers - namely, the `group`,
`kind`, `version`, and `name` of the object. Namespace-scoped objects are included automatically when the namespace they
are part of is selected. The example `ClusterResourcePlacement` above would include the `test-deployment` namespace and
any objects that were created in that namespace.
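For illustration, a `resourceSelectors` list covering the three selection forms described above might look like the sketch below; the `labelSelector` field name and the example labels are assumptions, so check the API reference for the exact schema.

```yaml
resourceSelectors:
  # 1. <group, version, kind> only: propagates every ClusterRole on the hub cluster.
  - group: rbac.authorization.k8s.io
    kind: ClusterRole
    version: v1
  # 2. <group, version, kind> plus name: propagates the single matching namespace.
  - group: ""
    kind: Namespace
    name: test-deployment
    version: v1
  # 3. <group, version, kind> plus a label selector (field name assumed): propagates all matching namespaces.
  - group: ""
    kind: Namespace
    version: v1
    labelSelector:
      matchLabels:
        app: web   # illustrative label, not taken from this document
```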

The clusterResourcePlacement controller creates a `ClusterResourceSnapshot` to store a snapshot of the resources selected
by the placement. The `ClusterResourceSnapshot` spec is immutable: each time the selected resources are updated, the
clusterResourcePlacement controller detects the change and creates a new `ClusterResourceSnapshot` rather than modifying
the existing one. In other words, the selected resources can change without directly affecting any existing
`ClusterResourceSnapshot`.

The total size of the selected resources may exceed the 1MB limit for a single Kubernetes object. As a result, the controller
may produce more than one `ClusterResourceSnapshot` for all the selected resources.

## Placement Policy

`ClusterResourcePlacement` supports the three types of policy mentioned above. A `ClusterSchedulingPolicySnapshot` is
generated whenever policy changes are made to the `ClusterResourcePlacement` that require a new scheduling run. Similar to
the `ClusterResourceSnapshot`, its spec is immutable.
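For reference, rough sketches of the other two placement types are shown below; the exact field casing (for example `clusterNames`) is an assumption based on the field names mentioned above, so consult the API reference for the authoritative schema.

```yaml
# PickFixed: place onto an explicitly listed set of member clusters.
policy:
  placementType: PickFixed
  clusterNames:
    - aks-member-1
    - aks-member-2
---
# PickAll (the default): place onto every joined and healthy member cluster,
# optionally narrowed by cluster Affinity scheduling rules.
policy:
  placementType: PickAll
```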

![](scheduling.jpg)

Compared with the Kubernetes scheduler framework, the multi-cluster scheduler selects clusters for the placement
in a five-step operation:
1. Batch & PostBatch
2. Filter
3. Score
4. Sort
5. Bind

The _batch & postBatch_ step defines the batch size according to the desired and current `ClusterResourceBinding`s.
The postBatch step adjusts the batch size if needed.

The _filter_ step finds the set of clusters where it is feasible to schedule the placement, for example, whether a cluster
matches the required `Affinity` scheduling rules specified in the `Policy`. It also filters out any clusters that are
leaving the fleet or are no longer connected to it, for example, because their heartbeat has stopped for a prolonged period of time.

In the _score_ step (applied only to the `PickN` type), the scheduler assigns a score to each cluster that survived filtering.
Each cluster is given a topology spread score (how well the cluster would satisfy the topology spread
constraints specified by the user) and an affinity score (how well the cluster would satisfy the preferred affinity terms
specified by the user).

In the _sort_ step (applied only to the `PickN` type), all eligible clusters are sorted by their scores, first by topology
spread score, breaking ties based on the affinity score.

The _bind_ step creates, updates, or deletes the `ClusterResourceBinding`s based on the desired and current member cluster lists.

## Rollout Strategy

The rollout strategy determines how changes to the `ClusterResourcePlacement` are rolled out across the member clusters.
The only supported strategy is `RollingUpdate`, which replaces the old placed resources with a rolling update, i.e. it
gradually places the new resources while removing the old ones.
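To make the strategy fields concrete, the snippet below repeats the `strategy` block from the sample at the top of this document with brief annotations; the comments are a best-effort reading of the fields rather than authoritative API documentation.

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%           # roughly how many of the selected clusters may hold outdated or unavailable resources at once
    maxSurge: 25%                 # roughly how many extra, not-yet-required copies may be rolled out ahead of removing old ones
    unavailablePeriodSeconds: 5   # assumed: how long to wait after a successful apply before the cluster counts as available again
```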

## Placement Status

After a `ClusterResourcePlacement` is created, details on the current status can be seen by running `kubectl describe crp <name>`.
The status output indicates both the overall placement conditions and the individual placement status on each selected member cluster.
The list of resources selected for placement is also included in the describe output.

Sample output:

```yaml
Name:         crp-1
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  placement.kubernetes-fleet.io/v1beta1
Kind:         ClusterResourcePlacement
Metadata:
  ...
Spec:
  Policy:
    Number Of Clusters:  2
    Placement Type:      PickN
  Resource Selectors:
    Group:
    Kind:     Namespace
    Name:     test
    Version:  v1
  Revision History Limit:  10
Status:
  Conditions:
    Last Transition Time:  2023-11-06T10:22:56Z
    Message:               found all the clusters needed as specified by the scheduling policy
    Observed Generation:   2
    Reason:                SchedulingPolicyFulfilled
    Status:                True
    Type:                  ClusterResourcePlacementScheduled
    Last Transition Time:  2023-11-06T10:22:56Z
    Message:               All 2 cluster(s) are synchronized to the latest resources on the hub cluster
    Observed Generation:   2
    Reason:                SynchronizeSucceeded
    Status:                True
    Type:                  ClusterResourcePlacementSynchronized
    Last Transition Time:  2023-11-06T10:22:56Z
    Message:               Successfully applied resources to 2 member clusters
    Observed Generation:   2
    Reason:                ApplySucceeded
    Status:                True
    Type:                  ClusterResourcePlacementApplied
  Placement Statuses:
    Cluster Name:  aks-member-1
    Conditions:
      Last Transition Time:  2023-11-06T10:22:56Z
      Message:               Successfully scheduled resources for placement in aks-member-1 (affinity score: 0, topology spread score: 0): picked by scheduling policy
      Observed Generation:   2
      Reason:                ScheduleSucceeded
      Status:                True
      Type:                  ResourceScheduled
      Last Transition Time:  2023-11-06T10:22:56Z
      Message:               Successfully Synchronized work(s) for placement
      Observed Generation:   2
      Reason:                WorkSynchronizeSucceeded
      Status:                True
      Type:                  WorkSynchronized
      Last Transition Time:  2023-11-06T10:22:56Z
      Message:               Successfully applied resources
      Observed Generation:   2
      Reason:                ApplySucceeded
      Status:                True
      Type:                  ResourceApplied
    Cluster Name:  aks-member-2
    Conditions:
      Last Transition Time:  2023-11-06T10:22:56Z
      Message:               Successfully scheduled resources for placement in aks-member-2 (affinity score: 0, topology spread score: 0): picked by scheduling policy
      Observed Generation:   2
      Reason:                ScheduleSucceeded
      Status:                True
      Type:                  ResourceScheduled
      Last Transition Time:  2023-11-06T10:22:56Z
      Message:               Successfully Synchronized work(s) for placement
      Observed Generation:   2
      Reason:                WorkSynchronizeSucceeded
      Status:                True
      Type:                  WorkSynchronized
      Last Transition Time:  2023-11-06T10:22:56Z
      Message:               Successfully applied resources
      Observed Generation:   2
      Reason:                ApplySucceeded
      Status:                True
      Type:                  ResourceApplied
  Selected Resources:
    Kind:       Namespace
    Name:       test
    Version:    v1
    Kind:       ConfigMap
    Name:       test-1
    Namespace:  test
    Version:    v1
Events:         <none>
```
Binary file added docs/concepts/ClusterResourcePlacement/placement-concept-overview.jpg
Binary file added docs/concepts/ClusterResourcePlacement/scheduling.jpg
19 changes: 19 additions & 0 deletions docs/concepts/Components/README.md
@@ -0,0 +1,19 @@
# Components

This document provides an overview of the components required for a fully functional and operational Fleet setup.

![](architecture.jpg)

The fleet consists of the following components:
- fleet-hub-agent is a Kubernetes controller that creates and reconciles all the fleet-related CRs in the hub cluster.
- fleet-member-agent is a Kubernetes controller that creates and reconciles all the fleet-related CRs in the member cluster.
  The fleet-member-agent pulls the latest CRs from the hub cluster and continuously reconciles the member cluster to
  the desired state.

The fleet implements an agent-based pull mode. This distributes the work to the member clusters and helps remove the
scalability bottleneck by dividing the load across each member cluster. It also means the hub cluster does not need
direct access to the member clusters, so Fleet can support member clusters that have only outbound network access and
no inbound access.

To allow multiple clusters to run securely, fleet creates a reserved namespace on the hub cluster per member cluster to
isolate the access permissions and resources across member clusters.
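As a purely illustrative sketch of that isolation model, a reserved namespace on the hub cluster might look like the following; the `fleet-member-<cluster name>` naming convention is an assumption for illustration, not taken from this document.

```yaml
# Hypothetical reserved namespace on the hub cluster for the member cluster "aks-member-1".
# Work objects destined for that cluster are created in this namespace, and the member
# agent's credentials are scoped so it can only access this namespace on the hub.
apiVersion: v1
kind: Namespace
metadata:
  name: fleet-member-aks-member-1   # naming convention assumed for illustration
```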
Binary file added docs/concepts/Components/architecture.jpg
