fix proposal with feedbacks
Signed-off-by: Zhe Shen <[email protected]>
z1ens committed Aug 30, 2024
1 parent e78c1cb commit fd2a5b3
Showing 14 changed files with 165 additions and 495 deletions.
90 changes: 50 additions & 40 deletions solutions/kueue-admission-check/README.md
@@ -154,12 +154,12 @@ metadata:
  namespace: kueue-system
  spec:
  clusterSets:
- - spoke
+ - spoke
  tolerations:
- - key: cluster.open-cluster-management.io/unreachable
- operator: Exists
- - key: cluster.open-cluster-management.io/unavailable
- operator: Exists
+ - key: cluster.open-cluster-management.io/unreachable
+ operator: Exists
+ - key: cluster.open-cluster-management.io/unavailable
+ operator: Exists
  predicates:
  - requiredClusterSelector:
  labelSelector:
@@ -209,16 +209,16 @@ As an admin, I want to leverage OCM's `AddonPlacementScore` for dynamic workload
  apiVersion: cluster.open-cluster-management.io/v1beta1
  kind: Placement
  metadata:
- name: placement-sample2
+ name: placement-demo2
  namespace: kueue-system
  spec:
  clusterSets:
- - spoke
+ - spoke
  tolerations:
- - key: cluster.open-cluster-management.io/unreachable
- operator: Exists
- - key: cluster.open-cluster-management.io/unavailable
- operator: Exists
+ - key: cluster.open-cluster-management.io/unreachable
+ operator: Exists
+ - key: cluster.open-cluster-management.io/unavailable
+ operator: Exists
  predicates:
  - requiredClusterSelector:
  labelSelector:
@@ -232,13 +232,35 @@ spec:
  type: AddOn
  addOn:
  resourceName: resource-usage-score
- scoreName: gpuAvailable
+ scoreName: gpuClusterAvailable
  weight: 1
  ```
- - You can manually edit the GPU resources on the managed clusters for testing.
+ - You can manually edit the GPU resources on the managed clusters for testing, for example on `kind-cluster2`, set 3 fake GPU resources on the `control-plane-node`.
  ```bash
- kubectl edit-status node cluster2-control-plane --context kind-cluster2
- kubectl edit-status node cluster3-control-plane --context kind-cluster3
+ kubectl edit-status node cluster2-control-plane --context kind-cluster2 # Same operation with other clusters/nodes.
  ```
+ - Edit the `status` of the node `cluster2-control-plane`:
+ ```yaml
+ allocatable:
+ cpu: "8"
+ ephemeral-storage: 61202244Ki
+ hugepages-1Gi: "0"
+ hugepages-2Mi: "0"
+ hugepages-32Mi: "0"
+ hugepages-64Ki: "0"
+ memory: 8027168Ki
+ nvidia.com/gpu: "3" # Add 3 fake GPUs in allocatable
+ pods: "110"
+ capacity:
+ cpu: "8"
+ ephemeral-storage: 61202244Ki
+ hugepages-1Gi: "0"
+ hugepages-2Mi: "0"
+ hugepages-32Mi: "0"
+ hugepages-64Ki: "0"
+ memory: 8027168Ki
+ nvidia.com/gpu: "3" # Add 3 fake GPUs in capacity
+ pods: "110"
+ ```
  - Apply the changes in the `Placement` to update MultiKueue dynamically.
  ```bash
@@ -268,16 +290,20 @@ The OCM Admission Check Controller will integrate OCM `Placement` results into M
  Example OCM Admission Check Controller design:

  ```yaml
+ # OCM implements an admissioncheck controller to automate the MultiKueue setup process.
+ # MultiKueueConfigs and MultiKueueClusters are generated dynamically based on OCM placement decisions.
  apiVersion: kueue.x-k8s.io/v1beta1
  kind: AdmissionCheck
  metadata:
- name: ocm-multikueue
+ name: placement-demo2
  spec:
  controllerName: open-cluster-management.io/placement
  parameters:
  apiGroup: cluster.open-cluster-management.io
- kind: Placement # Placement is under kueue-system namespace.
- name: placement-demo2-1
+ kind: Placement
+ name: placement-demo2
+ # Leverages OCM's placement mechanism to select clusters based on specific criteria.
+ # For example `Placement-demo2-1` selects clusters with the `nvidia-tesla-t4` accelerator label.
  ```
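
Review note: the `AdmissionCheck` in this hunk points at the `Placement` named `placement-demo2`, so the clusters MultiKueue ends up with are whatever that placement currently selects. The following is a minimal sketch for inspecting that selection, assuming `kubectl` is configured against the hub cluster and the placement lives in `kueue-system` as in the examples. The helper name `placement_clusters` is hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): list the clusters that an
# OCM Placement currently selects.
# Usage: placement_clusters <placement-name> [namespace]
placement_clusters() {
  placement="$1"
  namespace="${2:-kueue-system}"  # assumption: same namespace as the examples
  # OCM labels each PlacementDecision with the name of the Placement that
  # produced it, so a label selector finds the matching decisions.
  kubectl get placementdecisions \
    --namespace "$namespace" \
    --selector "cluster.open-cluster-management.io/placement=${placement}" \
    --output jsonpath='{.items[*].status.decisions[*].clusterName}'
}
```

Calling `placement_clusters placement-demo2` should print the selected cluster names separated by spaces; an empty result suggests the placement has no decisions yet.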

### Changes in the Configuration Process with OCM Admission Check Controller
@@ -324,35 +350,19 @@ spec:
  With the OCM Admission Check Controller, the need for manual configuration of `MultiKueueConfig` and `MultiKueueCluster` is eliminated. Instead, the administrator only needs to configure two additional admission checks in the ClusterQueue resource:

  - `ocm-multikueue`: Automates the process of setting up `MultiKueueConfig` and `MultiKueueCluster`.
- - `placement-sample1`: Leverages OCM's placement mechanism to select clusters based on specific criteria. For example `Placement-sample1` selects clusters with the `nvidia-tesla-t4` accelerator label.
-
- ```yaml
- apiVersion: kueue.x-k8s.io/v1beta1
- kind: AdmissionCheck
- metadata:
- name: ocm-multikueue
- spec:
- controllerName: kueue.x-k8s.io/multikueue
- parameters:
- apiGroup: kueue.x-k8s.io
- kind: MultiKueueConfig
- name: placement
- ```
-
- Admin configures the above two admission check controllers in the `ClusterQueue`
+ - Admin configures two admission check controllers in the `ClusterQueue`, for example in `multikueue-setup-demo2`:

  ```yaml
  apiVersion: kueue.x-k8s.io/v1beta1
  kind: ClusterQueue
  metadata:
- name: "cluster-queue"
+ name: "cluster-queue-demo2"
  spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory","nvidia.com/gpu"]
  flavors:
- - name: "default-flavor"
+ - name: "default-flavor-demo2"
  resources:
  - name: "cpu"
  nominalQuota: 9
@@ -361,9 +371,9 @@ spec:
  - name: "nvidia.com/gpu"
  nominalQuota: 3
  admissionChecks:
- - multikueue
- - ocm-multikueue
- ```
+ - multikueue-demo2
+ - placement-demo2
+ ```

#### OCM Admission Check Controller Workflow

@@ -1,17 +1,10 @@
- apiVersion: multicluster.x-k8s.io/v1alpha1
- kind: AuthTokenRequest
+ apiVersion: rbac.open-cluster-management.io/v1alpha1
+ kind: ClusterPermission
  metadata:
- name: kueue-cluster2
- namespace: kueue-system
+ name: kueue-admin-cluster1
+ namespace: cluster1
  spec:
- targetClusterProfile:
- apiGroup: multicluster.x-k8s.io
- kind: ClusterProfile
- name: cluster2
- namespace: open-cluster-management
- serviceAccountName: kueue-admin-cluster2
- clusterRoles:
- - name: kueue-admin-cluster2
+ clusterRole:
  rules:
  - apiGroups:
  - batch
@@ -63,4 +56,8 @@ spec:
  - get
  - patch
  - update
-
+ clusterRoleBinding:
+ subject:
+ kind: ServiceAccount
+ name: kueue-admin-cluster1
+ namespace: open-cluster-management-agent-addon
@@ -1,17 +1,10 @@
- apiVersion: multicluster.x-k8s.io/v1alpha1
- kind: AuthTokenRequest
+ apiVersion: rbac.open-cluster-management.io/v1alpha1
+ kind: ClusterPermission
  metadata:
- name: kueue-cluster3
- namespace: kueue-system
+ name: kueue-admin-cluster2
+ namespace: cluster2
  spec:
- targetClusterProfile:
- apiGroup: multicluster.x-k8s.io
- kind: ClusterProfile
- name: cluster3
- namespace: open-cluster-management
- serviceAccountName: kueue-admin-cluster3
- clusterRoles:
- - name: kueue-admin-cluster3
+ clusterRole:
  rules:
  - apiGroups:
  - batch
@@ -63,4 +56,8 @@ spec:
  - get
  - patch
  - update
-
+ clusterRoleBinding:
+ subject:
+ kind: ServiceAccount
+ name: kueue-admin-cluster2
+ namespace: open-cluster-management-agent-addon
@@ -1,17 +1,10 @@
- apiVersion: multicluster.x-k8s.io/v1alpha1
- kind: AuthTokenRequest
+ apiVersion: rbac.open-cluster-management.io/v1alpha1
+ kind: ClusterPermission
  metadata:
- name: kueue-cred-cluster1
- namespace: kueue-system
+ name: kueue-admin-cluster3
+ namespace: cluster3
  spec:
- targetClusterProfile:
- apiGroup: multicluster.x-k8s.io
- kind: ClusterProfile
- name: cluster1
- namespace: open-cluster-management
- serviceAccountName: kueue-admin-cluster1
- clusterRoles:
- - name: kueue-admin-cluster1
+ clusterRole:
  rules:
  - apiGroups:
  - batch
@@ -63,4 +56,8 @@ spec:
  - get
  - patch
  - update
-
+ clusterRoleBinding:
+ subject:
+ kind: ServiceAccount
+ name: kueue-admin-cluster3
+ namespace: open-cluster-management-agent-addon
28 changes: 0 additions & 28 deletions solutions/kueue-admission-check/env/mg-sa-cma-0.6.0.yaml

This file was deleted.

7 changes: 7 additions & 0 deletions solutions/kueue-admission-check/env/msa-c1.yaml
@@ -0,0 +1,7 @@
+ apiVersion: authentication.open-cluster-management.io/v1beta1
+ kind: ManagedServiceAccount
+ metadata:
+ name: kueue-admin-cluster1
+ namespace: cluster1
+ spec:
+ rotation: {}
7 changes: 7 additions & 0 deletions solutions/kueue-admission-check/env/msa-c2.yaml
@@ -0,0 +1,7 @@
+ apiVersion: authentication.open-cluster-management.io/v1beta1
+ kind: ManagedServiceAccount
+ metadata:
+ name: kueue-admin-cluster2
+ namespace: cluster2
+ spec:
+ rotation: {}
7 changes: 7 additions & 0 deletions solutions/kueue-admission-check/env/msa-c3.yaml
@@ -0,0 +1,7 @@
+ apiVersion: authentication.open-cluster-management.io/v1beta1
+ kind: ManagedServiceAccount
+ metadata:
+ name: kueue-admin-cluster3
+ namespace: cluster3
+ spec:
+ rotation: {}
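
Review note: the three `ManagedServiceAccount` manifests added in this commit differ only in the cluster index. A minimal sketch of generating them in a loop follows, assuming the same naming pattern (`kueue-admin-cluster<N>` in namespace `cluster<N>`) and conventional two-space YAML indentation. The `OUT_DIR` variable is hypothetical, and the script is an illustration rather than part of this commit.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical output location; defaults to a fresh temporary directory so
# the sketch never overwrites the files tracked in the repository.
out_dir="${OUT_DIR:-$(mktemp -d)}"
mkdir -p "$out_dir"

# One ManagedServiceAccount per managed cluster, following the naming used
# in msa-c1.yaml, msa-c2.yaml and msa-c3.yaml.
for i in 1 2 3; do
  cat > "${out_dir}/msa-c${i}.yaml" <<EOF
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: kueue-admin-cluster${i}
  namespace: cluster${i}
spec:
  rotation: {}
EOF
done

echo "wrote manifests to ${out_dir}"
```

Setting `OUT_DIR` to a directory of your choice before running the script writes the files there, after which they could be applied on the hub with `kubectl apply -f`.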