Merge pull request #199 from sunya-ch/v1.3.0
docs: add KubeCon NA demo code
Showing 4 changed files with 178 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Multi-NIC CNI Demo | ||
|
||
[![Dressing-up Your Cluster for AI in Minutes with a Portable Network CR - Sunyanan Choochotkaew & Tatsuhiro Chiba, IBM Research](./img/cover.png)](https://youtu.be/Sj2nBKcOWlI?si=63uQ2-RuUHQivzwm) | ||
|
||
## System Description | ||
- Cluster: multi-nic-cni | ||
- Pre-installation | ||
- Benchmark operator (CPE) | ||
- Metric server enablement | ||
- MPI operator | ||
|
||
``` | ||
kubectl create -f mpi-operator.yaml | ||
``` | ||
|
||
- Grafana with thanos-querier datasource | ||
|
||
## Required actions | ||
- Build and replace OSU benchmark image | ||
|
||
## Demo Steps | ||
1. Show start state | ||
|
||
1.1. Open the Grafana dashboard | ||
|
||
1.2. Log in to a node | ||
|
||
```bash | ||
> ip -br -c link show|grep ens | ||
ens3 UP 02:00:02:56:f5:c5 <BROADCAST,MULTICAST,UP,LOWER_UP> | ||
ens4 UP 02:00:03:57:24:11 <BROADCAST,MULTICAST,UP,LOWER_UP> | ||
ens5 UP 02:00:03:57:24:12 <BROADCAST,MULTICAST,UP,LOWER_UP> | ||
> ip r | ||
default via 10.244.0.1 dev br-ex proto dhcp src 10.244.0.4 metric 48 | ||
10.128.0.0/14 via 10.130.2.1 dev ovn-k8s-mp0 | ||
10.130.2.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.130.2.2 | ||
10.244.0.0/24 dev br-ex proto kernel scope link src 10.244.0.4 metric 48 | ||
10.244.2.0/24 dev ens4 proto kernel scope link src 10.244.2.5 metric 101 | ||
10.244.3.0/24 dev ens5 proto kernel scope link src 10.244.3.5 metric 102 | ||
169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2 | ||
169.254.169.1 dev br-ex src 10.244.0.4 | ||
169.254.169.3 via 10.130.2.1 dev ovn-k8s-mp0 | ||
172.30.0.0/16 via 169.254.169.4 dev br-ex mtu 1400 | ||
``` | ||
|
||
1.3. HostInterface CR is auto-created. | ||
|
||
1.4. No CIDR CR exists yet. | ||
|
||
2. Deploy MultiNicNetwork | ||
|
||
3. Show CIDR and node route | ||
|
||
```bash | ||
> ip rule | ||
> ip r show table multi-nic-cni-operator-ipvlanl3 | ||
``` | ||
|
||
4. Deploy mpilat.yaml | ||
|
||
```bash | ||
oc create -f mpilat.yaml | ||
``` | ||
|
||
5. Wait for the job to complete | ||
|
||
```bash | ||
watch oc get benchmark mpilat -o=jsonpath='{.status.jobCompleted}' | ||
``` | ||
|
||
6. Revisit the Grafana dashboard for the result |
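The wait step in the README relies on watching `.status.jobCompleted` by hand. A minimal sketch of a scripted alternative follows, assuming the same `oc get benchmark` query is available. The helper name `wait_for_benchmark`, the `CLI` variable, and the retry parameters are hypothetical, and the value that signals completion is not documented in this commit, so the caller has to supply it.

```bash
#!/usr/bin/env bash
# Sketch: poll the Benchmark status instead of watching it by hand.
# CLI defaults to `oc`, as in the README; set CLI=kubectl on plain Kubernetes.
CLI="${CLI:-oc}"

# wait_for_benchmark NAME EXPECTED [INTERVAL_SECONDS] [MAX_TRIES]
# EXPECTED is the .status.jobCompleted value that means "done". Its format
# is an assumption left to the caller; this commit does not document it.
wait_for_benchmark() {
  local name="$1" expected="$2" interval="${3:-5}" max_tries="${4:-120}"
  local status="" try
  for ((try = 1; try <= max_tries; try++)); do
    # Same query as the README's watch command.
    status="$("$CLI" get benchmark "$name" -o=jsonpath='{.status.jobCompleted}')"
    if [[ "$status" == "$expected" ]]; then
      echo "benchmark ${name} completed: ${status}"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for ${name}; last status: ${status:-<empty>}" >&2
  return 1
}
```

A call might look like `wait_for_benchmark mpilat "2/2"`, where `2/2` is only a placeholder for whatever the status field shows once all jobs have finished on your cluster.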
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
apiVersion: cpe.cogadvisor.io/v1 | ||
kind: BenchmarkOperator | ||
metadata: | ||
  name: mpi | ||
spec: | ||
  apiVersion: kubeflow.org/v1alpha2 | ||
  kind: MPIJob | ||
  adaptor: mpi | ||
  crd: | ||
    host: https://raw.githubusercontent.com/sunya-ch/mpi-operator/master | ||
    paths: | ||
    - /deploy/v2beta1/crd.yaml | ||
  deploySpec: | ||
    namespace: mpi-operator | ||
    yaml: | ||
      host: https://raw.githubusercontent.com/sunya-ch/mpi-operator/master | ||
      paths: | ||
      - /deploy/v2beta1/admin_role.yaml | ||
      - /deploy/v2beta1/all.yaml | ||
      - /deploy/v2beta1/cr.yaml | ||
      - /deploy/v2beta1/crb.yaml | ||
      - /deploy/v2beta1/crd.yaml | ||
      - /deploy/v2beta1/deployment.yaml | ||
      - /deploy/v2beta1/edit_role.yaml | ||
      - /deploy/v2beta1/mpi-operator.yaml | ||
      - /deploy/v2beta1/namespace.yaml | ||
      - /deploy/v2beta1/serviceaccount.yaml | ||
      - /deploy/v2beta1/view_role.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
apiVersion: cpe.cogadvisor.io/v1 | ||
kind: Benchmark | ||
metadata: | ||
  name: mpilat | ||
  namespace: default | ||
spec: | ||
  benchmarkOperator: | ||
    name: mpi | ||
    namespace: default | ||
  benchmarkSpec: | | ||
    slotsPerWorker: 1 | ||
    runPolicy: | ||
      cleanPodPolicy: Running | ||
    mpiReplicaSpecs: | ||
      Launcher: | ||
        replicas: 1 | ||
        template: | ||
          metadata: | ||
            annotations: | ||
              k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3 | ||
          spec: | ||
            initContainers: | ||
            - name: wait-for-workers | ||
              image: registry.access.redhat.com/ubi9/ubi:latest | ||
              command: | ||
              - sleep | ||
              - "10" | ||
            containers: | ||
            - image: osubenchmark:0.3.0-5.6.3 | ||
              name: mpi-bench-master | ||
              imagePullPolicy: Always | ||
              securityContext: | ||
                privileged: true | ||
              command: | ||
              - mpirun | ||
              - --allow-run-as-root | ||
              - --mca | ||
              - btl_tcp_if_include | ||
              - {{ .net }} | ||
              - -np | ||
              - "2" | ||
              - /osu-micro-benchmarks-5.6.3/mpi/pt2pt/osu_latency | ||
              - -m | ||
              - "4194304" | ||
      Worker: | ||
        replicas: 2 | ||
        template: | ||
          metadata: | ||
            annotations: | ||
              k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3 | ||
          spec: | ||
            affinity: | ||
              podAntiAffinity: | ||
                preferredDuringSchedulingIgnoredDuringExecution: | ||
                - weight: 100 | ||
                  podAffinityTerm: | ||
                    labelSelector: | ||
                      matchExpressions: | ||
                      - key: training.kubeflow.org/job-name | ||
                        operator: In | ||
                        values: | ||
                        - osu-benchmark-bw | ||
                    topologyKey: kubernetes.io/hostname | ||
            containers: | ||
            - image: osubenchmark:0.3.0-5.6.3 | ||
              name: mpi-bench-worker | ||
              imagePullPolicy: Always | ||
              securityContext: | ||
                privileged: true | ||
  repetition: 1 | ||
  iterationSpec: | ||
    sequential: true | ||
    minimize: true | ||
    iterations: | ||
    - name: net | ||
      values: | ||
      - "eth0" | ||
      - "net1-0" | ||
  parserKey: osu
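
The `iterationSpec` above pairs the `net` iteration with the `{{ .net }}` placeholder in the launcher command. The sketch below shows what that substitution would produce for each listed value, assuming the operator replaces the placeholder once per iteration. The real rendering happens inside the CPE operator and is not part of this commit; `render_net` is a hypothetical helper used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: emulate the per-iteration substitution of {{ .net }} with sed.
# render_net is a hypothetical helper, not part of the CPE operator.
render_net() {
  local iface="$1"
  # Replace every literal "{{ .net }}" on stdin with the interface name.
  sed "s/{{ \.net }}/${iface}/g"
}

# Excerpt of the launcher command from mpilat.yaml.
template='- --mca
- btl_tcp_if_include
- {{ .net }}
- -np
- "2"'

# One rendering per value listed under iterationSpec.iterations.
for iface in eth0 net1-0; do
  echo "# iteration: net=${iface}"
  printf '%s\n' "$template" | render_net "$iface"
done
```

Under this assumption the benchmark runs `osu_latency` twice: once with MPI TCP traffic restricted to `eth0`, and once restricted to `net1-0`, which lets the two latency results be compared on the dashboard.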