
Merge pull request #199 from sunya-ch/v1.3.0
docs: add KubeCon NA demo code
sunya-ch authored Dec 9, 2024
2 parents e241ed0 + 33f9b83 commit 2b98c27
Showing 4 changed files with 178 additions and 0 deletions.
71 changes: 71 additions & 0 deletions demo/kubecon-na-2024/README.md
@@ -0,0 +1,71 @@
# Multi-NIC CNI Demo

[![Dressing-up Your Cluster for AI in Minutes with a Portable Network CR - Sunyanan Choochotkaew & Tatsuhiro Chiba, IBM Research](./img/cover.png)](https://youtu.be/Sj2nBKcOWlI?si=63uQ2-RuUHQivzwm)

## System Description
- Cluster: multi-nic-cni
- Pre-installation:
  - Benchmark operator (CPE)
  - Metrics server enabled
  - MPI operator, deployed through CPE with the `BenchmarkOperator` CR in `mpi-operator.yaml`:

    ```bash
    kubectl create -f mpi-operator.yaml
    ```

  - Grafana with a thanos-querier datasource

## Required Actions
- Build the OSU benchmark image and replace the `osubenchmark:0.3.0-5.6.3` image reference in `mpilat.yaml` with it.

## Demo Steps
1. Show the start state.

1.1. Open the Grafana dashboard.

1.2. Log in to a node and check its interfaces and routes:

```bash
> ip -br -c link show|grep ens
ens3 UP 02:00:02:56:f5:c5 <BROADCAST,MULTICAST,UP,LOWER_UP>
ens4 UP 02:00:03:57:24:11 <BROADCAST,MULTICAST,UP,LOWER_UP>
ens5 UP 02:00:03:57:24:12 <BROADCAST,MULTICAST,UP,LOWER_UP>
> ip r
default via 10.244.0.1 dev br-ex proto dhcp src 10.244.0.4 metric 48
10.128.0.0/14 via 10.130.2.1 dev ovn-k8s-mp0
10.130.2.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.130.2.2
10.244.0.0/24 dev br-ex proto kernel scope link src 10.244.0.4 metric 48
10.244.2.0/24 dev ens4 proto kernel scope link src 10.244.2.5 metric 101
10.244.3.0/24 dev ens5 proto kernel scope link src 10.244.3.5 metric 102
169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2
169.254.169.1 dev br-ex src 10.244.0.4
169.254.169.3 via 10.130.2.1 dev ovn-k8s-mp0
172.30.0.0/16 via 169.254.169.4 dev br-ex mtu 1400
```
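In this start state the node has two secondary NICs, `ens4` and `ens5`, each with its own kernel link route (`10.244.2.0/24` and `10.244.3.0/24`), while the default route goes through `br-ex`. A minimal sketch for pulling that NIC-to-subnet mapping out of `ip r` output follows; the function name `secondary_nic_subnets` is hypothetical, and it assumes the secondary NICs are named `ens*`, as on this node.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ip r` output on stdin and print "<device> <subnet>"
# for every kernel link route on an ens* device. ens3 has no route of its own
# on this node (the default route uses br-ex), so only ens4/ens5 are printed.
secondary_nic_subnets() {
  awk '$2 == "dev" && $3 ~ /^ens/ && / proto kernel / { print $3, $1 }'
}

# Example with lines copied from the route table above.
secondary_nic_subnets <<'EOF'
default via 10.244.0.1 dev br-ex proto dhcp src 10.244.0.4 metric 48
10.244.0.0/24 dev br-ex proto kernel scope link src 10.244.0.4 metric 48
10.244.2.0/24 dev ens4 proto kernel scope link src 10.244.2.5 metric 101
10.244.3.0/24 dev ens5 proto kernel scope link src 10.244.3.5 metric 102
EOF

# On a node: ip r | secondary_nic_subnets
```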

1.3. Show that the HostInterface CR was created automatically.

1.4. Show that no CIDR CR exists yet.

2. Deploy the MultiNicNetwork CR.

3. Show the CIDR CR and the node routes:

```bash
> ip rule
> ip r show table multi-nic-cni-operator-ipvlanl3
```

4. Deploy `mpilat.yaml`.

```bash
oc create -f mpilat.yaml
```

5. Wait for the job to complete.

```bash
watch oc get benchmark mpilat -o=jsonpath='{.status.jobCompleted}'
```

6. Revisit the Grafana dashboard to view the results.
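The `watch` on `.status.jobCompleted` above can also be scripted so the demo proceeds without manual checking. This sketch rests on an unverified assumption: that `.status.jobCompleted` reports progress as `<finished>/<total>` (for example `1/1`). The helpers `is_complete` and `wait_for_benchmark` are hypothetical; only the `oc get benchmark ... -o=jsonpath=...` call comes from the demo steps.

```bash
#!/usr/bin/env bash
# Assumption: .status.jobCompleted looks like "<finished>/<total>", e.g. "1/1".
is_complete() {
  local progress=$1
  # Reject anything that is not "<digits>/<digits>", including an empty field.
  [[ $progress =~ ^([0-9]+)/([0-9]+)$ ]] || return 1
  local finished=${BASH_REMATCH[1]} total=${BASH_REMATCH[2]}
  # 10# forces base 10 so values with leading zeros are not read as octal.
  (( 10#$total > 0 && 10#$finished == 10#$total ))
}

# Poll the Benchmark CR until is_complete succeeds or the attempts run out.
wait_for_benchmark() {
  local name=$1 interval=${2:-10} attempts=${3:-60} progress i
  for (( i = 0; i < attempts; i++ )); do
    progress=$(oc get benchmark "$name" -o=jsonpath='{.status.jobCompleted}')
    echo "jobCompleted: ${progress:-<unset>}"
    is_complete "$progress" && return 0
    sleep "$interval"
  done
  return 1
}

# Usage: wait_for_benchmark mpilat 10 60   # poll every 10 s, give up after 60 tries
```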
Binary file added demo/kubecon-na-2024/img/cover.png
28 changes: 28 additions & 0 deletions demo/kubecon-na-2024/mpi-operator.yaml
@@ -0,0 +1,28 @@
apiVersion: cpe.cogadvisor.io/v1
kind: BenchmarkOperator
metadata:
  name: mpi
spec:
  apiVersion: kubeflow.org/v1alpha2
  kind: MPIJob
  adaptor: mpi
  crd:
    host: https://raw.githubusercontent.com/sunya-ch/mpi-operator/master
    paths:
    - /deploy/v2beta1/crd.yaml
  deploySpec:
    namespace: mpi-operator
    yaml:
      host: https://raw.githubusercontent.com/sunya-ch/mpi-operator/master
      paths:
      - /deploy/v2beta1/admin_role.yaml
      - /deploy/v2beta1/all.yaml
      - /deploy/v2beta1/cr.yaml
      - /deploy/v2beta1/crb.yaml
      - /deploy/v2beta1/crd.yaml
      - /deploy/v2beta1/deployment.yaml
      - /deploy/v2beta1/edit_role.yaml
      - /deploy/v2beta1/mpi-operator.yaml
      - /deploy/v2beta1/namespace.yaml
      - /deploy/v2beta1/serviceaccount.yaml
      - /deploy/v2beta1/view_role.yaml
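The `crd` and `deploySpec.yaml` sections each pair a `host` with a list of `paths`. Reading them as URLs formed by appending each path to the host (our interpretation of this CR, not checked against the CPE source), a small hypothetical helper, `manifest_urls`, can list the manifests so they can be reviewed before the operator applies them:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join a host with each path, assuming plain concatenation.
manifest_urls() {
  local host=${1%/}; shift   # drop a trailing slash on the host to avoid "//"
  local path
  for path in "$@"; do
    printf '%s/%s\n' "$host" "${path#/}"   # drop the leading slash on the path
  done
}

host=https://raw.githubusercontent.com/sunya-ch/mpi-operator/master
manifest_urls "$host" /deploy/v2beta1/crd.yaml /deploy/v2beta1/mpi-operator.yaml

# To read a manifest before deploying:
#   manifest_urls "$host" /deploy/v2beta1/crd.yaml | xargs curl -fsSL
```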
79 changes: 79 additions & 0 deletions demo/kubecon-na-2024/mpilat.yaml
@@ -0,0 +1,79 @@
apiVersion: cpe.cogadvisor.io/v1
kind: Benchmark
metadata:
  name: mpilat
  namespace: default
spec:
  benchmarkOperator:
    name: mpi
    namespace: default
  benchmarkSpec: |
    slotsPerWorker: 1
    runPolicy:
      cleanPodPolicy: Running
    mpiReplicaSpecs:
      Launcher:
        replicas: 1
        template:
          metadata:
            annotations:
              k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
          spec:
            initContainers:
            - name: wait-for-workers
              image: registry.access.redhat.com/ubi9/ubi:latest
              command:
              - sleep
              - "10"
            containers:
            - image: osubenchmark:0.3.0-5.6.3
              name: mpi-bench-master
              imagePullPolicy: Always
              securityContext:
                privileged: true
              command:
              - mpirun
              - --allow-run-as-root
              - --mca
              - btl_tcp_if_include
              - {{ .net }}
              - -np
              - "2"
              - /osu-micro-benchmarks-5.6.3/mpi/pt2pt/osu_latency
              - -m
              - "4194304"
      Worker:
        replicas: 2
        template:
          metadata:
            annotations:
              k8s.v1.cni.cncf.io/networks: multi-nic-cni-operator-ipvlanl3
          spec:
            affinity:
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: training.kubeflow.org/job-name
                        operator: In
                        values:
                        - osu-benchmark-bw
                    topologyKey: kubernetes.io/hostname
            containers:
            - image: osubenchmark:0.3.0-5.6.3
              name: mpi-bench-worker
              imagePullPolicy: Always
              securityContext:
                privileged: true
  repetition: 1
  iterationSpec:
    sequential: true
    minimize: true
    iterations:
    - name: net
      values:
      - "eth0"
      - "net1-0"
  parserKey: osu
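The `iterationSpec` above defines one iteration variable, `net`, with the values `eth0` and `net1-0`. The `{{ .net }}` placeholder in the launcher command (Go template syntax) takes each value in turn, and Open MPI's `btl_tcp_if_include` parameter restricts the TCP transport to that interface. `eth0` is the pod's default interface, and `net1-0` is presumably the secondary interface attached through the `k8s.v1.cni.cncf.io/networks` annotation. The sketch below, with the hypothetical helper `render_launcher_cmd`, prints the launcher command each iteration would run, assuming the operator performs a plain textual substitution.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the launcher command from mpilat.yaml with {{ .net }}
# replaced by the given interface name.
render_launcher_cmd() {
  local net=$1
  echo mpirun --allow-run-as-root --mca btl_tcp_if_include "$net" \
    -np 2 /osu-micro-benchmarks-5.6.3/mpi/pt2pt/osu_latency -m 4194304
}

# One command per value of the `net` iteration, in the order listed.
for net in eth0 net1-0; do
  render_launcher_cmd "$net"
done
```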
