Draft: MPS feature #67

Open · wants to merge 3 commits into base: main
537 changes: 537 additions & 0 deletions gpu-operator/gpu-sharing-mps.rst

Large diffs are not rendered by default.

46 changes: 32 additions & 14 deletions gpu-operator/gpu-sharing.rst
@@ -54,21 +54,35 @@ and not modify nodes with other GPU models.
You can combine the two approaches by applying a cluster-wide default configuration
and then label nodes so that those nodes receive a node-specific configuration.

Comparison: Time-Slicing and Multi-Instance GPU
===============================================
.. _comparison-ts-mps-mig:

The latest generations of NVIDIA GPUs provide an operation mode called
Multi-Instance GPU (MIG). MIG allows you to partition a GPU
into several smaller, predefined instances, each of which looks like a
Comparison: Time-Slicing, Multi-Process Service, and Multi-Instance GPU
=======================================================================

Each of the technologies, time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG),
enables sharing a physical GPU with more than one workload.

NVIDIA A100 and newer GPUs provide an operation mode called MIG.
MIG enables you to partition a GPU into *slices*.
A slice is a smaller, predefined GPU instance that looks like a
mini-GPU that provides memory and fault isolation at the hardware layer.
You can share access to a GPU by running workloads on one of
these predefined instances instead of the full native GPU.

MIG support was added to Kubernetes in 2020. Refer to `Supporting MIG in Kubernetes <https://www.google.com/url?q=https://docs.google.com/document/d/1mdgMQ8g7WmaI_XVVRrCvHPFPOMCm5LQD5JefgAh6N8g/edit&sa=D&source=editors&ust=1655578433019961&usg=AOvVaw1F-OezvM-Svwr1lLsdQmu3>`_
for details on how this works.

Time-slicing trades the memory and fault-isolation that is provided by MIG
for the ability to share a GPU by a larger number of users.
NVIDIA V100 and newer GPUs support MPS.
MPS enables dividing a physical GPU into *replicas* and assigning workloads to a replica.
While MIG provides fault isolation in hardware, MPS uses software to divide the GPU into replicas.
Each replica receives an equal share of the GPU's memory and active thread percentage.
For example, if you configure two replicas, each replica has access to 50% of GPU memory and 50% of compute capacity.
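
A minimal sketch of the corresponding device plugin configuration entry,
mirroring the two-replica ``mps-two`` example added elsewhere in this change:

.. code-block:: yaml

   version: v1
   sharing:
     mps:
       resources:
       - name: nvidia.com/gpu
         replicas: 2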

Time-slicing is available with all GPUs supported by the Operator.
Unlike MIG, time-slicing has no special memory or fault-isolation.
Like MPS, time-slicing uses the term *replica*; however, the GPU is not divided between workloads.
The GPU performs a context switch and swaps resources on and off the GPU when a workload is scheduled.
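
For comparison, a time-slicing entry likely resembles the following sketch;
the ``timeSlicing`` key name follows the device plugin configuration format,
and the full file used on this page is ``time-slicing-config-all.yaml``:

.. code-block:: yaml

   version: v1
   sharing:
     timeSlicing:
       resources:
       - name: nvidia.com/gpu
         replicas: 4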

Time-slicing also provides a way to share access to a GPU on
older generation GPUs that do not support MIG.
However, you can combine MIG and time-slicing to provide shared access to
@@ -234,15 +248,15 @@ The following table describes the key fields in the config map.
Applying One Cluster-Wide Configuration
=======================================

Perform the following steps to configure GPU time-slicing if you already installed the GPU operator
Perform the following steps to configure GPU time-slicing if you already installed the GPU Operator
and want to apply the same time-slicing configuration on all nodes in the cluster.

#. Create a file, such as ``time-slicing-config-all.yaml``, with contents like the following example:

.. literalinclude:: ./manifests/input/time-slicing-config-all.yaml
:language: yaml

#. Add the config map to the same namespace as the GPU operator:
#. Add the config map to the same namespace as the GPU Operator:

.. code-block:: console

@@ -284,7 +298,7 @@ control which configuration is applied to which nodes.
.. literalinclude:: ./manifests/input/time-slicing-config-fine.yaml
:language: yaml

#. Add the config map to the same namespace as the GPU operator:
#. Add the config map to the same namespace as the GPU Operator:

.. code-block:: console

@@ -339,9 +353,9 @@ Configuring Time-Slicing Before Installing the NVIDIA GPU Operator
You can enable time-slicing with the NVIDIA GPU Operator by passing the
``devicePlugin.config.name=<config-map-name>`` parameter during installation.
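
As a sketch, the equivalent Helm values fragment might look like the following;
the config map name shown here is an assumption and must match the config map that you create:

.. code-block:: yaml

   # Hypothetical values fragment; replace the name with your config map name.
   devicePlugin:
     config:
       name: time-slicing-config-all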

Perform the following steps to configure time-slicing before installing the operator:
Perform the following steps to configure time-slicing before installing the Operator:

#. Create the namespace for the operator:
#. Create the namespace for the Operator:

.. code-block:: console

@@ -418,15 +432,17 @@ Perform the following steps to verify that the time-slicing configuration is app
* The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine.
* The ``nvidia.com/gpu.product`` label includes a ``-SHARED`` suffix to the product name.
* The ``nvidia.com/gpu.replicas`` label matches the reported capacity.
* The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``.

.. code-block:: output
:emphasize-lines: 3,4,5,7
:emphasize-lines: 3-6,8

...
Labels:
nvidia.com/gpu.count=4
nvidia.com/gpu.product=Tesla-T4-SHARED
nvidia.com/gpu.replicas=4
nvidia.com/gpu.sharing-strategy=time-slicing
Capacity:
nvidia.com/gpu: 16
...
@@ -441,15 +457,17 @@ Perform the following steps to verify that the time-slicing configuration is app
* The ``nvidia.com/gpu`` capacity reports ``0``.
* The ``nvidia.com/gpu.shared`` capacity equals the number of physical GPUs multiplied by the
specified number of GPU replicas to create.
* The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``.

.. code-block:: output
:emphasize-lines: 3,7,8
:emphasize-lines: 3,8,9

...
Labels:
nvidia.com/gpu.count=4
nvidia.com/gpu.product=Tesla-T4
nvidia.com/gpu.replicas=4
nvidia.com/gpu.sharing-strategy=time-slicing
Capacity:
nvidia.com/gpu: 0
nvidia.com/gpu.shared: 16
1 change: 1 addition & 0 deletions gpu-operator/index.rst
@@ -40,6 +40,7 @@
:hidden:

Multi-Instance GPU <gpu-operator-mig.rst>
MPS GPU Sharing <gpu-sharing-mps.rst>
Time-Slicing GPUs <gpu-sharing.rst>
gpu-operator-rdma.rst
Outdated Kernels <install-gpu-operator-outdated-kernels.rst>
12 changes: 12 additions & 0 deletions gpu-operator/manifests/input/mps-config-all.yaml
@@ -0,0 +1,12 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: mps-config-all
data:
  mps-any: |-
    version: v1
    sharing:
      mps:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
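
A sketch of how this config map might be referenced from the ClusterPolicy, assuming
the same devicePlugin.config fields that the time-slicing workflow uses; pointing
default at the mps-any key is an assumption about the intended usage:

# Sketch only; not one of this pull request's manifests.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  devicePlugin:
    config:
      name: mps-config-all
      default: mps-any
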
22 changes: 22 additions & 0 deletions gpu-operator/manifests/input/mps-config-fine.yaml
@@ -0,0 +1,22 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: mps-config-fine
data:
  mps-four: |-
    version: v1
    sharing:
      mps:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 4
  mps-two: |-
    version: v1
    sharing:
      mps:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 2
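
With the fine-grained config, nodes select an entry by label. A sketch, assuming the
nvidia.com/device-plugin.config node label that the time-slicing workflow uses
(the node name is hypothetical):

# Sketch only; label a node to select the mps-two entry above.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    nvidia.com/device-plugin.config: mps-two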

32 changes: 32 additions & 0 deletions gpu-operator/manifests/input/mps-verification.yaml
@@ -0,0 +1,32 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mps-verification
  labels:
    app: mps-verification
spec:
  replicas: 5
  selector:
    matchLabels:
      app: mps-verification
  template:
    metadata:
      labels:
        app: mps-verification
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      hostPID: true
      containers:
      - name: cuda-sample-vector-add
        image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
        command: ["/bin/bash", "-c", "--"]
        args:
        - while true; do /cuda-samples/vectorAdd; done
        resources:
          limits:
            nvidia.com/gpu: 1
      nodeSelector:
        nvidia.com/gpu.sharing-strategy: mps
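
Usage note (inferred from this manifest): each of the five pods requests nvidia.com/gpu: 1
and, because of the nodeSelector, is only scheduled on nodes whose
nvidia.com/gpu.sharing-strategy label is mps, that is, nodes where an MPS
configuration has been applied.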
2 changes: 2 additions & 0 deletions gpu-operator/manifests/input/time-slicing-verification.yaml
@@ -28,3 +28,5 @@ spec:
        resources:
          limits:
            nvidia.com/gpu: 1
      nodeSelector:
        nvidia.com/gpu.sharing-strategy: time-slicing
11 changes: 11 additions & 0 deletions gpu-operator/manifests/output/mps-all-get-events.txt
@@ -0,0 +1,11 @@
LAST SEEN TYPE REASON OBJECT MESSAGE
38s Normal SuccessfulDelete daemonset/nvidia-device-plugin-daemonset Deleted pod: nvidia-device-plugin-daemonset-l86fw
38s Normal SuccessfulDelete daemonset/gpu-feature-discovery Deleted pod: gpu-feature-discovery-shj2m
38s Normal Killing pod/gpu-feature-discovery-shj2m Stopping container gpu-feature-discovery
38s Normal Killing pod/nvidia-device-plugin-daemonset-l86fw Stopping container nvidia-device-plugin
37s Normal Scheduled pod/nvidia-device-plugin-daemonset-lcklx Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-lcklx to worker-1
37s Normal SuccessfulCreate daemonset/gpu-feature-discovery Created pod: gpu-feature-discovery-pgx9l
37s Normal Scheduled pod/gpu-feature-discovery-pgx9l Successfully assigned gpu-operator/gpu-feature-discovery-pgx9l to worker-0
37s Normal SuccessfulCreate daemonset/nvidia-device-plugin-daemonset Created pod: nvidia-device-plugin-daemonset-lcklx
36s Normal Created pod/nvidia-device-plugin-daemonset-lcklx Created container config-manager-init
36s Normal Pulled pod/nvidia-device-plugin-daemonset-lcklx Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0" already present on machine
6 changes: 6 additions & 0 deletions gpu-operator/manifests/output/mps-get-pods.txt
@@ -0,0 +1,6 @@
NAME READY STATUS RESTARTS AGE
mps-verification-86c99b5666-hczcn 1/1 Running 0 3s
mps-verification-86c99b5666-sj8z5 1/1 Running 0 3s
mps-verification-86c99b5666-tnjwx 1/1 Running 0 3s
mps-verification-86c99b5666-82hxj 1/1 Running 0 3s
mps-verification-86c99b5666-9lhh6 1/1 Running 0 3s
13 changes: 13 additions & 0 deletions gpu-operator/manifests/output/mps-logs-pods.txt
@@ -0,0 +1,13 @@
Found 5 pods, using pod/mps-verification-86c99b5666-tnjwx
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
...