From b46f15cda58ccb07455d4d0870a1b0e898ccba6f Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Tue, 11 Jun 2024 08:00:19 -0400 Subject: [PATCH 1/3] MPS feature Signed-off-by: Mike McKiernan --- gpu-operator/gpu-sharing-mps.rst | 512 ++++++++++++++++++ gpu-operator/gpu-sharing.rst | 46 +- gpu-operator/index.rst | 1 + .../manifests/input/mps-config-all.yaml | 12 + .../manifests/input/mps-config-fine.yaml | 22 + .../manifests/input/mps-verification.yaml | 32 ++ .../input/time-slicing-verification.yaml | 2 + .../manifests/output/mps-all-get-events.txt | 11 + .../manifests/output/mps-get-pods.txt | 6 + .../manifests/output/mps-logs-pods.txt | 13 + 10 files changed, 643 insertions(+), 14 deletions(-) create mode 100644 gpu-operator/gpu-sharing-mps.rst create mode 100644 gpu-operator/manifests/input/mps-config-all.yaml create mode 100644 gpu-operator/manifests/input/mps-config-fine.yaml create mode 100644 gpu-operator/manifests/input/mps-verification.yaml create mode 100644 gpu-operator/manifests/output/mps-all-get-events.txt create mode 100644 gpu-operator/manifests/output/mps-get-pods.txt create mode 100644 gpu-operator/manifests/output/mps-logs-pods.txt diff --git a/gpu-operator/gpu-sharing-mps.rst b/gpu-operator/gpu-sharing-mps.rst new file mode 100644 index 000000000..79353644a --- /dev/null +++ b/gpu-operator/gpu-sharing-mps.rst @@ -0,0 +1,512 @@ +.. headings (h1/h2/h3/h4/h5) are # * = - + +.. _gpu-mps: + +################################### +Multi-Process Service in Kubernetes +################################### + +.. contents:: + :depth: 2 + :local: + :backlinks: none + + +*************************** +About Multi-Process Service +*************************** + +NVIDIA Multi-Process Service (MPS) provides the ability to share a GPU with multiple containers. + +The NVIDIA GPU Operator enables configuring MPS on a node by using +options for the `NVIDIA Kubernetes Device Plugin `_. +Using MPS, you can configure the number of *replicas* to create for each GPU on a node. +Each replica is allocatable by the kubelet to a container. + +You can apply a cluster-wide default MPS configuration and you can apply node-specific configurations. +For example, a cluster-wide configuration could create two replicas for each GPU on each node. +A node-specific configuration could be to create two replicas on some nodes and four replicas on other nodes. + +You can combine the two approaches by applying a cluster-wide default configuration +and then label nodes so that those nodes receive a node-specific configuration. + +Refer to :ref:`comparison-ts-mps-mig` for information about the available GPU sharing technologies. + + +Support Platforms and Resource Types +==================================== + +MPS is supported on bare-metal applications, virtual machines +with GPU passthrough, and virtual machines with NVIDIA vGPU. + +The only supported resource type is ``nvidia.com/gpu``. + + +Limitations +=========== + +- DCGM-Exporter does not support associating metrics to containers when MPS is enabled with the NVIDIA Kubernetes Device Plugin. +- The Operator does not monitor changes to the config map that configures the device plugin. +- MPS is not supported on GPU instances from Multi-Instance GPU (MIG) devices. +- MPS does not support requesting more than one GPU device. + Only one device resource request is supported: + + .. code-block:: yaml + + ... 
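+        # Only a single replica can be requested for each container;
+        # a limit greater than 1 (for example, nvidia.com/gpu: 2) is not supported with MPS.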
+ spec: + containers: + resources: + limits: + nvidia.com/gpu: 1 + + +Changes to Node Labels +====================== + +In addition to the standard node labels that GPU Feature Discovery (GFD) +applies to nodes, the following label is also applied after you configure +MPS for a node: + +.. code-block:: yaml + + nvidia.com/.replicas = + +Where ```` is the factor by which each resource of ```` is equally divided. + +Additionally, by default, the ``nvidia.com/.product`` label is modified: + +.. code-block:: yaml + + nvidia.com/.product = -SHARED + +For example, on an NVIDIA DGX A100 machine, depending on the MPS configuration, +the labels can be similar to the following example: + +.. code-block:: yaml + + nvidia.com/gpu.replicas = 8 + nvidia.com/gpu.product = A100-SXM4-40GB-SHARED + +Using these labels, you can request access to a GPU replica or exclusive access to a GPU +in the same way that you traditionally specify a node selector to request one GPU model over another. +The ``-SHARED`` product name suffix ensures that you can specify a +node selector to assign pods to nodes with GPU replicas. + +The ``migStrategy`` configuration option has an effect on the node label for the product name. +When ``renameByDefault=false``, the default value, and ``migStrategy=single``, both the MIG profile name +and the ``-SHARED`` suffix are appended to the product name, such as the following example: + +.. code-block:: yaml + + nvidia.com/gpu.product = A100-SXM4-40GB-MIG-1g.5gb-SHARED + +If you set ``renameByDefault=true``, then the value of the ``nvidia.com/gpu.product`` node +label is not modified. + +************* +Configuration +************* + +About Configuring Multi-Process Service +======================================= + +You configure Multi-Process Service (MPS) by performing the following high-level steps: + +* Add a config map to the namespace that is used by the GPU Operator. +* Configure the cluster policy so that the device plugin uses the config map. +* Apply a label to the nodes that you want to configure for MPS. + +On a machine with one GPU, the following config map configures Kubernetes so that +the node advertises either two or four GPU resources. + +.. rubric:: Sample Config Map + +.. literalinclude:: ./manifests/input/mps-config-all.yaml + :language: yaml + +The following table describes the key fields in the config map. + +.. list-table:: + :header-rows: 1 + :widths: 15 10 75 + + * - Field + - Type + - Description + + * - ``data.`` + - string + - Specifies the time-slicing configuration name. + + You can specify multiple configurations if you want to assign node-specific configurations. + In the preceding example, the values for ``key`` are ``mps-two`` and ``mps-four``. + + * - ``flags.migStrategy`` + - string + - Specifies how to label MIG devices for the nodes that receive the MPS configuration. + Specify one of ``none``, ``single``, or ``mixed``. + + The default value is ``none``. + + * - ``renameByDefault`` + - boolean + - When set to ``true``, each resource is advertised under the name ``.shared`` + instead of ````. + + For example, if this field is set to ``true`` and the resource is typically ``nvidia.com/gpu``, + the nodes that are configured for MPS then advertise the resource as + ``nvidia.com/gpu.shared``. + Setting this field to true can be helpful if you want to schedule pods on GPUs with shared + access by specifying ``.shared`` in the resource request. 
+ + When this field is set to ``false``, the advertised resource name, such as ``nvidia.com/gpu``, + is not modified. + However, the label for the product name is suffixed with ``-SHARED``. + For example, if the output of ``kubectl describe node`` shows the node label + ``nvidia.com/gpu.product=Tesla-T4``, then after the node is configured for MPS, + the label becomes ``nvidia.com/gpu.product=Tesla-T4-SHARED``. + In this case, you can specify a node selector that includes the ``-SHARED`` suffix to + schedule pods on GPUs with shared access. + + The default value is ``false``. + + * - ``failRequestsGreaterThanOne`` + - boolean + - This field is used with time-slicing GPUs and is ignored for MPS. + + For MPS, resource requests for GPUs must be set to ``1``. + Refer to the manifest examples or :ref:`Limitations`. + + * - ``resources.name`` + - string + - Specifies the resource type to make available with MPS, ``nvidia.com/gpu``. + + * - ``resources.replicas`` + - integer + - Specifies the number of MPS GPU replicas to make available for shared access to GPUs of the + specified resource type. + + +.. _mps-cluster-wide-config: + +Applying One Cluster-Wide Configuration +======================================= + +Perform the following steps to configure GPU sharing with MPS if you already installed the GPU operator +and want to apply the same MPS configuration on all nodes in the cluster. + +#. Create a file, such as ``mps-config-all.yaml``, with contents like the following example: + + .. literalinclude:: ./manifests/input/mps-config-all.yaml + :language: yaml + +#. Add the config map to the same namespace as the GPU operator: + + .. code-block:: console + + $ kubectl create -n gpu-operator -f mps-config-all.yaml + +#. Configure the device plugin with the config map and set the default GPU sharing configuration: + + .. code-block:: console + + $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ + -n gpu-operator --type merge \ + -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-all", "default": "mps-any"}}}}' + +#. Optional: Confirm that the ``gpu-feature-discovery`` and + ``nvidia-device-plugin-daemonset`` pods restart: + + .. code-block:: console + + $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp' + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-all-get-events.txt + :language: output + +#. Optional: After a few minutes, confirm that the Operator starts an MPS control daemon pod for each + node in the cluster that has a GPU. + + .. code-block:: console + + $ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-mps-control-daemon + + *Example Output* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + nvidia-device-plugin-mps-control-daemon-9pq7z 2/2 Running 0 4m20s + nvidia-device-plugin-mps-control-daemon-kbwgp 2/2 Running 0 4m20s + +Refer to :ref:`mps-verify`. + +.. _mps-node-specific-config: + +Applying Multiple Node-Specific Configurations +============================================== + +An alternative to applying one cluster-wide configuration is to specify multiple +MPS configurations in the config map and to apply labels node-by-node to +control which configuration is applied to which nodes. + +#. Create a file, such as ``mps-config-fine.yaml``, with contents like the following example: + + .. literalinclude:: ./manifests/input/mps-config-fine.yaml + :language: yaml + +#. Add the config map to the same namespace as the GPU operator: + + .. 
code-block:: console + + $ kubectl create -n gpu-operator -f mps-config-fine.yaml + +#. Configure the device plugin with the config map: + + .. code-block:: console + + $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ + -n gpu-operator --type merge \ + -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-fine"}}}}' + + Because the specification does not include the ``devicePlugin.config.default`` field, + when the device plugin pods redeploy, they do not automatically apply the MPS + configuration to all nodes. + +#. Optional: Confirm that the ``gpu-feature-discovery`` and + ``nvidia-device-plugin-daemonset`` pods restart. + + .. code-block:: console + + $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp' + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-all-get-events.txt + :language: output + +#. Optional: After a few minutes, confirm that the Operator starts an MPS control daemon pod for each + node in the cluster that has a GPU. + + .. code-block:: console + + $ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-mps-control-daemon + + *Example Output* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + nvidia-device-plugin-mps-control-daemon-9pq7z 2/2 Running 0 4m20s + nvidia-device-plugin-mps-control-daemon-kbwgp 2/2 Running 0 4m20s + +#. Apply a label to the nodes by running one or more of the following commands: + + * Apply a label to nodes one-by-one by specifying the node name: + + .. code-block:: console + + $ kubectl label node nvidia.com/device-plugin.config=mps-two + + * Apply a label to several nodes at one time by specifying a label selector: + + .. code-block:: console + + $ kubectl label node \ + --selector=nvidia.com/gpu.product=Tesla-T4 \ + nvidia.com/device-plugin.config=mps-two + +Refer to :ref:`mps-verify`. + + +Configuring Multi-Process Server Before Installing the NVIDIA GPU Operator +========================================================================== + +You can enable MPS with the NVIDIA GPU Operator by passing the +``devicePlugin.config.name=`` parameter during installation. + +Perform the following steps to configure MPS before installing the Operator: + +#. Create the namespace for the Operator: + + .. code-block:: console + + $ kubectl create namespace gpu-operator + +#. Create a file, such as ``mps-config.yaml``, with the config map contents. + + Refer to the :ref:`mps-cluster-wide-config` or + :ref:`mps-node-specific-config` sections. + +#. Add the config map to the same namespace as the Operator: + + .. code-block:: console + + $ kubectl create -f mps-config.yaml -n gpu-operator + +#. Install the operator with Helm: + + .. code-block:: console + + $ helm install gpu-operator nvidia/gpu-operator \ + -n gpu-operator \ + --set devicePlugin.config.name=mps-config + +#. Refer to either :ref:`mps-cluster-wide-config` or + :ref:`mps-node-specific-config` and perform the following tasks: + + * Configure the device plugin by running the ``kubectl patch`` command. + * Apply labels to nodes if you added a config map with node-specific configurations. + +After installation, refer to :ref:`mps-verify`. + + +.. _mps-update-config-map: + +Updating an MPS Config Map +========================== + +The Operator does not monitor the config map with the MPS configuration. +As a result, if you modify a config map, the device plugin pods do not restart and do not apply the modified configuration. + +#. To apply the modified config map, manually restart the device plugin pods: + + .. 
code-block:: console + + $ kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-daemonset + +#. Manually restart the MPS control daemon pods: + + .. code-block:: console + + $ kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-mps-control-daemon + +Currently running workloads are not affected and continue to run, though NVIDIA recommends performing the restart during a maintenance period. + + +.. _mps-verify: + +******************************* +Verifying the MPS Configuration +******************************* + +Perform the following steps to verify that the MPS configuration is applied successfully: + +#. Confirm that the node advertises additional GPU resources: + + .. code-block:: console + + $ kubectl describe node + + *Example Output* + + The example output varies according to the GPU in your node and the configuration + that you apply. + + The following output applies when ``renameByDefault`` is set to ``false``, the default value. + The key considerations are as follows: + + * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine. + * The ``nvidia.com/gpu.product`` label includes a ``-SHARED`` suffix to the product name. + * The ``nvidia.com/gpu.replicas`` label matches the reported capacity. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``mps``. + + .. code-block:: output + :emphasize-lines: 3-6,8 + + ... + Labels: + nvidia.com/gpu.count=4 + nvidia.com/gpu.product=Tesla-T4-SHARED + nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=mps + Capacity: + nvidia.com/gpu: 16 + ... + Allocatable: + nvidia.com/gpu: 16 + ... + + The following output applies when ``renameByDefault`` is set to ``true``. + The key considerations are as follows: + + * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine. + * The ``nvidia.com/gpu`` capacity reports ``0``. + * The ``nvidia.com/gpu.shared`` capacity equals the number of physical GPUs multiplied by the + specified number of GPU replicas to create. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``mps``. + + .. code-block:: output + :emphasize-lines: 3,8,9 + + ... + Labels: + nvidia.com/gpu.count=4 + nvidia.com/gpu.product=Tesla-T4 + nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=mps + Capacity: + nvidia.com/gpu: 0 + nvidia.com/gpu.shared: 16 + ... + Allocatable: + nvidia.com/gpu: 0 + nvidia.com/gpu.shared: 16 + ... + +#. Optional: Deploy a workload to validate GPU sharing: + + * Create a file, such as ``mps-verification.yaml``, with contents like the following: + + .. literalinclude:: ./manifests/input/mps-verification.yaml + :language: yaml + + * Create the deployment with multiple replicas: + + .. code-block:: console + + $ kubectl apply -f mps-verification.yaml + + * Verify that all five replicas are running: + + .. code-block:: console + + $ kubectl get pods + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-get-pods.txt + :language: output + + * View the logs from one of the pods: + + .. code-block:: console + + $ kubectl logs deploy/time-slicing-verification + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-logs-pods.txt + :language: output + + * Stop the deployment: + + .. code-block:: console + + $ kubectl delete -f mps-verification.yaml + + *Example Output* + + .. code-block:: output + + deployment.apps "mps-verification" deleted + + +*********** +References +*********** + +- `Multi-Process Service `__ documentation. 
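+.. rubric:: Example: Requesting a Shared GPU Replica
+
+The following pod specification is a minimal sketch of how a workload can target
+nodes that are configured for MPS by using the ``-SHARED`` product label and a
+single-replica resource request. The pod and container names are illustrative only,
+and the ``Tesla-T4-SHARED`` label value assumes the sample node shown in the
+verification output; substitute the product label that your nodes report.
+
+.. code-block:: yaml
+
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: shared-gpu-example
+   spec:
+     nodeSelector:
+       # Matches nodes whose GPUs are shared; the value depends on your GPU model.
+       nvidia.com/gpu.product: Tesla-T4-SHARED
+     containers:
+     - name: cuda-vectoradd
+       image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
+       resources:
+         limits:
+           # MPS supports requesting one replica only.
+           nvidia.com/gpu: 1
+
+If you set ``renameByDefault=true`` in the config map, request the resource as
+``nvidia.com/gpu.shared: 1`` instead.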
diff --git a/gpu-operator/gpu-sharing.rst b/gpu-operator/gpu-sharing.rst index f5ad20105..4f717c7bf 100644 --- a/gpu-operator/gpu-sharing.rst +++ b/gpu-operator/gpu-sharing.rst @@ -54,12 +54,17 @@ and not modify nodes with other GPU models. You can combine the two approaches by applying a cluster-wide default configuration and then label nodes so that those nodes receive a node-specific configuration. -Comparison: Time-Slicing and Multi-Instance GPU -=============================================== +.. _comparison-ts-mps-mig: -The latest generations of NVIDIA GPUs provide an operation mode called -Multi-Instance GPU (MIG). MIG allows you to partition a GPU -into several smaller, predefined instances, each of which looks like a +Comparison: Time-Slicing, Multi-Process Service, and Multi-Instance GPU +======================================================================= + +Each of the technologies, time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG) +enable sharing a physical GPU with more than one workload. + +NVIDIA A100 and newer GPUs provide an operation mode called MIG. +MIG enables you to partition a GPU into *slices*. +A slice is a smaller, predefined GPU instance that looks like a mini-GPU that provides memory and fault isolation at the hardware layer. You can share access to a GPU by running workloads on one of these predefined instances instead of the full native GPU. @@ -67,8 +72,17 @@ these predefined instances instead of the full native GPU. MIG support was added to Kubernetes in 2020. Refer to `Supporting MIG in Kubernetes `_ for details on how this works. -Time-slicing trades the memory and fault-isolation that is provided by MIG -for the ability to share a GPU by a larger number of users. +NVIDIA V100 and newer GPUs support MPS. +MPS enables dividing a physical GPU into *replicas* and assigning workloads to a replica. +While MIG provides fault isolation in hardware, MPS uses software to divide the GPU into replicas. +Each replica receives an equal portion of memory and thread percentage. +For example, if you configure two replicas, each replica has access to 50% of GPU memory and 50% of compute capacity. + +Time-slicing is available with all GPUs supported by the Operator. +Unlike MIG, time-slicing has no special memory or fault-isolation. +Like MPS, time-slicing uses the term *replica*, however, the GPU is not divided between workloads. +The GPU performs a context switch and swaps resources on and off the GPU when a workload is scheduled. + Time-slicing also provides a way to provide shared access to a GPU for older generation GPUs that do not support MIG. However, you can combine MIG and time-slicing to provide shared access to @@ -234,7 +248,7 @@ The following table describes the key fields in the config map. Applying One Cluster-Wide Configuration ======================================= -Perform the following steps to configure GPU time-slicing if you already installed the GPU operator +Perform the following steps to configure GPU time-slicing if you already installed the GPU Operator and want to apply the same time-slicing configuration on all nodes in the cluster. #. Create a file, such as ``time-slicing-config-all.yaml``, with contents like the following example: @@ -242,7 +256,7 @@ and want to apply the same time-slicing configuration on all nodes in the cluste .. literalinclude:: ./manifests/input/time-slicing-config-all.yaml :language: yaml -#. Add the config map to the same namespace as the GPU operator: +#. 
Add the config map to the same namespace as the GPU Operator: .. code-block:: console @@ -284,7 +298,7 @@ control which configuration is applied to which nodes. .. literalinclude:: ./manifests/input/time-slicing-config-fine.yaml :language: yaml -#. Add the config map to the same namespace as the GPU operator: +#. Add the config map to the same namespace as the GPU Operator: .. code-block:: console @@ -339,9 +353,9 @@ Configuring Time-Slicing Before Installing the NVIDIA GPU Operator You can enable time-slicing with the NVIDIA GPU Operator by passing the ``devicePlugin.config.name=`` parameter during installation. -Perform the following steps to configure time-slicing before installing the operator: +Perform the following steps to configure time-slicing before installing the Operator: -#. Create the namespace for the operator: +#. Create the namespace for the Operator: .. code-block:: console @@ -418,15 +432,17 @@ Perform the following steps to verify that the time-slicing configuration is app * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine. * The ``nvidia.com/gpu.product`` label includes a ``-SHARED`` suffix to the product name. * The ``nvidia.com/gpu.replicas`` label matches the reported capacity. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``. .. code-block:: output - :emphasize-lines: 3,4,5,7 + :emphasize-lines: 3-6,8 ... Labels: nvidia.com/gpu.count=4 nvidia.com/gpu.product=Tesla-T4-SHARED nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=time-slicing Capacity: nvidia.com/gpu: 16 ... @@ -441,15 +457,17 @@ Perform the following steps to verify that the time-slicing configuration is app * The ``nvidia.com/gpu`` capacity reports ``0``. * The ``nvidia.com/gpu.shared`` capacity equals the number of physical GPUs multiplied by the specified number of GPU replicas to create. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``. .. code-block:: output - :emphasize-lines: 3,7,8 + :emphasize-lines: 3,8,9 ... 
Labels: nvidia.com/gpu.count=4 nvidia.com/gpu.product=Tesla-T4 nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=time-slicing Capacity: nvidia.com/gpu: 0 nvidia.com/gpu.shared: 16 diff --git a/gpu-operator/index.rst b/gpu-operator/index.rst index 18c02fe44..be8702601 100644 --- a/gpu-operator/index.rst +++ b/gpu-operator/index.rst @@ -40,6 +40,7 @@ :hidden: Multi-Instance GPU + MPS GPU Sharing Time-Slicing GPUs gpu-operator-rdma.rst Outdated Kernels diff --git a/gpu-operator/manifests/input/mps-config-all.yaml b/gpu-operator/manifests/input/mps-config-all.yaml new file mode 100644 index 000000000..25c5ae7f7 --- /dev/null +++ b/gpu-operator/manifests/input/mps-config-all.yaml @@ -0,0 +1,12 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: mps-config-all +data: + mps-any: |- + version: v1 + sharing: + mps: + resources: + - name: nvidia.com/gpu + replicas: 4 diff --git a/gpu-operator/manifests/input/mps-config-fine.yaml b/gpu-operator/manifests/input/mps-config-fine.yaml new file mode 100644 index 000000000..f5b2ebc96 --- /dev/null +++ b/gpu-operator/manifests/input/mps-config-fine.yaml @@ -0,0 +1,22 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: mps-config-fine +data: + mps-four: |- + version: v1 + sharing: + mps: + renameByDefault: false + resources: + - name: nvidia.com/gpu + replicas: 4 + mps-two: |- + version: v1 + sharing: + mps: + renameByDefault: false + resources: + - name: nvidia.com/gpu + replicas: 2 + diff --git a/gpu-operator/manifests/input/mps-verification.yaml b/gpu-operator/manifests/input/mps-verification.yaml new file mode 100644 index 000000000..fcac31425 --- /dev/null +++ b/gpu-operator/manifests/input/mps-verification.yaml @@ -0,0 +1,32 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mps-verification + labels: + app: mps-verification +spec: + replicas: 5 + selector: + matchLabels: + app: mps-verification + template: + metadata: + labels: + app: mps-verification + spec: + tolerations: + - key: nvidia.com/gpu + operator: Exists + effect: NoSchedule + hostPID: true + containers: + - name: cuda-sample-vector-add + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04" + command: ["/bin/bash", "-c", "--"] + args: + - while true; do /cuda-samples/vectorAdd; done + resources: + limits: + nvidia.com/gpu: 1 + nodeSelector: + nvidia.com/gpu.sharing-strategy: mps diff --git a/gpu-operator/manifests/input/time-slicing-verification.yaml b/gpu-operator/manifests/input/time-slicing-verification.yaml index 1f3d726f6..7daaf2a05 100644 --- a/gpu-operator/manifests/input/time-slicing-verification.yaml +++ b/gpu-operator/manifests/input/time-slicing-verification.yaml @@ -28,3 +28,5 @@ spec: resources: limits: nvidia.com/gpu: 1 + nodeSelector: + nvidia.com/gpu.sharing-strategy: time-slicing diff --git a/gpu-operator/manifests/output/mps-all-get-events.txt b/gpu-operator/manifests/output/mps-all-get-events.txt new file mode 100644 index 000000000..73fd4839c --- /dev/null +++ b/gpu-operator/manifests/output/mps-all-get-events.txt @@ -0,0 +1,11 @@ +LAST SEEN TYPE REASON OBJECT MESSAGE +38s Normal SuccessfulDelete daemonset/nvidia-device-plugin-daemonset Deleted pod: nvidia-device-plugin-daemonset-l86fw +38s Normal SuccessfulDelete daemonset/gpu-feature-discovery Deleted pod: gpu-feature-discovery-shj2m +38s Normal Killing pod/gpu-feature-discovery-shj2m Stopping container gpu-feature-discovery +38s Normal Killing pod/nvidia-device-plugin-daemonset-l86fw Stopping container nvidia-device-plugin +37s Normal Scheduled 
pod/nvidia-device-plugin-daemonset-lcklx Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-lcklx to worker-1 +37s Normal SuccessfulCreate daemonset/gpu-feature-discovery Created pod: gpu-feature-discovery-pgx9l +37s Normal Scheduled pod/gpu-feature-discovery-pgx9l Successfully assigned gpu-operator/gpu-feature-discovery-pgx9l to worker-0 +37s Normal SuccessfulCreate daemonset/nvidia-device-plugin-daemonset Created pod: nvidia-device-plugin-daemonset-lcklx +36s Normal Created pod/nvidia-device-plugin-daemonset-lcklx Created container config-manager-init +36s Normal Pulled pod/nvidia-device-plugin-daemonset-lcklx Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0" already present on machine \ No newline at end of file diff --git a/gpu-operator/manifests/output/mps-get-pods.txt b/gpu-operator/manifests/output/mps-get-pods.txt new file mode 100644 index 000000000..1425d382f --- /dev/null +++ b/gpu-operator/manifests/output/mps-get-pods.txt @@ -0,0 +1,6 @@ +NAME READY STATUS RESTARTS AGE +mps-verification-86c99b5666-hczcn 1/1 Running 0 3s +mps-verification-86c99b5666-sj8z5 1/1 Running 0 3s +mps-verification-86c99b5666-tnjwx 1/1 Running 0 3s +mps-verification-86c99b5666-82hxj 1/1 Running 0 3s +mps-verification-86c99b5666-9lhh6 1/1 Running 0 3s \ No newline at end of file diff --git a/gpu-operator/manifests/output/mps-logs-pods.txt b/gpu-operator/manifests/output/mps-logs-pods.txt new file mode 100644 index 000000000..3ff1149f5 --- /dev/null +++ b/gpu-operator/manifests/output/mps-logs-pods.txt @@ -0,0 +1,13 @@ +Found 5 pods, using pod/mps-verification-86c99b5666-tnjwx +[Vector addition of 50000 elements] +Copy input data from the host memory to the CUDA device +CUDA kernel launch with 196 blocks of 256 threads +Copy output data from the CUDA device to the host memory +Test PASSED +Done +[Vector addition of 50000 elements] +Copy input data from the host memory to the CUDA device +CUDA kernel launch with 196 blocks of 256 threads +Copy output data from the CUDA device to the host memory +Test PASSED +... \ No newline at end of file From eea3dc8af702d2eec5b9689eaa8df2100ac6076c Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Wed, 12 Jun 2024 14:38:50 -0400 Subject: [PATCH 2/3] Add nvidia-cuda-mps-control cmds Signed-off-by: Mike McKiernan --- gpu-operator/gpu-sharing-mps.rst | 34 +++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/gpu-operator/gpu-sharing-mps.rst b/gpu-operator/gpu-sharing-mps.rst index 79353644a..630cfe268 100644 --- a/gpu-operator/gpu-sharing-mps.rst +++ b/gpu-operator/gpu-sharing-mps.rst @@ -440,7 +440,7 @@ Perform the following steps to verify that the MPS configuration is applied succ * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``mps``. .. code-block:: output - :emphasize-lines: 3,8,9 + :emphasize-lines: 4,9 ... Labels: @@ -485,24 +485,48 @@ Perform the following steps to verify that the MPS configuration is applied succ .. code-block:: console - $ kubectl logs deploy/time-slicing-verification + $ kubectl logs deploy/mps-verification *Example Output* .. literalinclude:: ./manifests/output/mps-logs-pods.txt :language: output + * View the default active thread percentage from one of the pods: + + .. code-block:: console + + $ kubectl exec deploy/mps-verification -- bash -c "echo get_default_active_thread_percentage | nvidia-cuda-mps-control" + + *Example Output* + + .. code-block:: output + + 25.0 + + * View the default pinned memory limit from one of the pods: + + .. 
code-block:: console + + $ kubectl exec deploy/mps-verification -- bash -c "echo get_default_device_pinned_mem_limit | nvidia-cuda-mps-control" + + *Example Output* + + .. code-block:: output + + 3G + * Stop the deployment: .. code-block:: console $ kubectl delete -f mps-verification.yaml - *Example Output* + *Example Output* - .. code-block:: output + .. code-block:: output - deployment.apps "mps-verification" deleted + deployment.apps "mps-verification" deleted *********** From 6d5d44523bde56f8f902d91fadf7fc72bcefe1c1 Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Mon, 17 Jun 2024 08:47:04 -0400 Subject: [PATCH 3/3] Add limitation on replica count Signed-off-by: Mike McKiernan --- gpu-operator/gpu-sharing-mps.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/gpu-operator/gpu-sharing-mps.rst b/gpu-operator/gpu-sharing-mps.rst index 630cfe268..e09a7b736 100644 --- a/gpu-operator/gpu-sharing-mps.rst +++ b/gpu-operator/gpu-sharing-mps.rst @@ -47,6 +47,7 @@ Limitations - DCGM-Exporter does not support associating metrics to containers when MPS is enabled with the NVIDIA Kubernetes Device Plugin. - The Operator does not monitor changes to the config map that configures the device plugin. +- The maximum number of replicas that you can request is ``16`` for pre-Volta devices and ``48`` for newer devices. - MPS is not supported on GPU instances from Multi-Instance GPU (MIG) devices. - MPS does not support requesting more than one GPU device. Only one device resource request is supported: