From b46f15cda58ccb07455d4d0870a1b0e898ccba6f Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Tue, 11 Jun 2024 08:00:19 -0400 Subject: [PATCH 1/3] MPS feature Signed-off-by: Mike McKiernan --- gpu-operator/gpu-sharing-mps.rst | 512 ++++++++++++++++++ gpu-operator/gpu-sharing.rst | 46 +- gpu-operator/index.rst | 1 + .../manifests/input/mps-config-all.yaml | 12 + .../manifests/input/mps-config-fine.yaml | 22 + .../manifests/input/mps-verification.yaml | 32 ++ .../input/time-slicing-verification.yaml | 2 + .../manifests/output/mps-all-get-events.txt | 11 + .../manifests/output/mps-get-pods.txt | 6 + .../manifests/output/mps-logs-pods.txt | 13 + 10 files changed, 643 insertions(+), 14 deletions(-) create mode 100644 gpu-operator/gpu-sharing-mps.rst create mode 100644 gpu-operator/manifests/input/mps-config-all.yaml create mode 100644 gpu-operator/manifests/input/mps-config-fine.yaml create mode 100644 gpu-operator/manifests/input/mps-verification.yaml create mode 100644 gpu-operator/manifests/output/mps-all-get-events.txt create mode 100644 gpu-operator/manifests/output/mps-get-pods.txt create mode 100644 gpu-operator/manifests/output/mps-logs-pods.txt diff --git a/gpu-operator/gpu-sharing-mps.rst b/gpu-operator/gpu-sharing-mps.rst new file mode 100644 index 000000000..79353644a --- /dev/null +++ b/gpu-operator/gpu-sharing-mps.rst @@ -0,0 +1,512 @@ +.. headings (h1/h2/h3/h4/h5) are # * = - + +.. _gpu-mps: + +################################### +Multi-Process Service in Kubernetes +################################### + +.. contents:: + :depth: 2 + :local: + :backlinks: none + + +*************************** +About Multi-Process Service +*************************** + +NVIDIA Multi-Process Service (MPS) provides the ability to share a GPU with multiple containers. + +The NVIDIA GPU Operator enables configuring MPS on a node by using +options for the `NVIDIA Kubernetes Device Plugin `_. +Using MPS, you can configure the number of *replicas* to create for each GPU on a node. +Each replica is allocatable by the kubelet to a container. + +You can apply a cluster-wide default MPS configuration and you can apply node-specific configurations. +For example, a cluster-wide configuration could create two replicas for each GPU on each node. +A node-specific configuration could be to create two replicas on some nodes and four replicas on other nodes. + +You can combine the two approaches by applying a cluster-wide default configuration +and then label nodes so that those nodes receive a node-specific configuration. + +Refer to :ref:`comparison-ts-mps-mig` for information about the available GPU sharing technologies. + + +Support Platforms and Resource Types +==================================== + +MPS is supported on bare-metal applications, virtual machines +with GPU passthrough, and virtual machines with NVIDIA vGPU. + +The only supported resource type is ``nvidia.com/gpu``. + + +Limitations +=========== + +- DCGM-Exporter does not support associating metrics to containers when MPS is enabled with the NVIDIA Kubernetes Device Plugin. +- The Operator does not monitor changes to the config map that configures the device plugin. +- MPS is not supported on GPU instances from Multi-Instance GPU (MIG) devices. +- MPS does not support requesting more than one GPU device. + Only one device resource request is supported: + + .. code-block:: yaml + + ... 
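+        # Only a single replica can be requested for each container;
+        # a limit greater than 1 (for example, nvidia.com/gpu: 2) is not supported with MPS.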
+ spec: + containers: + resources: + limits: + nvidia.com/gpu: 1 + + +Changes to Node Labels +====================== + +In addition to the standard node labels that GPU Feature Discovery (GFD) +applies to nodes, the following label is also applied after you configure +MPS for a node: + +.. code-block:: yaml + + nvidia.com/.replicas = + +Where ```` is the factor by which each resource of ```` is equally divided. + +Additionally, by default, the ``nvidia.com/.product`` label is modified: + +.. code-block:: yaml + + nvidia.com/.product = -SHARED + +For example, on an NVIDIA DGX A100 machine, depending on the MPS configuration, +the labels can be similar to the following example: + +.. code-block:: yaml + + nvidia.com/gpu.replicas = 8 + nvidia.com/gpu.product = A100-SXM4-40GB-SHARED + +Using these labels, you can request access to a GPU replica or exclusive access to a GPU +in the same way that you traditionally specify a node selector to request one GPU model over another. +The ``-SHARED`` product name suffix ensures that you can specify a +node selector to assign pods to nodes with GPU replicas. + +The ``migStrategy`` configuration option has an effect on the node label for the product name. +When ``renameByDefault=false``, the default value, and ``migStrategy=single``, both the MIG profile name +and the ``-SHARED`` suffix are appended to the product name, such as the following example: + +.. code-block:: yaml + + nvidia.com/gpu.product = A100-SXM4-40GB-MIG-1g.5gb-SHARED + +If you set ``renameByDefault=true``, then the value of the ``nvidia.com/gpu.product`` node +label is not modified. + +************* +Configuration +************* + +About Configuring Multi-Process Service +======================================= + +You configure Multi-Process Service (MPS) by performing the following high-level steps: + +* Add a config map to the namespace that is used by the GPU Operator. +* Configure the cluster policy so that the device plugin uses the config map. +* Apply a label to the nodes that you want to configure for MPS. + +On a machine with one GPU, the following config map configures Kubernetes so that +the node advertises either two or four GPU resources. + +.. rubric:: Sample Config Map + +.. literalinclude:: ./manifests/input/mps-config-all.yaml + :language: yaml + +The following table describes the key fields in the config map. + +.. list-table:: + :header-rows: 1 + :widths: 15 10 75 + + * - Field + - Type + - Description + + * - ``data.`` + - string + - Specifies the time-slicing configuration name. + + You can specify multiple configurations if you want to assign node-specific configurations. + In the preceding example, the values for ``key`` are ``mps-two`` and ``mps-four``. + + * - ``flags.migStrategy`` + - string + - Specifies how to label MIG devices for the nodes that receive the MPS configuration. + Specify one of ``none``, ``single``, or ``mixed``. + + The default value is ``none``. + + * - ``renameByDefault`` + - boolean + - When set to ``true``, each resource is advertised under the name ``.shared`` + instead of ````. + + For example, if this field is set to ``true`` and the resource is typically ``nvidia.com/gpu``, + the nodes that are configured for MPS then advertise the resource as + ``nvidia.com/gpu.shared``. + Setting this field to true can be helpful if you want to schedule pods on GPUs with shared + access by specifying ``.shared`` in the resource request. 
+ + When this field is set to ``false``, the advertised resource name, such as ``nvidia.com/gpu``, + is not modified. + However, the label for the product name is suffixed with ``-SHARED``. + For example, if the output of ``kubectl describe node`` shows the node label + ``nvidia.com/gpu.product=Tesla-T4``, then after the node is configured for MPS, + the label becomes ``nvidia.com/gpu.product=Tesla-T4-SHARED``. + In this case, you can specify a node selector that includes the ``-SHARED`` suffix to + schedule pods on GPUs with shared access. + + The default value is ``false``. + + * - ``failRequestsGreaterThanOne`` + - boolean + - This field is used with time-slicing GPUs and is ignored for MPS. + + For MPS, resource requests for GPUs must be set to ``1``. + Refer to the manifest examples or :ref:`Limitations`. + + * - ``resources.name`` + - string + - Specifies the resource type to make available with MPS, ``nvidia.com/gpu``. + + * - ``resources.replicas`` + - integer + - Specifies the number of MPS GPU replicas to make available for shared access to GPUs of the + specified resource type. + + +.. _mps-cluster-wide-config: + +Applying One Cluster-Wide Configuration +======================================= + +Perform the following steps to configure GPU sharing with MPS if you already installed the GPU operator +and want to apply the same MPS configuration on all nodes in the cluster. + +#. Create a file, such as ``mps-config-all.yaml``, with contents like the following example: + + .. literalinclude:: ./manifests/input/mps-config-all.yaml + :language: yaml + +#. Add the config map to the same namespace as the GPU operator: + + .. code-block:: console + + $ kubectl create -n gpu-operator -f mps-config-all.yaml + +#. Configure the device plugin with the config map and set the default GPU sharing configuration: + + .. code-block:: console + + $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ + -n gpu-operator --type merge \ + -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-all", "default": "mps-any"}}}}' + +#. Optional: Confirm that the ``gpu-feature-discovery`` and + ``nvidia-device-plugin-daemonset`` pods restart: + + .. code-block:: console + + $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp' + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-all-get-events.txt + :language: output + +#. Optional: After a few minutes, confirm that the Operator starts an MPS control daemon pod for each + node in the cluster that has a GPU. + + .. code-block:: console + + $ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-mps-control-daemon + + *Example Output* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + nvidia-device-plugin-mps-control-daemon-9pq7z 2/2 Running 0 4m20s + nvidia-device-plugin-mps-control-daemon-kbwgp 2/2 Running 0 4m20s + +Refer to :ref:`mps-verify`. + +.. _mps-node-specific-config: + +Applying Multiple Node-Specific Configurations +============================================== + +An alternative to applying one cluster-wide configuration is to specify multiple +MPS configurations in the config map and to apply labels node-by-node to +control which configuration is applied to which nodes. + +#. Create a file, such as ``mps-config-fine.yaml``, with contents like the following example: + + .. literalinclude:: ./manifests/input/mps-config-fine.yaml + :language: yaml + +#. Add the config map to the same namespace as the GPU operator: + + .. 
code-block:: console + + $ kubectl create -n gpu-operator -f mps-config-fine.yaml + +#. Configure the device plugin with the config map: + + .. code-block:: console + + $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ + -n gpu-operator --type merge \ + -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-fine"}}}}' + + Because the specification does not include the ``devicePlugin.config.default`` field, + when the device plugin pods redeploy, they do not automatically apply the MPS + configuration to all nodes. + +#. Optional: Confirm that the ``gpu-feature-discovery`` and + ``nvidia-device-plugin-daemonset`` pods restart. + + .. code-block:: console + + $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp' + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-all-get-events.txt + :language: output + +#. Optional: After a few minutes, confirm that the Operator starts an MPS control daemon pod for each + node in the cluster that has a GPU. + + .. code-block:: console + + $ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-mps-control-daemon + + *Example Output* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + nvidia-device-plugin-mps-control-daemon-9pq7z 2/2 Running 0 4m20s + nvidia-device-plugin-mps-control-daemon-kbwgp 2/2 Running 0 4m20s + +#. Apply a label to the nodes by running one or more of the following commands: + + * Apply a label to nodes one-by-one by specifying the node name: + + .. code-block:: console + + $ kubectl label node nvidia.com/device-plugin.config=mps-two + + * Apply a label to several nodes at one time by specifying a label selector: + + .. code-block:: console + + $ kubectl label node \ + --selector=nvidia.com/gpu.product=Tesla-T4 \ + nvidia.com/device-plugin.config=mps-two + +Refer to :ref:`mps-verify`. + + +Configuring Multi-Process Server Before Installing the NVIDIA GPU Operator +========================================================================== + +You can enable MPS with the NVIDIA GPU Operator by passing the +``devicePlugin.config.name=`` parameter during installation. + +Perform the following steps to configure MPS before installing the Operator: + +#. Create the namespace for the Operator: + + .. code-block:: console + + $ kubectl create namespace gpu-operator + +#. Create a file, such as ``mps-config.yaml``, with the config map contents. + + Refer to the :ref:`mps-cluster-wide-config` or + :ref:`mps-node-specific-config` sections. + +#. Add the config map to the same namespace as the Operator: + + .. code-block:: console + + $ kubectl create -f mps-config.yaml -n gpu-operator + +#. Install the operator with Helm: + + .. code-block:: console + + $ helm install gpu-operator nvidia/gpu-operator \ + -n gpu-operator \ + --set devicePlugin.config.name=mps-config + +#. Refer to either :ref:`mps-cluster-wide-config` or + :ref:`mps-node-specific-config` and perform the following tasks: + + * Configure the device plugin by running the ``kubectl patch`` command. + * Apply labels to nodes if you added a config map with node-specific configurations. + +After installation, refer to :ref:`mps-verify`. + + +.. _mps-update-config-map: + +Updating an MPS Config Map +========================== + +The Operator does not monitor the config map with the MPS configuration. +As a result, if you modify a config map, the device plugin pods do not restart and do not apply the modified configuration. + +#. To apply the modified config map, manually restart the device plugin pods: + + .. 
code-block:: console + + $ kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-daemonset + +#. Manually restart the MPS control daemon pods: + + .. code-block:: console + + $ kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-mps-control-daemon + +Currently running workloads are not affected and continue to run, though NVIDIA recommends performing the restart during a maintenance period. + + +.. _mps-verify: + +******************************* +Verifying the MPS Configuration +******************************* + +Perform the following steps to verify that the MPS configuration is applied successfully: + +#. Confirm that the node advertises additional GPU resources: + + .. code-block:: console + + $ kubectl describe node + + *Example Output* + + The example output varies according to the GPU in your node and the configuration + that you apply. + + The following output applies when ``renameByDefault`` is set to ``false``, the default value. + The key considerations are as follows: + + * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine. + * The ``nvidia.com/gpu.product`` label includes a ``-SHARED`` suffix to the product name. + * The ``nvidia.com/gpu.replicas`` label matches the reported capacity. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``mps``. + + .. code-block:: output + :emphasize-lines: 3-6,8 + + ... + Labels: + nvidia.com/gpu.count=4 + nvidia.com/gpu.product=Tesla-T4-SHARED + nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=mps + Capacity: + nvidia.com/gpu: 16 + ... + Allocatable: + nvidia.com/gpu: 16 + ... + + The following output applies when ``renameByDefault`` is set to ``true``. + The key considerations are as follows: + + * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine. + * The ``nvidia.com/gpu`` capacity reports ``0``. + * The ``nvidia.com/gpu.shared`` capacity equals the number of physical GPUs multiplied by the + specified number of GPU replicas to create. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``mps``. + + .. code-block:: output + :emphasize-lines: 3,8,9 + + ... + Labels: + nvidia.com/gpu.count=4 + nvidia.com/gpu.product=Tesla-T4 + nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=mps + Capacity: + nvidia.com/gpu: 0 + nvidia.com/gpu.shared: 16 + ... + Allocatable: + nvidia.com/gpu: 0 + nvidia.com/gpu.shared: 16 + ... + +#. Optional: Deploy a workload to validate GPU sharing: + + * Create a file, such as ``mps-verification.yaml``, with contents like the following: + + .. literalinclude:: ./manifests/input/mps-verification.yaml + :language: yaml + + * Create the deployment with multiple replicas: + + .. code-block:: console + + $ kubectl apply -f mps-verification.yaml + + * Verify that all five replicas are running: + + .. code-block:: console + + $ kubectl get pods + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-get-pods.txt + :language: output + + * View the logs from one of the pods: + + .. code-block:: console + + $ kubectl logs deploy/time-slicing-verification + + *Example Output* + + .. literalinclude:: ./manifests/output/mps-logs-pods.txt + :language: output + + * Stop the deployment: + + .. code-block:: console + + $ kubectl delete -f mps-verification.yaml + + *Example Output* + + .. code-block:: output + + deployment.apps "mps-verification" deleted + + +*********** +References +*********** + +- `Multi-Process Service `__ documentation. 
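+.. rubric:: Example: Requesting a Shared GPU Replica
+
+The following pod specification is a minimal sketch of how a workload can target
+nodes that are configured for MPS by using the ``-SHARED`` product label and a
+single-replica resource request. The pod and container names are illustrative only,
+and the ``Tesla-T4-SHARED`` label value assumes the sample node shown in the
+verification output; substitute the product label that your nodes report.
+
+.. code-block:: yaml
+
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: shared-gpu-example
+   spec:
+     nodeSelector:
+       # Matches nodes whose GPUs are shared; the value depends on your GPU model.
+       nvidia.com/gpu.product: Tesla-T4-SHARED
+     containers:
+     - name: cuda-vectoradd
+       image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
+       resources:
+         limits:
+           # MPS supports requesting one replica only.
+           nvidia.com/gpu: 1
+
+If you set ``renameByDefault=true`` in the config map, request the resource as
+``nvidia.com/gpu.shared: 1`` instead.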
diff --git a/gpu-operator/gpu-sharing.rst b/gpu-operator/gpu-sharing.rst index f5ad20105..4f717c7bf 100644 --- a/gpu-operator/gpu-sharing.rst +++ b/gpu-operator/gpu-sharing.rst @@ -54,12 +54,17 @@ and not modify nodes with other GPU models. You can combine the two approaches by applying a cluster-wide default configuration and then label nodes so that those nodes receive a node-specific configuration. -Comparison: Time-Slicing and Multi-Instance GPU -=============================================== +.. _comparison-ts-mps-mig: -The latest generations of NVIDIA GPUs provide an operation mode called -Multi-Instance GPU (MIG). MIG allows you to partition a GPU -into several smaller, predefined instances, each of which looks like a +Comparison: Time-Slicing, Multi-Process Service, and Multi-Instance GPU +======================================================================= + +Each of the technologies, time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG) +enable sharing a physical GPU with more than one workload. + +NVIDIA A100 and newer GPUs provide an operation mode called MIG. +MIG enables you to partition a GPU into *slices*. +A slice is a smaller, predefined GPU instance that looks like a mini-GPU that provides memory and fault isolation at the hardware layer. You can share access to a GPU by running workloads on one of these predefined instances instead of the full native GPU. @@ -67,8 +72,17 @@ these predefined instances instead of the full native GPU. MIG support was added to Kubernetes in 2020. Refer to `Supporting MIG in Kubernetes `_ for details on how this works. -Time-slicing trades the memory and fault-isolation that is provided by MIG -for the ability to share a GPU by a larger number of users. +NVIDIA V100 and newer GPUs support MPS. +MPS enables dividing a physical GPU into *replicas* and assigning workloads to a replica. +While MIG provides fault isolation in hardware, MPS uses software to divide the GPU into replicas. +Each replica receives an equal portion of memory and thread percentage. +For example, if you configure two replicas, each replica has access to 50% of GPU memory and 50% of compute capacity. + +Time-slicing is available with all GPUs supported by the Operator. +Unlike MIG, time-slicing has no special memory or fault-isolation. +Like MPS, time-slicing uses the term *replica*, however, the GPU is not divided between workloads. +The GPU performs a context switch and swaps resources on and off the GPU when a workload is scheduled. + Time-slicing also provides a way to provide shared access to a GPU for older generation GPUs that do not support MIG. However, you can combine MIG and time-slicing to provide shared access to @@ -234,7 +248,7 @@ The following table describes the key fields in the config map. Applying One Cluster-Wide Configuration ======================================= -Perform the following steps to configure GPU time-slicing if you already installed the GPU operator +Perform the following steps to configure GPU time-slicing if you already installed the GPU Operator and want to apply the same time-slicing configuration on all nodes in the cluster. #. Create a file, such as ``time-slicing-config-all.yaml``, with contents like the following example: @@ -242,7 +256,7 @@ and want to apply the same time-slicing configuration on all nodes in the cluste .. literalinclude:: ./manifests/input/time-slicing-config-all.yaml :language: yaml -#. Add the config map to the same namespace as the GPU operator: +#. 
Add the config map to the same namespace as the GPU Operator: .. code-block:: console @@ -284,7 +298,7 @@ control which configuration is applied to which nodes. .. literalinclude:: ./manifests/input/time-slicing-config-fine.yaml :language: yaml -#. Add the config map to the same namespace as the GPU operator: +#. Add the config map to the same namespace as the GPU Operator: .. code-block:: console @@ -339,9 +353,9 @@ Configuring Time-Slicing Before Installing the NVIDIA GPU Operator You can enable time-slicing with the NVIDIA GPU Operator by passing the ``devicePlugin.config.name=`` parameter during installation. -Perform the following steps to configure time-slicing before installing the operator: +Perform the following steps to configure time-slicing before installing the Operator: -#. Create the namespace for the operator: +#. Create the namespace for the Operator: .. code-block:: console @@ -418,15 +432,17 @@ Perform the following steps to verify that the time-slicing configuration is app * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine. * The ``nvidia.com/gpu.product`` label includes a ``-SHARED`` suffix to the product name. * The ``nvidia.com/gpu.replicas`` label matches the reported capacity. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``. .. code-block:: output - :emphasize-lines: 3,4,5,7 + :emphasize-lines: 3-6,8 ... Labels: nvidia.com/gpu.count=4 nvidia.com/gpu.product=Tesla-T4-SHARED nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=time-slicing Capacity: nvidia.com/gpu: 16 ... @@ -441,15 +457,17 @@ Perform the following steps to verify that the time-slicing configuration is app * The ``nvidia.com/gpu`` capacity reports ``0``. * The ``nvidia.com/gpu.shared`` capacity equals the number of physical GPUs multiplied by the specified number of GPU replicas to create. + * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``. .. code-block:: output - :emphasize-lines: 3,7,8 + :emphasize-lines: 3,8,9 ... 
Labels: nvidia.com/gpu.count=4 nvidia.com/gpu.product=Tesla-T4 nvidia.com/gpu.replicas=4 + nvidia.com/gpu.sharing-strategy=time-slicing Capacity: nvidia.com/gpu: 0 nvidia.com/gpu.shared: 16 diff --git a/gpu-operator/index.rst b/gpu-operator/index.rst index 18c02fe44..be8702601 100644 --- a/gpu-operator/index.rst +++ b/gpu-operator/index.rst @@ -40,6 +40,7 @@ :hidden: Multi-Instance GPU + MPS GPU Sharing Time-Slicing GPUs gpu-operator-rdma.rst Outdated Kernels diff --git a/gpu-operator/manifests/input/mps-config-all.yaml b/gpu-operator/manifests/input/mps-config-all.yaml new file mode 100644 index 000000000..25c5ae7f7 --- /dev/null +++ b/gpu-operator/manifests/input/mps-config-all.yaml @@ -0,0 +1,12 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: mps-config-all +data: + mps-any: |- + version: v1 + sharing: + mps: + resources: + - name: nvidia.com/gpu + replicas: 4 diff --git a/gpu-operator/manifests/input/mps-config-fine.yaml b/gpu-operator/manifests/input/mps-config-fine.yaml new file mode 100644 index 000000000..f5b2ebc96 --- /dev/null +++ b/gpu-operator/manifests/input/mps-config-fine.yaml @@ -0,0 +1,22 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: mps-config-fine +data: + mps-four: |- + version: v1 + sharing: + mps: + renameByDefault: false + resources: + - name: nvidia.com/gpu + replicas: 4 + mps-two: |- + version: v1 + sharing: + mps: + renameByDefault: false + resources: + - name: nvidia.com/gpu + replicas: 2 + diff --git a/gpu-operator/manifests/input/mps-verification.yaml b/gpu-operator/manifests/input/mps-verification.yaml new file mode 100644 index 000000000..fcac31425 --- /dev/null +++ b/gpu-operator/manifests/input/mps-verification.yaml @@ -0,0 +1,32 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mps-verification + labels: + app: mps-verification +spec: + replicas: 5 + selector: + matchLabels: + app: mps-verification + template: + metadata: + labels: + app: mps-verification + spec: + tolerations: + - key: nvidia.com/gpu + operator: Exists + effect: NoSchedule + hostPID: true + containers: + - name: cuda-sample-vector-add + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04" + command: ["/bin/bash", "-c", "--"] + args: + - while true; do /cuda-samples/vectorAdd; done + resources: + limits: + nvidia.com/gpu: 1 + nodeSelector: + nvidia.com/gpu.sharing-strategy: mps diff --git a/gpu-operator/manifests/input/time-slicing-verification.yaml b/gpu-operator/manifests/input/time-slicing-verification.yaml index 1f3d726f6..7daaf2a05 100644 --- a/gpu-operator/manifests/input/time-slicing-verification.yaml +++ b/gpu-operator/manifests/input/time-slicing-verification.yaml @@ -28,3 +28,5 @@ spec: resources: limits: nvidia.com/gpu: 1 + nodeSelector: + nvidia.com/gpu.sharing-strategy: time-slicing diff --git a/gpu-operator/manifests/output/mps-all-get-events.txt b/gpu-operator/manifests/output/mps-all-get-events.txt new file mode 100644 index 000000000..73fd4839c --- /dev/null +++ b/gpu-operator/manifests/output/mps-all-get-events.txt @@ -0,0 +1,11 @@ +LAST SEEN TYPE REASON OBJECT MESSAGE +38s Normal SuccessfulDelete daemonset/nvidia-device-plugin-daemonset Deleted pod: nvidia-device-plugin-daemonset-l86fw +38s Normal SuccessfulDelete daemonset/gpu-feature-discovery Deleted pod: gpu-feature-discovery-shj2m +38s Normal Killing pod/gpu-feature-discovery-shj2m Stopping container gpu-feature-discovery +38s Normal Killing pod/nvidia-device-plugin-daemonset-l86fw Stopping container nvidia-device-plugin +37s Normal Scheduled 
pod/nvidia-device-plugin-daemonset-lcklx Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-lcklx to worker-1 +37s Normal SuccessfulCreate daemonset/gpu-feature-discovery Created pod: gpu-feature-discovery-pgx9l +37s Normal Scheduled pod/gpu-feature-discovery-pgx9l Successfully assigned gpu-operator/gpu-feature-discovery-pgx9l to worker-0 +37s Normal SuccessfulCreate daemonset/nvidia-device-plugin-daemonset Created pod: nvidia-device-plugin-daemonset-lcklx +36s Normal Created pod/nvidia-device-plugin-daemonset-lcklx Created container config-manager-init +36s Normal Pulled pod/nvidia-device-plugin-daemonset-lcklx Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0" already present on machine \ No newline at end of file diff --git a/gpu-operator/manifests/output/mps-get-pods.txt b/gpu-operator/manifests/output/mps-get-pods.txt new file mode 100644 index 000000000..1425d382f --- /dev/null +++ b/gpu-operator/manifests/output/mps-get-pods.txt @@ -0,0 +1,6 @@ +NAME READY STATUS RESTARTS AGE +mps-verification-86c99b5666-hczcn 1/1 Running 0 3s +mps-verification-86c99b5666-sj8z5 1/1 Running 0 3s +mps-verification-86c99b5666-tnjwx 1/1 Running 0 3s +mps-verification-86c99b5666-82hxj 1/1 Running 0 3s +mps-verification-86c99b5666-9lhh6 1/1 Running 0 3s \ No newline at end of file diff --git a/gpu-operator/manifests/output/mps-logs-pods.txt b/gpu-operator/manifests/output/mps-logs-pods.txt new file mode 100644 index 000000000..3ff1149f5 --- /dev/null +++ b/gpu-operator/manifests/output/mps-logs-pods.txt @@ -0,0 +1,13 @@ +Found 5 pods, using pod/mps-verification-86c99b5666-tnjwx +[Vector addition of 50000 elements] +Copy input data from the host memory to the CUDA device +CUDA kernel launch with 196 blocks of 256 threads +Copy output data from the CUDA device to the host memory +Test PASSED +Done +[Vector addition of 50000 elements] +Copy input data from the host memory to the CUDA device +CUDA kernel launch with 196 blocks of 256 threads +Copy output data from the CUDA device to the host memory +Test PASSED +... \ No newline at end of file From eea3dc8af702d2eec5b9689eaa8df2100ac6076c Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Wed, 12 Jun 2024 14:38:50 -0400 Subject: [PATCH 2/3] Add nvidia-cuda-mps-control cmds Signed-off-by: Mike McKiernan --- gpu-operator/gpu-sharing-mps.rst | 34 +++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/gpu-operator/gpu-sharing-mps.rst b/gpu-operator/gpu-sharing-mps.rst index 79353644a..630cfe268 100644 --- a/gpu-operator/gpu-sharing-mps.rst +++ b/gpu-operator/gpu-sharing-mps.rst @@ -440,7 +440,7 @@ Perform the following steps to verify that the MPS configuration is applied succ * The ``nvidia.com/gpu.sharing-strategy`` label is set to ``mps``. .. code-block:: output - :emphasize-lines: 3,8,9 + :emphasize-lines: 4,9 ... Labels: @@ -485,24 +485,48 @@ Perform the following steps to verify that the MPS configuration is applied succ .. code-block:: console - $ kubectl logs deploy/time-slicing-verification + $ kubectl logs deploy/mps-verification *Example Output* .. literalinclude:: ./manifests/output/mps-logs-pods.txt :language: output + * View the default active thread percentage from one of the pods: + + .. code-block:: console + + $ kubectl exec deploy/mps-verification -- bash -c "echo get_default_active_thread_percentage | nvidia-cuda-mps-control" + + *Example Output* + + .. code-block:: output + + 25.0 + + * View the default pinned memory limit from one of the pods: + + .. 
code-block:: console + + $ kubectl exec deploy/mps-verification -- bash -c "echo get_default_device_pinned_mem_limit | nvidia-cuda-mps-control" + + *Example Output* + + .. code-block:: output + + 3G + * Stop the deployment: .. code-block:: console $ kubectl delete -f mps-verification.yaml - *Example Output* + *Example Output* - .. code-block:: output + .. code-block:: output - deployment.apps "mps-verification" deleted + deployment.apps "mps-verification" deleted *********** From 6d5d44523bde56f8f902d91fadf7fc72bcefe1c1 Mon Sep 17 00:00:00 2001 From: Mike McKiernan Date: Mon, 17 Jun 2024 08:47:04 -0400 Subject: [PATCH 3/3] Add limitation on replica count Signed-off-by: Mike McKiernan --- gpu-operator/gpu-sharing-mps.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/gpu-operator/gpu-sharing-mps.rst b/gpu-operator/gpu-sharing-mps.rst index 630cfe268..e09a7b736 100644 --- a/gpu-operator/gpu-sharing-mps.rst +++ b/gpu-operator/gpu-sharing-mps.rst @@ -47,6 +47,7 @@ Limitations - DCGM-Exporter does not support associating metrics to containers when MPS is enabled with the NVIDIA Kubernetes Device Plugin. - The Operator does not monitor changes to the config map that configures the device plugin. +- The maximum number of replicas that you can request is ``16`` for pre-Volta devices and ``48`` for newer devices. - MPS is not supported on GPU instances from Multi-Instance GPU (MIG) devices. - MPS does not support requesting more than one GPU device. Only one device resource request is supported: