From 080a61b43d48ca3e61c3d8d445fba76a3bf1443e Mon Sep 17 00:00:00 2001
From: Lyndon-Li
Date: Wed, 3 Apr 2024 14:36:39 +0800
Subject: [PATCH] data mover node selection doc

Signed-off-by: Lyndon-Li
---
 changelogs/unreleased/7640-Lyndon-Li          |  1 +
 .../docs/main/csi-snapshot-data-movement.md   | 19 +++++-
 .../data-movement-backup-node-selection.md    | 60 +++++++++++++++++++
 3 files changed, 77 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/unreleased/7640-Lyndon-Li
 create mode 100644 site/content/docs/main/data-movement-backup-node-selection.md

diff --git a/changelogs/unreleased/7640-Lyndon-Li b/changelogs/unreleased/7640-Lyndon-Li
new file mode 100644
index 0000000000..4483c4ff19
--- /dev/null
+++ b/changelogs/unreleased/7640-Lyndon-Li
@@ -0,0 +1 @@
+For issue #7036, add the document for data mover node selection
\ No newline at end of file
diff --git a/site/content/docs/main/csi-snapshot-data-movement.md b/site/content/docs/main/csi-snapshot-data-movement.md
index fb113fa186..11d0763d74 100644
--- a/site/content/docs/main/csi-snapshot-data-movement.md
+++ b/site/content/docs/main/csi-snapshot-data-movement.md
@@ -360,8 +360,8 @@ Velero calls the CSI plugin concurrently for the volume, so `DataUpload`/`DataDo
 In which manner the `DataUpload`/`DataDownload` CRs are processed is totally decided by the data mover you select for the backup/restore.
 For Velero built-in data mover, it uses Kubernetes' scheduler to mount a snapshot volume/restore volume associated to a `DataUpload`/`DataDownload` CR into a specific node, and then the `DataUpload`/`DataDownload` controller (in node-agent daemonset) in that node will handle the `DataUpload`/`DataDownload`.
-At present, a `DataUpload`/`DataDownload` controller in one node handles one request at a time.
-That is to say, the snapshot volumes/restore volumes may spread in different nodes, then their associated `DataUpload`/`DataDownload` CRs will be processed in parallel; while for the snapshot volumes/restore volumes in the same node, their associated `DataUpload`/`DataDownload` CRs are processed sequentially.
+By default, a `DataUpload`/`DataDownload` controller in one node handles one request at a time. You can configure more parallelism per node through the [node-agent Concurrency Configuration][14].
+That is to say, when the snapshot volumes/restore volumes are spread across different nodes, their associated `DataUpload`/`DataDownload` CRs are processed in parallel; for snapshot volumes/restore volumes on the same node, their associated `DataUpload`/`DataDownload` CRs are processed sequentially by default, or concurrently according to your [node-agent Concurrency Configuration][14].

 You can check in which node the `DataUpload`/`DataDownload` CRs are processed and their parallelism by watching the `DataUpload`/`DataDownload` CRs:

@@ -425,12 +425,23 @@ spec:
 ### Resource Consumption

 Both the uploader and repository consume remarkable CPU/memory during the backup/restore, especially for massive small files or large backup size cases.
-Velero node-agent uses [BestEffort as the QoS][13] for node-agent pods (so no CPU/memory request/limit is set), so that backups/restores wouldn't fail due to resource throttling in any cases.
+
+For the Velero built-in data mover, Velero uses [BestEffort as the QoS][13] for node-agent pods (so no CPU/memory request/limit is set), so that backups/restores won't fail due to resource throttling in any case.
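If you want to observe these behaviors on a live cluster, the commands below are one hedged way to do it. They assume the default `velero` namespace, that the node-agent pods carry the `name=node-agent` label, and that the `DataUpload` status reports the handling node in `status.node`; adjust them to your installation.

```
# Which node is handling each DataUpload (and how many run on one node in parallel)
kubectl -n velero get datauploads \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,NODE:.status.node

# Confirm the node-agent pods run with the BestEffort QoS class (no requests/limits set)
kubectl -n velero get pods -l name=node-agent \
  -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass
```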
 If you want to constraint the CPU/memory usage, you need to [customize the resource limits][11]. The CPU/memory consumption is always related to the scale of data to be backed up/restored, refer to [Performance Guidance][12] for more details, so it is highly recommended that you perform your own testing to find the best resource limits for your data.

 During the restore, the repository may also cache data/metadata so as to reduce the network footprint and speed up the restore. The repository uses its own policy to store and clean up the cache.
 For Kopia repository, the cache is stored in the node-agent pod's root file system and the cleanup is triggered for the data/metadata that are older than 10 minutes (not configurable at present). So you should prepare enough disk space, otherwise, the node-agent pod may be evicted due to running out of the ephemeral storage.

+### Node Selection
+
+The node where a data movement backup/restore runs is decided by the data mover.
+
+For the Velero built-in data mover, Kubernetes' scheduler mounts the snapshot volume/restore volume associated with a `DataUpload`/`DataDownload` CR onto a specific node, and the data movement backup/restore then runs on that node.
+For backups, you can intervene in this scheduling process through [Data Movement Backup Node Selection][15], so that you can decide which node(s) should or should not run the data movement backup for various purposes.
+For restores, this is not supported because sometimes the data movement restore must run on the same node where the restored workload pod is scheduled.
+
+
+
 [1]: https://github.com/vmware-tanzu/velero/pull/5968
 [2]: csi.md
@@ -445,3 +456,5 @@ For Kopia repository, the cache is stored in the node-agent pod's root file syst
 [11]: customize-installation.md#customize-resource-requests-and-limits
 [12]: performance-guidance.md
 [13]: https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
+[14]: node-agent-concurrency.md
+[15]: data-movement-backup-node-selection.md
diff --git a/site/content/docs/main/data-movement-backup-node-selection.md b/site/content/docs/main/data-movement-backup-node-selection.md
new file mode 100644
index 0000000000..f5b20fa8a7
--- /dev/null
+++ b/site/content/docs/main/data-movement-backup-node-selection.md
@@ -0,0 +1,60 @@
+---
+title: "Node Selection for Data Movement Backup"
+layout: docs
+---
+
+Velero node-agent is a daemonset hosting the data movement modules, which carry out the concrete work of backups/restores.
+Depending on the data size, data complexity, and resource availability, the data movement may take a long time and consume significant resources (CPU, memory, network bandwidth, etc.) during the backup and restore.
+
+Velero data movement backup supports constraining the nodes where it runs. This is helpful in the following scenarios:
+- Prevent the data movement backup from running on specific nodes because users have more critical workloads on those nodes
+- Constrain the data movement backup to run on specific nodes because these nodes have more resources than others
+- Constrain the data movement backup to run on specific nodes because the storage allows volume/snapshot provisioning on these nodes only
+
+Velero introduces a new section in the ```node-agent-config``` configMap, called ```loadAffinity```, through which you can specify the nodes that should or should not run data movement backups, in affinity and anti-affinity flavors.
+If it is not there, the ```node-agent-config``` configMap should be created manually.
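Before creating it, you can check whether the configMap already exists; the command below assumes the default `velero` install namespace, so adjust `-n` if Velero is installed elsewhere.

```
kubectl -n velero get configmap node-agent-config
```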
+The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace, and each configMap applies only to the node-agent in its own namespace.
+The node-agent server checks these configurations at startup time. Therefore, you can edit this configMap at any time, but the node-agent server needs to be restarted for the changes to take effect.
+
+### Sample
+Here is a sample of the ```node-agent-config``` configMap with ```loadAffinity```:
+```json
+{
+    "loadAffinity": [
+        {
+            "nodeSelector": {
+                "matchLabels": {
+                    "beta.kubernetes.io/instance-type": "Standard_B4ms"
+                },
+                "matchExpressions": [
+                    {
+                        "key": "kubernetes.io/hostname",
+                        "values": [
+                            "node-1",
+                            "node-2",
+                            "node-3"
+                        ],
+                        "operator": "In"
+                    },
+                    {
+                        "key": "xxx/critical-workload",
+                        "operator": "DoesNotExist"
+                    }
+                ]
+            }
+        }
+    ]
+}
+```
+To create the configMap, save something like the above sample to a JSON file and then run the command below:
+```
+kubectl create cm node-agent-config -n velero --from-file=
+```
+
+### Affinity
+Affinity configuration means allowing the data movement backup to run on the specified nodes. There are two ways to define it:
+- It could be defined by `MatchLabels`. The labels defined in `MatchLabels` imply a `LabelSelectorOpIn` operation by default, so in the current context they are treated as affinity rules. In the above sample, data movement backups run only on nodes with the label `beta.kubernetes.io/instance-type` of value `Standard_B4ms` (run data movement backups on `Standard_B4ms` nodes only).
+- It could be defined by `MatchExpressions`. The labels are defined in the `Key` and `Values` of `MatchExpressions` and the `Operator` should be `LabelSelectorOpIn` or `LabelSelectorOpExists`. In the above sample, data movement backups run only on nodes with the label `kubernetes.io/hostname` of value `node-1`, `node-2` or `node-3` (run data movement backups on `node-1`, `node-2` and `node-3` only).
+
+### Anti-affinity
+Anti-affinity configuration means preventing the data movement backup from running on the specified nodes. It is defined as follows:
+- It could be defined by `MatchExpressions`. The labels are defined in the `Key` and `Values` of `MatchExpressions` and the `Operator` should be `LabelSelectorOpNotIn` or `LabelSelectorOpDoesNotExist`. In the above sample, data movement backups are not allowed to run on nodes with the label `xxx/critical-workload`.
\ No newline at end of file
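As a complement to the sample above, here is a minimal sketch of an anti-affinity-only `loadAffinity` configuration, built only from the fields already shown in this document; the node name and the `xxx/critical-workload` label are placeholders. It keeps data movement backups off `node-1` and off any node carrying the `xxx/critical-workload` label.

```json
{
    "loadAffinity": [
        {
            "nodeSelector": {
                "matchExpressions": [
                    {
                        "key": "kubernetes.io/hostname",
                        "values": [
                            "node-1"
                        ],
                        "operator": "NotIn"
                    },
                    {
                        "key": "xxx/critical-workload",
                        "operator": "DoesNotExist"
                    }
                ]
            }
        }
    ]
}
```

Since the node-agent server only reads this configMap at startup, restart it after applying changes; the command below assumes the default daemonset name `node-agent` in the `velero` namespace.

```
kubectl -n velero rollout restart daemonset node-agent
```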