
Add offload plugin implementation #1115

Draft
wants to merge 2 commits into
base: main

Conversation

mnecas
Member

@mnecas mnecas commented Oct 18, 2024

Issue:

When migrating VMs, the VM disks need to be transferred over the network.
This is inefficient, slow, and requires additional storage. We need
an alternative way to transfer the disks.

Design:

This PR is a PoC of adding an offload plugin for the disk transfer.
The offload plugin is a specific tool that needs to be implemented
by the CSI provider. To use a storage offload plugin, the user
specifies it in the StorageMap destination. This allows users to
migrate a VM even when its disks span multiple datastore types:
for example, some disks can be managed by the offloadPlugin while others
still go over the network if needed.

Example of a storage map with an offload plugin:

```
spec:
  map:
    - destination:
        offloadPlugin:
          image: 'quay.io/mnecas0/offload:latest'
          vars:
            test1: test1
            test2: test2
        storageClass: nfs-csi
      source:
        id: datastore-30
...
```
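For reference, the destination side of the storage map could map onto Go API types along the lines of the sketch below. Only the YAML keys above (offloadPlugin, image, vars, storageClass) come from this PR; the Go type and package names are assumptions.

```go
// Hypothetical sketch of API types mirroring the YAML above; the JSON keys
// come from the example, the Go type and package names are assumptions.
package v1beta1

// OffloadPlugin selects a vendor/CSI-specific tool for the disk transfer.
type OffloadPlugin struct {
	// Container image implementing the offload disk copy.
	Image string `json:"image"`
	// User-defined variables passed through to the plugin job.
	Vars map[string]string `json:"vars,omitempty"`
}

// DestinationStorage is the destination side of a storage map entry.
type DestinationStorage struct {
	// Target storage class for the created PVs.
	StorageClass string `json:"storageClass"`
	// Optional offload plugin; when unset, the disk goes over the network.
	OffloadPlugin *OffloadPlugin `json:"offloadPlugin,omitempty"`
}
```

With this shape, entries that omit offloadPlugin would keep the existing network-based transfer path.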

The offload plugin is started right after the CreateDataVolumes step; this
way the Kubernetes CSI driver has already created the empty PVs into which
the disks can be transferred.

Variables

The OffloadPlugin step creates a job on the destination cluster. The job
runs the provided offload plugin image with the user-defined variables
passed from the StorageMap. The job is started with the following
parameters:

  • HOST = URL of the ESXi host
  • PLAN_NAME = plan name
  • NAMESPACE = namespace where the migration is running

In addition to these variables, it also mounts the secrets for accessing
vCenter. The secrets are mounted at the path /etc/secret and the files are:

  • accessKeyId with the username
  • secretKey with the password
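To make the contract concrete, here is a minimal, hypothetical sketch of an offload plugin entrypoint in Go. The environment variables and secret file names match the lists above; everything else, including what the plugin does with them, is an assumption, since the actual copy logic is vendor specific.

```go
// Minimal hypothetical offload plugin entrypoint; only the environment
// variables and secret paths come from the description above.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	host := os.Getenv("HOST")           // URL of the ESXi host
	plan := os.Getenv("PLAN_NAME")      // migration plan name
	namespace := os.Getenv("NAMESPACE") // namespace the migration runs in

	// vCenter credentials mounted by the OffloadPlugin job.
	user, err := os.ReadFile(filepath.Join("/etc/secret", "accessKeyId"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading accessKeyId:", err)
		os.Exit(1)
	}
	password, err := os.ReadFile(filepath.Join("/etc/secret", "secretKey"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading secretKey:", err)
		os.Exit(1)
	}

	fmt.Printf("offloading disks for plan %s/%s from host %s as user %s\n",
		namespace, plan, host, string(user))

	// The vendor-specific, storage-side disk copy would happen here.
	_ = password
}
```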

Note:

This change additionally requires #1109, because right now the
cold migration transfer is managed by virt-v2v. #1109 removes
this dependency and moves the transfer to the CNV CDI. This allows us to
split the transfer and conversion steps, which were a single step from the
Forklift perspective. Once the disks are transferred, Forklift runs
virt-v2v-in-place on the disks and starts the VM.
The offload plugin follows the same flow: the disks are transferred by the
offload plugin and then virt-v2v-in-place runs on the disks.

TODO:

  • Add a design doc with more details; this is just a PoC
  • Add a check that the OffloadPlugin image exists
  • Add a check of the offload plugin disk transfer status
  • Allow combining storage map entries with and without an OffloadPlugin
  • Improve the name of the offload plugin job; right now it is the VM ID

Issues:
[1] Allow migration of "unknown" guests
Right now we cannot migrate guests whose operating system is unknown to or
unsupported by virt-v2v [3].

[2] Unifying the process and potential speedup
Right now we use two different methods for the disk transfer. This
requires additional engineering effort to maintain two paths, and it is
harder to debug two different flows.
virt-v2v transfers the disks sequentially, whereas with the CDI we
can start multiple disk imports in parallel. This can improve the
migration speed.

Fix:
MTV already uses the CNV CDI for warm and remote migrations. We
just need to adjust the code to remove the virt-v2v transfer and rely on
the CNV CDI to do it for us.
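To illustrate the parallelism argument from [2]: once the transfer is delegated to CDI, each disk import can be started independently and the controller only waits for all of them to finish. A schematic sketch follows, where Disk and importDisk are hypothetical stand-ins for the real per-disk DataVolume creation and wait logic:

```go
// Schematic sketch of parallel disk imports; Disk and importDisk are
// hypothetical placeholders for the per-disk DataVolume handling.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type Disk struct{ Name string }

// importDisk stands in for creating a DataVolume and waiting for the
// CDI import of one disk to complete.
func importDisk(ctx context.Context, d Disk) error {
	time.Sleep(10 * time.Millisecond) // simulate the transfer
	fmt.Println("imported", d.Name)
	return nil
}

// importAll starts every disk import in parallel instead of sequentially
// and returns the first error, if any.
func importAll(ctx context.Context, disks []Disk) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(disks))
	for _, d := range disks {
		wg.Add(1)
		go func(d Disk) {
			defer wg.Done()
			if err := importDisk(ctx, d); err != nil {
				errs <- err
			}
		}(d)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		return err
	}
	return nil
}

func main() {
	disks := []Disk{{Name: "disk-1"}, {Name: "disk-2"}, {Name: "disk-3"}}
	if err := importAll(context.Background(), disks); err != nil {
		fmt.Println("import failed:", err)
	}
}
```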

Drawbacks:
- CNV CDI *requires* the VDDK, which until now was only highly recommended.
- CNV CDI is not maintained inside MTV, and there might be problems
  escalating and backporting patches, as CNV has a different release
  cycle.
- Because we will be migrating all disks in parallel, we need to
  optimise our migration scheduler so that we don't take up too much
  of the hosts'/network resources. I have already done some
  optimisations in [4,5,6].

Notes:
This change removes the usage of virt-v2v; we will only use
virt-v2v-in-place.

Ref:
[1] https://issues.redhat.com/browse/MTV-1536
[2] https://issues.redhat.com/browse/MTV-1581
[3] https://access.redhat.com/articles/1351473
[4] kubev2v#1088
[5] kubev2v#1087
[6] kubev2v#1086

Signed-off-by: Martin Necas <[email protected]>
@mnecas mnecas requested a review from yaacov as a code owner October 18, 2024 13:54
@mnecas mnecas changed the title from Add offload mapping to Add offload plugin implementation Oct 18, 2024
@mnecas mnecas marked this pull request as draft October 18, 2024 14:07
@mnecas mnecas force-pushed the add_offload_mapping branch from cfa6b84 to c2b3402 on October 19, 2024 15:01
Contributor

@fabiand fabiand left a comment


Please include a design document, either in-tree or linked.

Contributor

@fabiand fabiand left a comment


There are some open questions, let's put this on hold for a little.
