
Add offload plugin implementation #1115

Draft
wants to merge 2 commits into
base: main

Conversation

mnecas
Member

@mnecas mnecas commented Oct 18, 2024

Issue:

When migrating VMs, the VM disks need to be transferred over the network.
This is inefficient, slow, and requires additional storage. We need
an alternative way to transfer the disks.

Design:

This PR is a PoC of adding an offload plugin for the disk transfer.
The offload plugin is a specific tool that needs to be implemented
by the CSI provider. To use a storage offload plugin, the user
specifies it in the StorageMap destination. This allows users to
migrate a VM even when its disks span multiple datastore types:
for example, some disks can be managed by the offloadPlugin while others
still go over the network if needed.

Example of a storage map with an offload plugin:

```
spec:
  map:
    - destination:
        offloadPlugin:
          image: 'quay.io/mnecas0/offload:latest'
          vars:
            test1: test1
            test2: test2
        storageClass: nfs-csi
      source:
        id: datastore-30
...
```
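For reference, the destination side of the storage map could map onto Go API types along the lines of the sketch below. Only the YAML keys above (offloadPlugin, image, vars, storageClass) come from this PR; the Go type and package names are assumptions.

```go
// Hypothetical sketch of API types mirroring the YAML above; the JSON keys
// come from the example, the Go type and package names are assumptions.
package v1beta1

// OffloadPlugin selects a vendor/CSI-specific tool for the disk transfer.
type OffloadPlugin struct {
	// Container image implementing the offload disk copy.
	Image string `json:"image"`
	// User-defined variables passed through to the plugin job.
	Vars map[string]string `json:"vars,omitempty"`
}

// DestinationStorage is the destination side of a storage map entry.
type DestinationStorage struct {
	// Target storage class for the created PVs.
	StorageClass string `json:"storageClass"`
	// Optional offload plugin; when unset, the disk goes over the network.
	OffloadPlugin *OffloadPlugin `json:"offloadPlugin,omitempty"`
}
```

With this shape, entries that omit offloadPlugin would keep the existing network-based transfer path.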

The offload plugin is started right after the CreateDataVolumes step; this
way the Kubernetes CSI driver has already created the empty PVs into which
the disks can be transferred.

Variables

The OffloadPlugin step creates a job on the destination cluster. The job
runs the provided offload plugin image with the user-defined variables
passed from the StorageMap. The job is started with the following
parameters:

  • HOST = URL of the ESXi host
  • PLAN_NAME = plan name
  • NAMESPACE = namespace where the migration is running

In addition to these variables, it also mounts the secrets for accessing
vCenter. The secrets are mounted at the path /etc/secret and the files are:

  • accessKeyId with the username
  • secretKey with the password
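To make the contract concrete, here is a minimal, hypothetical sketch of an offload plugin entrypoint in Go. The environment variables and secret file names match the lists above; everything else, including what the plugin does with them, is an assumption, since the actual copy logic is vendor specific.

```go
// Minimal hypothetical offload plugin entrypoint; only the environment
// variables and secret paths come from the description above.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	host := os.Getenv("HOST")           // URL of the ESXi host
	plan := os.Getenv("PLAN_NAME")      // migration plan name
	namespace := os.Getenv("NAMESPACE") // namespace the migration runs in

	// vCenter credentials mounted by the OffloadPlugin job.
	user, err := os.ReadFile(filepath.Join("/etc/secret", "accessKeyId"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading accessKeyId:", err)
		os.Exit(1)
	}
	password, err := os.ReadFile(filepath.Join("/etc/secret", "secretKey"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading secretKey:", err)
		os.Exit(1)
	}

	fmt.Printf("offloading disks for plan %s/%s from host %s as user %s\n",
		namespace, plan, host, string(user))

	// The vendor-specific, storage-side disk copy would happen here.
	_ = password
}
```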

Note:

This change additionally requires #1109, because right now the
cold migration transfer is managed by virt-v2v. #1109 removes
this dependency and moves the transfer to the CNV CDI. This allows us to
split the transfer and conversion steps, which were a single step from the
Forklift perspective. Once the disks are transferred, Forklift runs
virt-v2v-in-place on the disks and starts the VM.
The offload plugin follows the same flow: the disks are transferred by the
offload plugin and then virt-v2v-in-place runs on the disks.

TODO:

  • Add a design doc with more details; this is just a PoC
  • Add a check that the OffloadPlugin image exists
  • Add a check of the offload plugin disk transfer status
  • Allow combining storage map entries with and without an OffloadPlugin
  • Improve the name of the offload plugin job; right now it is the VM ID

Issues:
[1] Allow migration of "unknown" guests
Right now we cannot migrate guests whose operating system is unknown to or
unsupported by virt-v2v [3].

[2] Unifying the process and potential speedup
Right now we use two different methods for the disk transfer. This
requires additional engineering effort to maintain two paths, and it is
harder to debug two different flows.
virt-v2v transfers the disks sequentially, whereas with the CDI we
can start multiple disk imports in parallel. This can improve the
migration speed.

Fix:
MTV already uses the CNV CDI for warm and remote migrations. We
just need to adjust the code to remove the virt-v2v transfer and rely on
the CNV CDI to do it for us.
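To illustrate the parallelism argument from [2]: once the transfer is delegated to CDI, each disk import can be started independently and the controller only waits for all of them to finish. A schematic sketch follows, where Disk and importDisk are hypothetical stand-ins for the real per-disk DataVolume creation and wait logic:

```go
// Schematic sketch of parallel disk imports; Disk and importDisk are
// hypothetical placeholders for the per-disk DataVolume handling.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type Disk struct{ Name string }

// importDisk stands in for creating a DataVolume and waiting for the
// CDI import of one disk to complete.
func importDisk(ctx context.Context, d Disk) error {
	time.Sleep(10 * time.Millisecond) // simulate the transfer
	fmt.Println("imported", d.Name)
	return nil
}

// importAll starts every disk import in parallel instead of sequentially
// and returns the first error, if any.
func importAll(ctx context.Context, disks []Disk) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(disks))
	for _, d := range disks {
		wg.Add(1)
		go func(d Disk) {
			defer wg.Done()
			if err := importDisk(ctx, d); err != nil {
				errs <- err
			}
		}(d)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		return err
	}
	return nil
}

func main() {
	disks := []Disk{{Name: "disk-1"}, {Name: "disk-2"}, {Name: "disk-3"}}
	if err := importAll(context.Background(), disks); err != nil {
		fmt.Println("import failed:", err)
	}
}
```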

Drawbacks:
- CNV CDI *requires* the VDDK, which until now was only highly recommended.
- CNV CDI is not maintained inside MTV, and there might be problems
  escalating and backporting patches, as CNV has a different release
  cycle.
- Because we will be migrating all disks in parallel, we need to
  optimise our migration scheduler so that we don't take up too much
  of the hosts'/network resources. I have already done some
  optimisations in [4,5,6].

Notes:
This change removes the usage of virt-v2v; we will only use
virt-v2v-in-place.

Ref:
[1] https://issues.redhat.com/browse/MTV-1536
[2] https://issues.redhat.com/browse/MTV-1581
[3] https://access.redhat.com/articles/1351473
[4] kubev2v#1088
[5] kubev2v#1087
[6] kubev2v#1086

Signed-off-by: Martin Necas <[email protected]>
@mnecas mnecas requested a review from yaacov as a code owner October 18, 2024 13:54
@mnecas mnecas changed the title from Add offload mapping to Add offload plugin implementation Oct 18, 2024
@mnecas mnecas marked this pull request as draft October 18, 2024 14:07
@mnecas mnecas force-pushed the add_offload_mapping branch from cfa6b84 to c2b3402 on October 19, 2024 15:01
Contributor

@fabiand fabiand left a comment


Please include a design document, either in-tree or linked.

Contributor

@fabiand fabiand left a comment


There are some open questions, let's put this on hold for a little.
