Add part1 manual test case from v1.3.0 validated tickets. #1217

Open · wants to merge 5 commits into base: main · Changes from 2 commits
@@ -0,0 +1,28 @@
---
title: Restore the snapshot when the VM has already been restored from backup
---

* Related issues: [#4604](https://github.com/harvester/harvester/issues/4604) [BUG] Restore from snapshot not work if target VM is restore-replaced from backup


## Category:
* Backup and Restore

## Verification Steps
1. Configure the backup target to an NFS backup target
1. Create a VM and wait for it to reach the Running state
1. Take a backup and wait for it to become Ready
1. Take a VM snapshot and wait for it to become Ready
1. Stop the VM and wait for it to be Off
1. Restore (Replace) from the backup and wait for the VM to be Running
1. Stop the VM again
1. Restore from the snapshot, replacing the current VM
1. Check that the VM is running
1. Restore from the snapshot, creating a new VM
1. Check that the new VM is created and running (see the CLI sketch after this list)
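For reference, a minimal CLI sketch for the readiness checks above, run from a cluster node; the `settings.harvesterhci.io` and `virtualmachinebackups.harvesterhci.io` resource names are assumptions based on Harvester's CRDs and may differ between releases:
```
# Confirm the backup target points at the NFS server (resource name is an assumption).
kubectl get settings.harvesterhci.io backup-target -o yaml

# Watch backups/snapshots until their status reports readyToUse (CRD name is an assumption;
# in some Harvester releases snapshots are stored with the same CRD as backups).
kubectl get virtualmachinebackups.harvesterhci.io -n default
kubectl get virtualmachinebackups.harvesterhci.io -n default -o yaml | grep -i readyToUse

# Follow the VM power state with the KubeVirt resources.
kubectl get vm,vmi -n default
```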
khushboo-rancher marked this conversation as resolved.

## Expected Results
* Can restore the snapshot to replace the original VM when the VM has been restored from backup


* Can restore the snapshot to create a new VM when the VM has been restored from backup
@@ -0,0 +1,68 @@
---
title: VM backup and restore use the Longhorn v1.5+ attach/detach mechanism
---

* Related issues: [#4907](https://github.com/harvester/harvester/issues/4907) [ENHANCEMENT] Take advantage of the attach/detach mechanism in Longhorn v1.5+

## Category:
* Backup and Restore

## Verification Steps
Case 1: snapshot can work on a stopped VM.
1. Create a VM.
1. After the VM is ready, stop the VM.
1. Check VM volumes are detached.
1. Take a snapshot on the VM. The snapshot can be ready.
Collaborator:
Do we have this in automated test already?

Contributor Author (@TachunLin, Apr 12, 2024):
I went through our existing backend integration test scenarios in:

  1. test_1_volumes.py
  2. test_4_vm_backup_restore.py
  3. test_4_vm_snapshot.py

I found this test plan may already be covered by test_create_vm_snapshot_while_pvc_detached in test_4_vm_snapshot.py:

```
def test_create_vm_snapshot_while_pvc_detached(self, api_client,
                                               vm_snapshot_2_name, source_vm, wait_timeout):
    """
    Test that a VM snapshot can be created when the source
    PVC is detached.

    Prerequisites:
    The original VM (`source-vm`) exists and is stopped (so that
    the PVC is detached.)
    """
    name, _ = source_vm
    stop_vm(name, api_client, wait_timeout)
    code, _ = api_client.vm_snapshots.create(name, vm_snapshot_2_name)
    assert 201 == code

    deadline = datetime.now() + timedelta(seconds=wait_timeout)
    while deadline > datetime.now():
        code, data = api_client.vm_snapshots.get(vm_snapshot_2_name)
        if data.get("status", {}).get("readyToUse"):
            break
        print(f"waiting for {vm_snapshot_2_name} to be ready")
        sleep(3)
    else:
        raise AssertionError(f"timed out waiting for {vm_snapshot_2_name} to be ready")

    code, data = api_client.vm_snapshots.get(vm_snapshot_2_name)
    assert 200 == code
    assert data.get("status", {}).get("readyToUse") is True
```

Contributor Author:

Updated the test plan: removed the lines that are already covered by the automated e2e test and replaced them with a note indicating which e2e test covers them.


Case 2: restore a snapshot from detached volumes can work.
1. Follow Case 1.
1. Make sure VM volumes are detached.
1. Restore the snapshot to a new VM. The new VM can be ready.
1. Restore the snapshot to replace the old VM. The old VM can be ready (see the readiness-check sketch after this case).
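A possible way to script the readiness checks in Case 2, assuming the restored and replaced VMs live in the `default` namespace and expose a `Ready` condition (the VM names below are placeholders):
```
# Wait for the KubeVirt VirtualMachine objects to report Ready (names are placeholders).
kubectl wait --for=condition=Ready vm/restored-vm -n default --timeout=600s
kubectl wait --for=condition=Ready vm/source-vm -n default --timeout=600s

# The backing Longhorn volumes should move back to the attached state.
kubectl get volumes.longhorn.io -n longhorn-system
```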

Case 3: backup can work on a stopped VM
Collaborator:

Same as above, can you check if we have this in automated test?

Contributor Author:

For Case 2:
I found this test plan may already be covered by test_restore_from_vm_snapshot_while_pvc_detached_from_source in the test_4_vm_snapshot.py integration test:

```
def test_restore_from_vm_snapshot_while_pvc_detached_from_source(self,
                                                                 api_client,
                                                                 restored_vm_2,
                                                                 host_shell,
                                                                 vm_shell,
                                                                 ssh_keypair,
                                                                 wait_timeout):
    """
    Test that a new virtual machine can be created from a
    VM snapshot created from a source PersistentVolumeClaim
    that is now detached.

    Prerequisites:
    The original VM (`source-vm`) exists and is stopped (so that
    the PVC is detached.)
    The original snapshot (`vm-snapshot`) exists.
    """
    name, ssh_user = restored_vm_2

    def actassert(sh):
        out, _ = sh.exec_command("cat test.txt")
        assert "123" in out

    vm_shell_do(name, api_client,
                host_shell, vm_shell,
                ssh_user, ssh_keypair,
                actassert, wait_timeout)
```

Contributor Author:

Updated the test plan: removed the lines that are already covered by the automated e2e test and replaced them with a note indicating which e2e test covers them.

1. Create a VM.
1. After the VM is ready, stop the VM.
1. Check VM volumes are detached.
1. Take a backup on the VM. The backup can be ready.

Case 4: race condition doesn't break VMBackup
1. Create a VM.
1. After the VM is ready, stop the VM.
1. Check VM volumes are detached.
1. Take multiple backups on the VM in a short time. All backups can be ready (see the sketch after this list).
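One way to drive Case 4 from the CLI is to create several backup objects back to back. The `VirtualMachineBackup` spec below is an assumption drawn from Harvester's CRD; field names may differ between releases, so treat it as a sketch rather than a reference manifest:
```
# Create five backups of the stopped VM in quick succession
# (apiVersion/kind/spec fields are assumptions; adjust to your Harvester release).
for i in $(seq 1 5); do
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: VirtualMachineBackup
metadata:
  name: race-backup-$i
  namespace: default
spec:
  type: backup
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm1
EOF
done

# All of them should eventually report readyToUse: true.
kubectl get virtualmachinebackups.harvesterhci.io -n default -o yaml | grep -i readyToUse
```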

Case 5: restore a backup from detached volumes can work:
1. Follow Case 3.
1. Make sure VM volumes are detached.
1. Restore the backup to a new VM. The new VM can be ready.
1. Restore the backup to replace the old VM. The old VM can be ready.

## Expected Results
* Case 1: snapshot can work on a stopped VM
- VM volumes are detached after we stop the VM.
```
node1:~ # kubectl get volume -A
NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   pvc-5a861225-920d-4059-b501-f02b2fd0ff27   detached   unknown                  10737418240          19m
```
Contributor:

Decrease an indent.

Current

    node1:~ # kubec
    NAMESPACE
    ...

Should be

node1:~ # kubec
NAMESPACE

Contributor Author:

Thanks for the check.
Decreased the indent to align the format.

- Take the VM snapshot. The snapshot of a VM in the off state can be ready.
* Case 2: restore a snapshot from detached volumes can work.
- Restore the snapshot to a new VM. The new VM can be ready.
- Restore the snapshot to replace the old VM. The old VM can be ready.

* Case 3: backup can work on a stopped VM.
- VM volumes are detached after we stop the VM.
```
NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   pvc-d1226d97-ab90-4d40-92f9-960b668093c2   detached   unknown                  10737418240          5m12s
```
Contributor:

same

Contributor Author:

Thanks for the check.
Decreased the indent to align the format.

- Take the VM backup. The backup of a VM in the off state can be ready.

* Case 4: race condition doesn't break VMBackup
- Take multiple backups on the VM in a short time. All backups can be ready.

* Case 5: restore a backup from detached volumes can work
- Restore the backup to a new VM. The new VM can be ready.
- Restore the backup to replace the old VM (Retain Volume). The old VM can be ready.
- Restore the backup to replace the old VM (Delete Volume). The old VM can be ready.
@@ -0,0 +1,27 @@
---
title: VM with snapshot can't be restored to new VM
---

* Related issues: [#4954](https://github.com/harvester/harvester/issues/4954) [BUG] VM Restore to new VM doesn't work if there is VM Snapshot


## Category:
* Backup and Restore

## Verification Steps
1. Prepare a 3-node Harvester cluster
1. Create a VM named vm1
1. Set up an NFS backup target
1. Create a backup of vm1
1. Create a snapshot of vm1
1. Restore the backup of vm1 to create a new VM
1. Check that the VM is restored correctly
1. Shut down vm1
1. Restore the backup of vm1 to replace the existing VM
1. Select Retain volume
1. Check that the VM is restored correctly
Comment on lines +11 to +22
Collaborator:

Could you check if this is covered in automated test already?

Contributor Author:

By checking our existing VM backup and snapshot integration test scenarios in:

  1. test_4_vm_backup_restore.py
  2. test_4_vm_snapshot.py

I did not find a suitable automation test that covers the case where the VM already has both a snapshot and a backup on it.

Among these tests

test_4_vm_backup_restore.py

  1. test_connection
  2. tests_backup_vm
  3. test_update_backup_by_yaml
  4. test_restore_with_new_vm
  5. test_restore_replace_with_delete_vols
  6. test_restore_replace_vm_not_stop
  7. test_restore_with_invalid_name
  8. test_backup_migrated_vm
  9. test_restore_replace_migrated_vm
  10. test_backup_multiple
  11. test_delete_last_backup
  12. test_delete_middle_backup

test_4_vm_snapshot.py

  1. test_vm_snapshot_create
  2. test_restore_into_new_vm_from_vm_snapshot
  3. test_replace_is_rejected_when_deletepolicy_is_retain
  4. test_replace_vm_with_vm_snapshot
  5. test_restore_from_vm_snapshot_while_pvc_detached_from_source
  6. test_create_vm_snapshot_while_pvc_detached
  7. test_vm_snapshots_are_cleaned_up_after_source_vm_deleted
  8. test_volume_snapshots_are_cleaned_up_after_source_volume_deleted


## Expected Results
* Can restore the VM backup to create a new VM when the VM already has both a backup and a snapshot on it.

* Can restore the VM backup to replace the existing VM (retain volume) when the VM already has both a backup and a snapshot on it (see the CLI sketch below).
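A quick CLI cross-check for these results, assuming Harvester exposes restores through a `virtualmachinerestores.harvesterhci.io` CRD (the resource name is an assumption) and the VMs live in the `default` namespace:
```
# Restore progress/completion (CRD name is an assumption; adjust if it differs).
kubectl get virtualmachinerestores.harvesterhci.io -n default

# Both the newly created VM and the replaced vm1 should come back up.
kubectl get vm,vmi -n default
kubectl wait --for=condition=Ready vm/vm1 -n default --timeout=600s
```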
84 changes: 84 additions & 0 deletions docs/content/manual/deployment/4480-config-sftp-dynamically.md
@@ -0,0 +1,84 @@
---
title: Config sftp dynamically
---

* Related issues: [#4480](https://github.com/harvester/harvester/issues/4480) [ENHANCEMENT] config sftp dynamically


## Category:
* Deployment

## Verification Steps
1. Use ipxe-example to provision Harvester
1. Add the following sshd config to the config-create.yaml and config-join.yaml Harvester configuration files
```
os:
  sshd:
    sftp: true
```
1. Provision the 3-node Harvester cluster
1. After provisioning or upgrade completes, SSH to the management and worker nodes (a loop for checking all nodes follows this list)
1. Check the file exists
```
sudo tail -n5 /etc/ssh/sshd_config
cat /etc/ssh/sshd_config.d/sftp.conf
```
1. Use sftp to connect to Harvester, check the connection works
```
sftp [email protected]
```
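To run the checks in steps 5-6 against every node in one pass, a small loop like the one below can be used; the node IPs are placeholders from the ipxe-example environment, and the sftp batch test assumes key-based authentication (with password auth, run sftp interactively instead):
```
# Placeholder node IPs; replace with the actual management/worker node addresses.
for node in 192.168.0.30 192.168.0.31 192.168.0.32; do
  echo "=== $node ==="
  # sshd_config should end with an Include of the drop-in directory,
  # and sftp.conf should enable the sftp subsystem.
  ssh rancher@"$node" "sudo tail -n5 /etc/ssh/sshd_config; cat /etc/ssh/sshd_config.d/sftp.conf"
  # Non-interactive sftp smoke test: list the home directory and exit.
  sftp -b - rancher@"$node" <<'EOF'
ls -al
EOF
done
```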

## Expected Results
1. Fresh install: on the create node and compute node machines, we can connect via sftp and find the corresponding file
```
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> ssh [email protected]
Password:
Last login: Wed Jan 10 05:42:44 2024 from 192.168.0.1
rancher@harvester-node-0:~> pwd
/home/rancher
rancher@harvester-node-0:~> vim test.txt
rancher@harvester-node-0:~> exit
logout
Connection to 192.168.0.131 closed.
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> sftp [email protected]
Password:
Connected to 192.168.0.131.
sftp> ls -al
drwxr-xr-x 3 rancher rancher 4096 Jan 10 05:48 .
drwxr-xr-x 3 root root 4096 Jan 10 03:56 ..
-rw------- 1 rancher rancher 92 Jan 10 05:48 .bash_history
drwx------ 2 rancher rancher 4096 Jan 10 03:56 .ssh
-rw-r----- 1 rancher rancher 5 Jan 10 05:48 test.txt
sftp> get test.txt test.txt
Fetching /home/rancher/test.txt to test.txt
/home/rancher/test.txt 100% 5 5.3KB/s 00:00
```
1. After rebooting the machine, the sftp service and corresponding file exist
```
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> ssh [email protected]
Password:
rancher@harvester-node-2:~> sudo tail -n5 /etc/ssh/sshd_config
AllowAgentForwarding no
X11Forwarding no
AllowTcpForwarding no
MaxAuthTries 3
Include /etc/ssh/sshd_config.d/*.conf
rancher@harvester-node-2:~> exit
logout
Connection to 192.168.0.32 closed.
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> sftp [email protected]
Password:
Connected to 192.168.0.32.
```
1. After upgrade, the sftp service and corresponding file exist
```
rancher@harvester-node-0:~> cat /etc/ssh/sshd_config.d/sftp.conf
Subsystem sftp /usr/lib/ssh/sftp-server
rancher@harvester-node-0:~> exit
logout
Connection to 192.168.0.131 closed.
davidtclin@localhost:~/Documents/Project_Repo/ipxe-examples/vagrant-pxe-harvester> sftp [email protected]
Password:
Connected to 192.168.0.131.
sftp>
```
@@ -0,0 +1,77 @@
---
title: Create load balancer on pools with different roles in RKE2 cluster
---

* Related issues: [#4678](https://github.com/harvester/harvester/issues/4678) [BUG] Unable to successfully create LoadBalancer service from Guest cluster post 1.2.1 upgrade



## Category:
* Rancher

## Verification Steps
1. Prepare a three-node Harvester cluster and import it into Rancher
1. Provision an RKE2 guest cluster in Rancher
1. Create the first pool, specify the `etcd` and `control plane` roles
1. Create the second pool, specify the `worker` role only
Comment on lines +16 to +17
Collaborator:

@albin, do we have a test case with 2 pools created while testing the guest cluster?

Contributor Author:

After checking, with help from Albin: currently the backend e2e test in test_9_rancher_integration.py creates 3 nodes with the ALL role in the same pool.

Thus it seems the manual test differs somewhat from the backend e2e Rancher integration test.

1. Add the following cloud config in the `User Data` of each pool
```
write_files:
- encoding: b64
  content: {harvester's kube config, the cluster namespace should be same as the pool you created (base64 encoded)}
  owner: root:root
  path: /etc/kubernetes/cloud-config
  permission: '0644'
```
1. Get the Harvester kubeconfig file; remember to add the namespace (it should be the same as the guest cluster's)
```
contexts:
- name: "local"
  context:
    user: "local"
    cluster: "local"
    namespace: "default" ---------------------> Add this line
```
1. Output the Harvester kubeconfig into base64 format without new line
```
cat local.yaml | base64 -w 0
```
1. Copy the base64-encoded kubeconfig into the cloud config `write_files` section above
{{< image "images/rancher/4678-base64-kubeconfig.png" >}}
1. Provision the RKE2 guest cluster
1. After pools are created, we remove the harvester-cloud-provider in Apps > Installed Apps (kube-system namespace).
1. Add a new chart repository in Apps > Repositories using https://charts.harvesterhci.io, then install the chart and select version 0.2.3.
Collaborator:

To check a particular version of the chart, removing the default and installing from Apps is fine, but in general this should not be practiced, as the bundled chart might get rolled back to the manifest shipped with the guest cluster. So I think adding this as a generalized test case is not required.

Contributor Author (@TachunLin, Apr 12, 2024):

Thanks for the suggestion.

That's true, the tester should confirm that the default cloud provider shipped in the RKE2 guest cluster is at least 0.2.3, so that it includes this change.

These steps are not necessary, since they were only needed while the bundled chart was not yet ready during the issue verification stage.

We can remove these lines and the related static screenshot, then add a check step to ensure the desired cloud provider version.

{{< image "images/rancher/4678-add-harvester-repo.png" >}}
1. Install Harvester cloud provider 0.2.3 from Apps & marketplace
Collaborator:

Install Harvester cloud provider 0.2.3 from Apps & marketplace

Contributor Author:

Thanks for the check.
Updated to the correct term in this line.

{{< image "images/rancher/4678-install-cloud-provider.png" >}}
1. Create an nginx deployment using the Harvester storage class
1. Create a load balancer service with the dhcp type and bind it to the nginx deployment
1. Check that the load balancer can be installed
1. Create a standalone load balancer service, set the following annotation
1. Access the guest cluster VM and check that the load balancer service gets an `EXTERNAL_IP` (see the sketch after this list)
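A sketch of the deployment and service from steps 13-16, applied inside the guest cluster with kubectl. The storage class name `harvester` and all object names are assumptions for illustration, and the DHCP/IPAM annotation is left as a comment because its exact key depends on the Harvester cloud provider version:
```
# Run against the RKE2 guest cluster kubeconfig.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: harvester   # assumed default Harvester CSI storage class name
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels: {app: nginx}
  template:
    metadata:
      labels: {app: nginx}
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-data
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  # Add the DHCP/IPAM annotation from the steps above here; the exact key is
  # version-dependent, so it is intentionally not hard-coded in this sketch.
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF

# The service should be assigned an EXTERNAL-IP once the Harvester load balancer is ready.
kubectl get svc nginx-lb -w
```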

1. Prepare a three-node Harvester cluster and import it into Rancher
1. Provision an RKE2 guest cluster in Rancher
1. Create the first pool, specify the `control plane` role only
1. Create the second pool, specify the `etcd` role only
1. Create the third pool, specify the `worker` role only
1. Repeat steps 13 - 17 to create the load balancer service

1. Prepare a three-node Harvester cluster and import it into Rancher
1. Provision an RKE2 guest cluster in Rancher
1. Create only one pool, specify the control plane, etcd and worker roles
Comment on lines +31 to +33
Collaborator:

This is already covered in automation test.

Contributor Author:

After checking and confirming with the team: our existing backend e2e test for Rancher integration in test_9_rancher_integration.py provides the ability to create a 3-node guest cluster:

pytest.param(3, marks=pytest.mark.skip(reason="Skip for low I/O env."))

In actual testing, helped by Albin, the 3-node downstream cluster is created in the same pool with all nodes set to the ALL role.

Thus we can assume this manual test plan, which uses different roles in different pools, is not covered by the e2e test.



## Expected Results
* Control-plane, ETCD and worker in the same pool:
- Can successfully create a LoadBalancer service on the RKE2 guest cluster
- All load-balancer type services are assigned an EXTERNAL_IP


* Control-plane and ETCD in pool A, worker in pool B:
- Can successfully create a LoadBalancer service on the RKE2 guest cluster
- All load-balancer type services are assigned an EXTERNAL_IP

* Control-plane in pool A, ETCD in pool B and worker in pool C:
- Can successfully create a LoadBalancer service on the RKE2 guest cluster
- All load-balancer type services are assigned an EXTERNAL_IP
63 changes: 63 additions & 0 deletions docs/content/manual/upgrade/4461-check-space-before-upgrade.md
@@ -0,0 +1,63 @@
---
title: Check free disk space percent before upgrade
---

* Related issues: [#4611](https://github.com/harvester/harvester/issues/4611) [ENHANCEMENT] Check free disk space percent before upgrade


## Category:
* Upgrade Harvester

## Verification Steps
1. Create a new Harvester cluster. Each node's disk space should be 250G.
1. If every node's /usr/local free space is more than 30G, use dd to write files so that one node's free disk space drops to about 20G
```
dd if=/dev/zero of=/usr/local/test.img bs=1G count=93
```
1. Ensure /usr/local does not have enough space
```
rancher@n1-240104:~> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 147G 28G 112G 21% /usr/local
rancher@n1-240104:~> sudo dd if=/dev/zero of=/usr/local/test.img bs=1G count=93
93+0 records in
93+0 records out
99857989632 bytes (100 GB, 93 GiB) copied, 327.656 s, 305 MB/s
rancher@n1-240104:~> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 147G 116G 24G 84% /usr/local
```
1. Create a test version (see the CLI sketch after this list for applying and checking it)
```
apiVersion: harvesterhci.io/v1beta1
kind: Version
metadata:
  annotations:
    harvesterhci.io/minFreeDiskSpaceGB: "5"
  name: 1.2.2
  namespace: harvester-system
spec:
  isoURL: http://192.168.0.181:8000/harvester-v1.2.2-amd64.iso
  minUpgradableVersion: 1.2.1
  releaseDate: "20231210"
  tags:
  - dev
  - test
```
1. Click upgrade on the dashboard

1. Remove the dd file
1. Ensure the /usr/local space has been freed
```
rancher@n1-240104:/usr/local> sudo rm test.img
rancher@n1-240104:/usr/local> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 147G 23G 118G 16% /usr/local
```
1. Click the upgrade again
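The test Version object from step 4 can be applied and inspected from a node shell as sketched below; `versions.harvesterhci.io` in `harvester-system` is assumed to be the resource Harvester uses for upgrade versions, so adjust the name if it differs in your release:
```
# Save the Version manifest from step 4 as version.yaml, then apply it.
kubectl apply -f version.yaml

# Confirm the test version and its minimum-free-disk annotation are registered
# (resource name is an assumption).
kubectl get versions.harvesterhci.io -n harvester-system
kubectl get versions.harvesterhci.io 1.2.2 -n harvester-system -o yaml | grep -i minFreeDiskSpaceGB

# Check free space on /usr/local before and after removing the dd file.
df -h /usr/local
```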

## Expected Results
* Before freeing the space: an error message should appear on the upgrade dashboard
{{< image "images/upgrade/4461-upgrade-space-check.png" >}}

* After freeing the space: there is no error message on the upgrade dashboard, and the upgrade process can be triggered
22 changes: 22 additions & 0 deletions docs/content/manual/upgrade/4725-upgrade-shutdown-vm-in-os.md
@@ -0,0 +1,22 @@
---
title: Upgrade with VM shutdown in the operating system
Collaborator:

This is being taken care of in the upgrade automation test.

Contributor Author:

In my understanding, our existing upgrade backend e2e test shuts the VMs down before starting the upgrade.
This manual test differs slightly from the e2e test, since here we shut the VMs down from inside the guest operating system, while the e2e test shuts them down through the virtual machine API:

1. Login to Windows VM and shutdown machine from menu
1. Login to openSUSE VM, use command to shutdown machine

Thus I am thinking we can keep this test plan, since these operations are somewhat different.

---

* Related issues: [#4725](https://github.com/harvester/harvester/issues/4725) [BUG] Upgrade 1.2.0 -> 1.2.1 is stuck in “Waiting for VM live-migration or shutdown...(1 left)” even though there is NO VM running


## Category:
* Virtual Machines

## Verification Steps
1. Prepare a 3-node v1.2.1 Harvester cluster
1. Create a Windows Server 2022 VM
1. Create an openSUSE Leap 15.4 VM
1. Open the web console of each VM
1. Log in to the Windows VM and shut down the machine from the menu
1. Log in to the openSUSE VM and use a command to shut down the machine
1. Upgrade Harvester to v1.3.0-rc1
1. Check the upgrade process (see the sketch after this list)
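A few commands to follow the shutdown and the upgrade from a node shell; the `upgrades.harvesterhci.io` resource name is an assumption and may differ between Harvester releases:
```
# After the in-guest shutdowns, no VirtualMachineInstance should be left running.
kubectl get vmi -A

# Follow the upgrade object and its status conditions (resource name is an assumption).
kubectl get upgrades.harvesterhci.io -n harvester-system
kubectl get upgrades.harvesterhci.io -n harvester-system -o yaml | grep -iA2 conditions
```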

## Expected Results
* We can successfully upgrade from v1.2.1 to v1.3.0-rc1 with the OS state retained, and do not get stuck on `Waiting for VM live-migration or shutdown...`