Add part1 manual test case from v1.3.0 validated tickets. #1217

Open · wants to merge 5 commits into base: main · Changes from 2 commits
@@ -0,0 +1,28 @@
---
title: Restore the snapshot when the VM has already been restored from backup
---

* Related issues: [#4604](https://github.com/harvester/harvester/issues/4604) [BUG] Restore from snapshot not work if target VM is restore-replaced from backup


## Category:
* Backup and Restore

## Verification Steps
1. Configure the backup target to an NFS backup target
1. Create a VM and wait for it to reach the Running state
1. Take a backup and wait for it to become Ready
1. Take a VM snapshot and wait for it to become Ready
1. Stop the VM and wait for it to be Off
1. Restore (Replace) from the backup and wait for the VM to be Running
1. Stop the VM again
1. Restore from the snapshot, replacing the current VM
1. Check that the VM is running
1. Restore from the snapshot, creating a new VM
1. Check that the new VM is created and running (see the CLI sketch after this list)
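For reference, a minimal CLI sketch for the readiness checks above, run from a cluster node; the `settings.harvesterhci.io` and `virtualmachinebackups.harvesterhci.io` resource names are assumptions based on Harvester's CRDs and may differ between releases:
```
# Confirm the backup target points at the NFS server (resource name is an assumption).
kubectl get settings.harvesterhci.io backup-target -o yaml

# Watch backups/snapshots until their status reports readyToUse (CRD name is an assumption;
# in some Harvester releases snapshots are stored with the same CRD as backups).
kubectl get virtualmachinebackups.harvesterhci.io -n default
kubectl get virtualmachinebackups.harvesterhci.io -n default -o yaml | grep -i readyToUse

# Follow the VM power state with the KubeVirt resources.
kubectl get vm,vmi -n default
```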
khushboo-rancher marked this conversation as resolved.

## Expected Results
* Can restore the snapshot to replace the original VM when the VM has been restored from backup


* Can restore the snapshot to create a new VM when the VM has been restored from backup
@@ -0,0 +1,68 @@
---
title: VM backup and restore use the Longhorn v1.5+ attach/detach mechanism
---

* Related issues: [#4907](https://github.com/harvester/harvester/issues/4907) [ENHANCEMENT] Take advantage of the attach/detach mechanism in Longhorn v1.5+

## Category:
* Backup and Restore

## Verification Steps
Case 1: snapshot can work on a stopped VM.
1. Create a VM.
1. After the VM is ready, stop the VM.
1. Check VM volumes are detached.
1. Take a snapshot on the VM. The snapshot can be ready.
Collaborator:
Do we have this in automated test already?

Contributor Author (@TachunLin, Apr 12, 2024):
I went through our existing backend integration test scenarios in:

  1. test_1_volumes.py
  2. test_4_vm_backup_restore.py
  3. test_4_vm_snapshot.py

I found this test plan may already be covered by test_create_vm_snapshot_while_pvc_detached in test_4_vm_snapshot.py:

```
def test_create_vm_snapshot_while_pvc_detached(self, api_client,
                                               vm_snapshot_2_name, source_vm, wait_timeout):
    """
    Test that a VM snapshot can be created when the source
    PVC is detached.

    Prerequisites:
    The original VM (`source-vm`) exists and is stopped (so that
    the PVC is detached.)
    """
    name, _ = source_vm
    stop_vm(name, api_client, wait_timeout)
    code, _ = api_client.vm_snapshots.create(name, vm_snapshot_2_name)
    assert 201 == code

    deadline = datetime.now() + timedelta(seconds=wait_timeout)
    while deadline > datetime.now():
        code, data = api_client.vm_snapshots.get(vm_snapshot_2_name)
        if data.get("status", {}).get("readyToUse"):
            break
        print(f"waiting for {vm_snapshot_2_name} to be ready")
        sleep(3)
    else:
        raise AssertionError(f"timed out waiting for {vm_snapshot_2_name} to be ready")

    code, data = api_client.vm_snapshots.get(vm_snapshot_2_name)
    assert 200 == code
    assert data.get("status", {}).get("readyToUse") is True
```

Contributor Author:

Updated the test plan: removed the lines that are already covered by the automated e2e test and replaced them with a note indicating which e2e test covers them.


Case 2: restore a snapshot from detached volumes can work.
1. Follow Case 1.
1. Make sure VM volumes are detached.
1. Restore the snapshot to a new VM. The new VM can be ready.
1. Restore the snapshot to replace the old VM. The old VM can be ready (see the readiness-check sketch after this case).
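A possible way to script the readiness checks in Case 2, assuming the restored and replaced VMs live in the `default` namespace and expose a `Ready` condition (the VM names below are placeholders):
```
# Wait for the KubeVirt VirtualMachine objects to report Ready (names are placeholders).
kubectl wait --for=condition=Ready vm/restored-vm -n default --timeout=600s
kubectl wait --for=condition=Ready vm/source-vm -n default --timeout=600s

# The backing Longhorn volumes should move back to the attached state.
kubectl get volumes.longhorn.io -n longhorn-system
```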

Case 3: backup can work on a stopped VM
Collaborator:

Same as above, can you check if we have this in automated test?

Contributor Author:

For Case 2:
I found this test plan may already be covered by test_restore_from_vm_snapshot_while_pvc_detached_from_source in the test_4_vm_snapshot.py integration test:

```
def test_restore_from_vm_snapshot_while_pvc_detached_from_source(self,
                                                                 api_client,
                                                                 restored_vm_2,
                                                                 host_shell,
                                                                 vm_shell,
                                                                 ssh_keypair,
                                                                 wait_timeout):
    """
    Test that a new virtual machine can be created from a
    VM snapshot created from a source PersistentVolumeClaim
    that is now detached.

    Prerequisites:
    The original VM (`source-vm`) exists and is stopped (so that
    the PVC is detached.)
    The original snapshot (`vm-snapshot`) exists.
    """
    name, ssh_user = restored_vm_2

    def actassert(sh):
        out, _ = sh.exec_command("cat test.txt")
        assert "123" in out

    vm_shell_do(name, api_client,
                host_shell, vm_shell,
                ssh_user, ssh_keypair,
                actassert, wait_timeout)
```

Contributor Author:

Updated the test plan: removed the lines that are already covered by the automated e2e test and replaced them with a note indicating which e2e test covers them.

1. Create a VM.
1. After the VM is ready, stop the VM.
1. Check VM volumes are detached.
1. Take a backup on the VM. The backup can be ready.

Case 4: race condition doesn't break VMBackup
1. Create a VM.
1. After the VM is ready, stop the VM.
1. Check VM volumes are detached.
1. Take multiple backups on the VM in a short time. All backups can be ready (see the sketch after this list).
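One way to drive Case 4 from the CLI is to create several backup objects back to back. The `VirtualMachineBackup` spec below is an assumption drawn from Harvester's CRD; field names may differ between releases, so treat it as a sketch rather than a reference manifest:
```
# Create five backups of the stopped VM in quick succession
# (apiVersion/kind/spec fields are assumptions; adjust to your Harvester release).
for i in $(seq 1 5); do
cat <<EOF | kubectl apply -f -
apiVersion: harvesterhci.io/v1beta1
kind: VirtualMachineBackup
metadata:
  name: race-backup-$i
  namespace: default
spec:
  type: backup
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm1
EOF
done

# All of them should eventually report readyToUse: true.
kubectl get virtualmachinebackups.harvesterhci.io -n default -o yaml | grep -i readyToUse
```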

Case 5: restore a backup from detached volumes can work:
1. Follow Case 3.
1. Make sure VM volumes are detached.
1. Restore the backup to a new VM. The new VM can be ready.
1. Restore the backup to replace the old VM. The old VM can be ready.

## Expected Results
* Case 1: snapshot can work on a stopped VM
- VM volumes are detached after we stop the VM.
```
node1:~ # kubectl get volume -A
NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   pvc-5a861225-920d-4059-b501-f02b2fd0ff27   detached   unknown                  10737418240          19m
```
Contributor:

Decrease an indent.

Current

    node1:~ # kubec
    NAMESPACE
    ...

Should be

node1:~ # kubec
NAMESPACE

Contributor Author:

Thanks for the check.
Decreased the indent to align the format.

- Take the VM snapshot. The snapshot of a VM in the off state can be ready.
* Case 2: restore a snapshot from detached volumes can work.
- Restore the snapshot to a new VM. The new VM can be ready.
- Restore the snapshot to replace the old VM. The old VM can be ready.

* Case 3: backup can work on a stopped VM.
- VM volumes are detached after we stop the VM.
```
NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   pvc-d1226d97-ab90-4d40-92f9-960b668093c2   detached   unknown                  10737418240          5m12s
```
Contributor:

same

Contributor Author:

Thanks for the check.
Decreased the indent to align the format.

- Take the VM backup. The backup of a VM in the off state can be ready.

* Case 4: race condition doesn't break VMBackup
- Take multiple backups on the VM in a short time. All backups can be ready.

* Case 5: restore a backup from detached volumes can work
- Restore the backup to a new VM. The new VM can be ready.
- Restore the backup to replace the old VM (Retain Volume). The old VM can be ready.
- Restore the backup to replace the old VM (Delete Volume). The old VM can be ready.
@@ -0,0 +1,27 @@
---
title: VM with snapshot can't be restored to new VM
---

* Related issues: [#4954](https://github.com/harvester/harvester/issues/4954) [BUG] VM Restore to new VM doesn't work if there is VM Snapshot


## Category:
* Backup and Restore

## Verification Steps
1. Prepare a 3-node Harvester cluster
1. Create a VM named vm1
1. Set up an NFS backup target
1. Create a backup of vm1
1. Create a snapshot of vm1
1. Restore the backup of vm1 to create a new VM
1. Check that the VM is restored correctly
1. Shut down vm1
1. Restore the backup of vm1 to replace the existing VM
1. Select Retain volume
1. Check that the VM is restored correctly
Comment on lines +11 to +22
Collaborator:

Could you check if this is covered in automated test already?

Contributor Author:

By checking our existing VM backup and snapshot integration test scenarios in:

  1. test_4_vm_backup_restore.py
  2. test_4_vm_snapshot.py

I did not find a suitable automation test that covers the case where the VM already has both a snapshot and a backup on it.

Among these tests

test_4_vm_backup_restore.py

  1. test_connection
  2. tests_backup_vm
  3. test_update_backup_by_yaml
  4. test_restore_with_new_vm
  5. test_restore_replace_with_delete_vols
  6. test_restore_replace_vm_not_stop
  7. test_restore_with_invalid_name
  8. test_backup_migrated_vm
  9. test_restore_replace_migrated_vm
  10. test_backup_multiple
  11. test_delete_last_backup
  12. test_delete_middle_backup

test_4_vm_snapshot.py

  1. test_vm_snapshot_create
  2. test_restore_into_new_vm_from_vm_snapshot
  3. test_replace_is_rejected_when_deletepolicy_is_retain
  4. test_replace_vm_with_vm_snapshot
  5. test_restore_from_vm_snapshot_while_pvc_detached_from_source
  6. test_create_vm_snapshot_while_pvc_detached
  7. test_vm_snapshots_are_cleaned_up_after_source_vm_deleted
  8. test_volume_snapshots_are_cleaned_up_after_source_volume_deleted


## Expected Results
* Can restore the VM backup to create a new VM when the VM already has both a backup and a snapshot on it.

* Can restore the VM backup to replace the existing VM (retain volume) when the VM already has both a backup and a snapshot on it (see the CLI sketch below).
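A quick CLI cross-check for these results, assuming Harvester exposes restores through a `virtualmachinerestores.harvesterhci.io` CRD (the resource name is an assumption) and the VMs live in the `default` namespace:
```
# Restore progress/completion (CRD name is an assumption; adjust if it differs).
kubectl get virtualmachinerestores.harvesterhci.io -n default

# Both the newly created VM and the replaced vm1 should come back up.
kubectl get vm,vmi -n default
kubectl wait --for=condition=Ready vm/vm1 -n default --timeout=600s
```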
84 changes: 84 additions & 0 deletions docs/content/manual/deployment/4480-config-sftp-dynamically.md
@@ -0,0 +1,84 @@
---
title: Config sftp dynamically
---

* Related issues: [#4480](https://github.com/harvester/harvester/issues/4480) [ENHANCEMENT] config sftp dynamically


## Category:
* Deployment

## Verification Steps
1. Use ipxe-example to provision Harvester
1. Add the following sshd config to the config-create.yaml and config-join.yaml Harvester configuration files
```
os:
  sshd:
    sftp: true
```
1. Provision the 3-node Harvester cluster
1. After provisioning or upgrade completes, SSH to the management and worker nodes (a loop for checking all nodes follows this list)
1. Check the file exists
```
sudo tail -n5 /etc/ssh/sshd_config
cat /etc/ssh/sshd_config.d/sftp.conf
```
1. Use sftp to connect to Harvester, check the connection works
```
sftp [email protected]
```
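To run the checks in steps 5-6 against every node in one pass, a small loop like the one below can be used; the node IPs are placeholders from the ipxe-example environment, and the sftp batch test assumes key-based authentication (with password auth, run sftp interactively instead):
```
# Placeholder node IPs; replace with the actual management/worker node addresses.
for node in 192.168.0.30 192.168.0.31 192.168.0.32; do
  echo "=== $node ==="
  # sshd_config should end with an Include of the drop-in directory,
  # and sftp.conf should enable the sftp subsystem.
  ssh rancher@"$node" "sudo tail -n5 /etc/ssh/sshd_config; cat /etc/ssh/sshd_config.d/sftp.conf"
  # Non-interactive sftp smoke test: list the home directory and exit.
  sftp -b - rancher@"$node" <<'EOF'
ls -al
EOF
done
```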

## Expected Results
1. Fresh install: on the create node and compute node machines, we can connect via sftp and find the corresponding file
```
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> ssh [email protected]
Password:
Last login: Wed Jan 10 05:42:44 2024 from 192.168.0.1
rancher@harvester-node-0:~> pwd
/home/rancher
rancher@harvester-node-0:~> vim test.txt
rancher@harvester-node-0:~> exit
logout
Connection to 192.168.0.131 closed.
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> sftp [email protected]
Password:
Connected to 192.168.0.131.
sftp> ls -al
drwxr-xr-x 3 rancher rancher 4096 Jan 10 05:48 .
drwxr-xr-x 3 root root 4096 Jan 10 03:56 ..
-rw------- 1 rancher rancher 92 Jan 10 05:48 .bash_history
drwx------ 2 rancher rancher 4096 Jan 10 03:56 .ssh
-rw-r----- 1 rancher rancher 5 Jan 10 05:48 test.txt
sftp> get test.txt test.txt
Fetching /home/rancher/test.txt to test.txt
/home/rancher/test.txt 100% 5 5.3KB/s 00:00
```
1. After rebooting the machine, the sftp service and corresponding file exist
```
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> ssh [email protected]
Password:
rancher@harvester-node-2:~> sudo tail -n5 /etc/ssh/sshd_config
AllowAgentForwarding no
X11Forwarding no
AllowTcpForwarding no
MaxAuthTries 3
Include /etc/ssh/sshd_config.d/*.conf
rancher@harvester-node-2:~> exit
logout
Connection to 192.168.0.32 closed.
harvester@localhost:~/Harvester/ipxe-examples/vagrant-pxe-harvester> sftp [email protected]
Password:
Connected to 192.168.0.32.
```
1. After upgrade, the sftp service and corresponding file exist
```
rancher@harvester-node-0:~> cat /etc/ssh/sshd_config.d/sftp.conf
Subsystem sftp /usr/lib/ssh/sftp-server
rancher@harvester-node-0:~> exit
logout
Connection to 192.168.0.131 closed.
davidtclin@localhost:~/Documents/Project_Repo/ipxe-examples/vagrant-pxe-harvester> sftp [email protected]
Password:
Connected to 192.168.0.131.
sftp>
```
@@ -0,0 +1,77 @@
---
title: Create load balancer on pools with different roles in RKE2 cluster
---

* Related issues: [#4678](https://github.com/harvester/harvester/issues/4678) [BUG] Unable to successfully create LoadBalancer service from Guest cluster post 1.2.1 upgrade



## Category:
* Rancher

## Verification Steps
1. Prepare a three-node Harvester cluster and import it into Rancher
1. Provision an RKE2 guest cluster in Rancher
1. Create the first pool, specify the `etcd` and `control plane` roles
1. Create the second pool, specify the `worker` role only
Comment on lines +16 to +17
Collaborator:

@albin, do we have a test case with 2 pools created while testing the guest cluster?

Contributor Author:

After checking, with help from Albin: currently the backend e2e test in test_9_rancher_integration.py creates 3 nodes with the ALL role in the same pool.

Thus it seems the manual test differs somewhat from the backend e2e Rancher integration test.

1. Add the following cloud config in the `User Data` of each pool
```
write_files:
- encoding: b64
  content: {harvester's kube config, the cluster namespace should be same as the pool you created (base64 encoded)}
  owner: root:root
  path: /etc/kubernetes/cloud-config
  permission: '0644'
```
1. Get the Harvester kubeconfig file; remember to add the namespace (it should be the same as the guest cluster's)
```
contexts:
- name: "local"
  context:
    user: "local"
    cluster: "local"
    namespace: "default" ---------------------> Add this line
```
1. Output the Harvester kubeconfig into base64 format without new line
```
cat local.yaml | base64 -w 0
```
1. Copy the base64-encoded kubeconfig into the cloud config `write_files` section above
{{< image "images/rancher/4678-base64-kubeconfig.png" >}}
1. Provision the RKE2 guest cluster
1. After pools are created, we remove the harvester-cloud-provider in Apps > Installed Apps (kube-system namespace).
1. Add a new chart repository in Apps > Repositories using https://charts.harvesterhci.io, then install the chart and select version 0.2.3.
Collaborator:

To check a particular version of the chart, removing the default and installing from Apps is fine, but in general this should not be practiced, as the bundled chart might get rolled back to the manifest shipped with the guest cluster. So I think adding this as a generalized test case is not required.

Contributor Author (@TachunLin, Apr 12, 2024):

Thanks for the suggestion.

That's true, the tester should confirm that the default cloud provider shipped in the RKE2 guest cluster is at least 0.2.3, so that it includes this change.

These steps are not necessary, since they were only needed while the bundled chart was not yet ready during the issue verification stage.

We can remove these lines and the related static screenshot, then add a check step to ensure the desired cloud provider version.

{{< image "images/rancher/4678-add-harvester-repo.png" >}}
1. Install Harvester cloud provider 0.2.3 from Apps & marketplace
Collaborator:

Install Harvester cloud provider 0.2.3 from Apps & marketplace

Contributor Author:

Thanks for the check.
Updated to the correct term in this line.

{{< image "images/rancher/4678-install-cloud-provider.png" >}}
1. Create an nginx deployment using the Harvester storage class
1. Create a load balancer service with the dhcp type and bind it to the nginx deployment
1. Check that the load balancer can be installed
1. Create a standalone load balancer service, set the following annotation
1. Access the guest cluster VM and check that the load balancer service gets an `EXTERNAL_IP` (see the sketch after this list)
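A sketch of the deployment and service from steps 13-16, applied inside the guest cluster with kubectl. The storage class name `harvester` and all object names are assumptions for illustration, and the DHCP/IPAM annotation is left as a comment because its exact key depends on the Harvester cloud provider version:
```
# Run against the RKE2 guest cluster kubeconfig.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: harvester   # assumed default Harvester CSI storage class name
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels: {app: nginx}
  template:
    metadata:
      labels: {app: nginx}
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-data
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  # Add the DHCP/IPAM annotation from the steps above here; the exact key is
  # version-dependent, so it is intentionally not hard-coded in this sketch.
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF

# The service should be assigned an EXTERNAL-IP once the Harvester load balancer is ready.
kubectl get svc nginx-lb -w
```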

1. Prepare a three-node Harvester cluster and import it into Rancher
1. Provision an RKE2 guest cluster in Rancher
1. Create the first pool, specify the `control plane` role only
1. Create the second pool, specify the `etcd` role only
1. Create the third pool, specify the `worker` role only
1. Repeat steps 13 - 17 to create the load balancer service

1. Prepare a three-node Harvester cluster and import it into Rancher
1. Provision an RKE2 guest cluster in Rancher
1. Create only one pool, specify the control plane, etcd and worker roles
Comment on lines +31 to +33
Collaborator:

This is already covered in automation test.

Contributor Author:

After checking and confirming with the team: our existing backend e2e test for Rancher integration in test_9_rancher_integration.py provides the ability to create a 3-node guest cluster:

pytest.param(3, marks=pytest.mark.skip(reason="Skip for low I/O env."))

In actual testing, helped by Albin, the 3-node downstream cluster is created in the same pool with all nodes set to the ALL role.

Thus we can assume this manual test plan, which uses different roles in different pools, is not covered by the e2e test.



## Expected Results
* Control-plane, ETCD and worker in the same pool:
- Can successfully create a LoadBalancer service on the RKE2 guest cluster
- All load-balancer type services are assigned an EXTERNAL_IP


* Control-plane and ETCD in pool A, worker in pool B:
- Can successfully create a LoadBalancer service on the RKE2 guest cluster
- All load-balancer type services are assigned an EXTERNAL_IP

* Control-plane in pool A, ETCD in pool B and worker in pool C:
- Can successfully create a LoadBalancer service on the RKE2 guest cluster
- All load-balancer type services are assigned an EXTERNAL_IP
63 changes: 63 additions & 0 deletions docs/content/manual/upgrade/4461-check-space-before-upgrade.md
@@ -0,0 +1,63 @@
---
title: Check free disk space percent before upgrade
---

* Related issues: [#4611](https://github.com/harvester/harvester/issues/4611) [ENHANCEMENT] Check free disk space percent before upgrade


## Category:
* Upgrade Harvester

## Verification Steps
1. Create a new Harvester cluster. Each node's disk space should be 250G.
1. If every node's /usr/local free space is more than 30G, use dd to write files so that one node's free disk space drops to about 20G
```
dd if=/dev/zero of=/usr/local/test.img bs=1G count=93
```
1. Ensure /usr/local does not have enough space
```
rancher@n1-240104:~> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 147G 28G 112G 21% /usr/local
rancher@n1-240104:~> sudo dd if=/dev/zero of=/usr/local/test.img bs=1G count=93
93+0 records in
93+0 records out
99857989632 bytes (100 GB, 93 GiB) copied, 327.656 s, 305 MB/s
rancher@n1-240104:~> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 147G 116G 24G 84% /usr/local
```
1. Create a test version (see the CLI sketch after this list for applying and checking it)
```
apiVersion: harvesterhci.io/v1beta1
kind: Version
metadata:
  annotations:
    harvesterhci.io/minFreeDiskSpaceGB: "5"
  name: 1.2.2
  namespace: harvester-system
spec:
  isoURL: http://192.168.0.181:8000/harvester-v1.2.2-amd64.iso
  minUpgradableVersion: 1.2.1
  releaseDate: "20231210"
  tags:
  - dev
  - test
```
1. Click upgrade on the dashboard

1. Remove the dd file
1. Ensure the /usr/local space has been freed
```
rancher@n1-240104:/usr/local> sudo rm test.img
rancher@n1-240104:/usr/local> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 147G 23G 118G 16% /usr/local
```
1. Click the upgrade again
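The test Version object from step 4 can be applied and inspected from a node shell as sketched below; `versions.harvesterhci.io` in `harvester-system` is assumed to be the resource Harvester uses for upgrade versions, so adjust the name if it differs in your release:
```
# Save the Version manifest from step 4 as version.yaml, then apply it.
kubectl apply -f version.yaml

# Confirm the test version and its minimum-free-disk annotation are registered
# (resource name is an assumption).
kubectl get versions.harvesterhci.io -n harvester-system
kubectl get versions.harvesterhci.io 1.2.2 -n harvester-system -o yaml | grep -i minFreeDiskSpaceGB

# Check free space on /usr/local before and after removing the dd file.
df -h /usr/local
```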

## Expected Results
* Before freeing the space: an error message should appear on the upgrade dashboard
{{< image "images/upgrade/4461-upgrade-space-check.png" >}}

* After freeing the space: there is no error message on the upgrade dashboard, and the upgrade process can be triggered
22 changes: 22 additions & 0 deletions docs/content/manual/upgrade/4725-upgrade-shutdown-vm-in-os.md
@@ -0,0 +1,22 @@
---
title: Upgrade with VM shutdown in the operating system
Collaborator:

This is being taken care of in the upgrade automation test.

Contributor Author:

In my understanding, our existing upgrade backend e2e test shuts the VMs down before starting the upgrade.
This manual test differs slightly from the e2e test, since here we shut the VMs down from inside the guest operating system, while the e2e test shuts them down through the virtual machine API:

1. Login to Windows VM and shutdown machine from menu
1. Login to openSUSE VM, use command to shutdown machine

Thus I am thinking we can keep this test plan, since these operations are somewhat different.

---

* Related issues: [#4725](https://github.com/harvester/harvester/issues/4725) [BUG] Upgrade 1.2.0 -> 1.2.1 is stuck in “Waiting for VM live-migration or shutdown...(1 left)” even though there is NO VM running


## Category:
* Virtual Machines

## Verification Steps
1. Prepare a 3-node v1.2.1 Harvester cluster
1. Create a Windows Server 2022 VM
1. Create an openSUSE Leap 15.4 VM
1. Open the web console of each VM
1. Log in to the Windows VM and shut down the machine from the menu
1. Log in to the openSUSE VM and use a command to shut down the machine
1. Upgrade Harvester to v1.3.0-rc1
1. Check the upgrade process (see the sketch after this list)
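A few commands to follow the shutdown and the upgrade from a node shell; the `upgrades.harvesterhci.io` resource name is an assumption and may differ between Harvester releases:
```
# After the in-guest shutdowns, no VirtualMachineInstance should be left running.
kubectl get vmi -A

# Follow the upgrade object and its status conditions (resource name is an assumption).
kubectl get upgrades.harvesterhci.io -n harvester-system
kubectl get upgrades.harvesterhci.io -n harvester-system -o yaml | grep -iA2 conditions
```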

## Expected Results
* We can successfully upgrade from v1.2.1 to v1.3.0-rc1 with the OS state retained, and do not get stuck on `Waiting for VM live-migration or shutdown...`