You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[FAIL] ClusterAddonReconciler [AfterEach] Basic test applies the helm chart
/home/runner/work/cluster-stack-operator/cluster-stack-operator/internal/test/integration/workloadcluster/cluster_addon_test.go:109
After pressing "re-run test" in the Github web UI, the test was fine.
This means that the test is flaky since it fails and succeeds without any code change.
#!/bin/bashtrap 'echo"ERROR: A command has failed. Exiting the script. Line was ($0:$LINENO): $(sed -n "${LINENO}p" "$0")"; exit3' ERRset-Eeuopipefaili=0whilemaketest-unit; doi=$((i+1))
echo"Test run $i"datesleep1echo "=========================================================================================
echo "=========================================================================================
echodone
Works fine, the second run failed:
Summarizing 1 Failure:
[FAIL] ClusterStackReconciler Test with provider Tests with multiple versions [It] checks ProviderClusterstackrelease is deleted when version is removed from spec
/home/guettli/syself/cluster-stack-operator-public/internal/controller/clusterstack_controller_test.go:616
Ran 40 of 40 Specs in 10.203 seconds
FAIL! -- 39 Passed | 1 Failed | 0 Pending | 0 Skipped
From time to time this fails, too:
Summarizing 1 Failure:
[FAIL] ClusterStackReconciler Test with provider Basic test [It] creates the cluster stack release object with cluster stack auto subscribe false
/home/guettli/syself/cluster-stack-operator-public/internal/controller/clusterstack_controller_test.go:487
Ran 40 of 40 Specs in 8.498 seconds
FAIL! -- 39 Passed | 1 Failed | 0 Pending | 0 Skipped
sometimes this fails:
Summarizing 1 Failure:
[FAIL] ClusterAddonReconciler Basic test [It] creates the clusterAddon object
/home/guettli/syself/cluster-stack-operator-public/internal/controller/clusteraddon_controller_test.go:112
Ran 40 of 40 Specs in 9.054 seconds
FAIL! -- 39 Passed | 1 Failed | 0 Pending | 0 Skipped
How to make the test always fail
If the test waits until the clusterStackRelease v2 is created, then the test always fails.
/////////////////////////////////////////////////////////////////////// Flaky testFIt("checks ProviderClusterstackrelease is deleted when version is removed from spec", func() {
fmt.Println("itttttttttttttttttttttttttttttttttttttttttt")
ph, err:=patch.NewHelper(clusterStack, testEnv)
Expect(err).ShouldNot(HaveOccurred())
// newproviderclusterStackReleaseRefV2:=&corev1.ObjectReference{
APIVersion: "infrastructure.clusterstack.x-k8s.io/v1alpha1",
Kind: "TestInfrastructureProviderClusterStackRelease",
Name: clusterStackReleaseTagV2,
Namespace: testNs.Name,
}
Eventually(func() bool {
obj, err:=external.Get(ctx, testEnv.GetClient(), providerclusterStackReleaseRefV2, testNs.Name)
iferr!=nil {
fmt.Printf("foundProviderclusterStackReleaseRef is not found. %s\n", err.Error())
returnfalse
}
fmt.Printf("foundProviderclusterStackReleaseRef is found. %s %+v %+v\n",
obj.GetName(), obj.GetFinalizers(), obj.GetOwnerReferences())
returntrue
}, timeout, interval).Should(BeTrue())
Instead of the Eventually block, you can use time.Sleep(time.Second * 1) it has the same effect. This will fail here:
@janiskemper what do you think: Does it make sense to wait in BeforeEach, so that the tests get a stable environment?
The good news: The test fails reproducible with FIt() this means. The flakiness is inside this single test (because no other test gets called).
Current conclusion
The test (It("checks ProviderClusterstackrelease is deleted when version is removed from spec")) has always been wrong. In some edge cases (roughly every 8th run), it failed because the correct sequence was executed.
The deletionTimestamp does not get propagated from the parent-object to the child-object, because in envTest the GC is not running.
I asked at #controller-runtime how other developers handle it.
I will update the code, so that I check for the DeletionTimestamp of the parent-object, and then check the ownerRef at the child-object. I will remove the test that the child-object gets deleted.
The text was updated successfully, but these errors were encountered:
/kind bug
In CI a test failed:
https://github.com/SovereignCloudStack/cluster-stack-operator/actions/runs/10717474962/job/29717241629?pr=255
After pressing "re-run test" in the Github web UI, the test was fine.
This means that the test is flaky since it fails and succeeds without any code change.
Here is the successful run: https://github.com/SovereignCloudStack/cluster-stack-operator/actions/runs/10717474962?pr=255
What did you expect to happen:
I expect tests to not be flaky.
Fixing flakyness
I used that script to run tests again and again:
Works fine, the second run failed:
From time to time this fails, too:
sometimes this fails:
How to make the test always fail
If the test waits until the clusterStackRelease v2 is created, then the test always fails.
Instead of the
Eventually
block, you can usetime.Sleep(time.Second * 1)
it has the same effect. This will fail here:I think the real error is in BeforeEach(). It does not wait until the resources get created.
@janiskemper what do you think: Does it make sense to wait in BeforeEach, so that the tests get a stable environment?
The good news: The test fails reproducible with
FIt()
this means. The flakiness is inside this single test (because no other test gets called).Current conclusion
The test (
It("checks ProviderClusterstackrelease is deleted when version is removed from spec")
) has always been wrong. In some edge cases (roughly every 8th run), it failed because the correct sequence was executed.The deletionTimestamp does not get propagated from the parent-object to the child-object, because in envTest the GC is not running.
Related kubebuilder docs: https://book.kubebuilder.io/reference/envtest.html#testing-considerations
I asked at #controller-runtime how other developers handle it.
I will update the code, so that I check for the DeletionTimestamp of the parent-object, and then check the ownerRef at the child-object. I will remove the test that the child-object gets deleted.
The text was updated successfully, but these errors were encountered: