CI: Improve e2e tests reliability #343
Conversation
Force-pushed from 554b7d5 to d84dc6d
looks great
    undo_changes
fi
[ "$RET" -ne 0 ] && echo && echo "::error:: Testing failed with $RET" || echo "::info:: Testing passed"
When `undo` is true and `RET` is nonzero, we'll get a couple of "testing failed" error messages (one before undoing changes, and one after). I guess that's OK, though this could be tweaked if you want.
That was my intention: you get notified that things went wrong (as it might not be obvious from the logs) and that things are going to be cleared, then a bunch of cleanup output spills over, finishing with the Ansible message that everything went well, after which you get this second "Testing failed" message to emphasize that, although Ansible is happy, the testing actually did not go well.
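The double-notification flow described above can be sketched roughly like this (the `RET` and `undo` values are illustrative stand-ins for the real script's state after a failed run):

```shell
#!/bin/bash
# Sketch of the reporting flow: fail once before cleanup (so the error
# isn't lost in the cleanup output) and once after (so the last line
# reflects the real result despite Ansible finishing happily).
RET=1         # assume the tests failed
undo="true"   # assume cleanup (undo_changes) was requested

# First notification, before the cleanup output buries it
[ "$RET" -ne 0 ] && echo "::error:: Testing failed with $RET"
# Cleanup spills a lot of output and typically ends on a happy note
[ "$undo" == "true" ] && echo "...ansible cleanup output, finishing OK..."
# Second notification, so the final line shows the real outcome
[ "$RET" -ne 0 ] && echo "::error:: Testing failed with $RET" || echo "::info:: Testing passed"
```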
We are seeing stalled jobs; let's ensure our steps won't take longer than expected. Signed-off-by: Lukáš Doktor <[email protected]>
Ensure each test won't take longer than the expected threshold by setting a 10m timeout. Note one can only set a single timeout for all tests within a single file. Signed-off-by: Lukáš Doktor <[email protected]>
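A minimal sketch of the per-test timeout idea (the helper name and message are mine, not the PR's actual code; GNU `timeout` returns 124 when it kills the command):

```shell
# Hypothetical helper: abort a single test once it exceeds a threshold.
run_with_timeout() {
    local threshold=$1; shift
    timeout "$threshold" "$@"
    local ret=$?
    [ "$ret" -eq 124 ] && echo "::error:: '$*' exceeded the ${threshold} threshold"
    return $ret
}
```

Usage would look like `run_with_timeout 10m ./test.sh`.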
Recent issues in CI indicate that kubectl might sometimes fail, which results in wait_for_process interrupting the loop. Let's improve the command to ensure the kubectl command passed and only then grep for the (un)expected output. Note the positive commands do not need to be treated, as the output should not contain the pod names on failure. Fixes: confidential-containers#339 Signed-off-by: Lukáš Doktor <[email protected]>
On test failure we might still execute a cleanup that spills a bunch of text, making it not obvious whether the testing passed or failed. Note the return code is already fine; this change only helps users better notice that things didn't go well. Signed-off-by: Lukáš Doktor <[email protected]>
Replace the simple "DEBUG|ERROR|INFO" prefixes with the GitHub Actions commands ("::debug::" etc.), as it should improve the GH logs' readability while leaving the bash output still parsable by humans. Signed-off-by: Lukáš Doktor <[email protected]>
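For reference, the prefix change amounts to something like the helpers below (function names are my own; only the `::debug::`-style strings come from the commit). `::debug::` lines are hidden in the GH web UI unless step debug logging is enabled, while `::error::` gets highlighted; run locally, both stay plain greppable text:

```shell
# Log helpers emitting GitHub Actions workflow commands instead of
# bare "DEBUG:"/"INFO:"/"ERROR:" prefixes.
debug() { echo "::debug:: $*"; }
info()  { echo "::info:: $*"; }
error() { echo "::error:: $*"; }

error "Testing failed with 1"
```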
Changes:
Hopefully no other changes; I tried hard not to introduce new "spaces" rather than "tabs" :-)
LGTM. Thanks Lukas!
@@ -62,7 +62,7 @@ jobs:
          if [ $RUNNING_INSTANCE = "s390x" ]; then
              args=""
          fi
-         ./run-local.sh -r "${{ matrix.runtimeclass }}" "${args}"
+         ./run-local.sh -t -r "${{ matrix.runtimeclass }}" "${args}"
Yeah, it makes sense to have the timeout in the script because we don't map these steps to GitHub job steps; otherwise the timeouts could be set in the GitHub workflows. This scenario might change when we address #309.
run() {
    duration=$1; shift
    if [ "$timeout" == "true" ]; then
        timeout $duration "$@"
What if it prints a friendly message (e.g. "Run timed out after XX") when it timed out? i.e. when `$? -eq 124`?
That can open a can of worms, as the script itself can return 124, so we'd have to add logic to get the actual time (I mean, we could use $SECONDS, so it's not that extensive, but still) and then report "Run probably timed out after XXXs" when the timeout seems plausible. Do you want me to add it, or are we going to rely on log timestamps only?
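The `$SECONDS` idea mentioned above could look roughly like this (a sketch assuming the duration is given in plain seconds; the PR ultimately left this out):

```shell
run() {
    local duration=$1; shift
    local start=$SECONDS
    timeout "$duration" "$@"
    local ret=$?
    # 124 alone is ambiguous: the wrapped command may itself exit 124.
    # Only claim a timeout when the elapsed time also reached the limit.
    if [ "$ret" -eq 124 ] && [ $((SECONDS - start)) -ge "$duration" ]; then
        echo "::error:: Run probably timed out after $((SECONDS - start))s"
    fi
    return $ret
}
```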
Hmm... we'd better leave it as is. If we start seeing too many timeouts and it proves to be confusing, then we may change it.
#
local cmd="! sudo -E kubectl get pods -n $op_ns |"
cmd+="grep -q -e cc-operator-daemon-install"
# (ensure failing kubectl keeps iterating)
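The failure mode behind this fix can be reproduced without a cluster (the `pod_gone` helper is illustrative; `"$@"` stands in for the real `sudo -E kubectl get pods -n $op_ns`):

```shell
# With `! cmd | grep -q NAME`, a failing cmd yields empty output, grep
# fails, and the negation makes the whole check "pass" -- reporting the
# pod gone when kubectl merely errored out. Checking cmd's own exit
# status first keeps the wait loop iterating on transient errors.
pod_gone() {
    local out
    out=$("$@") || return 1   # the command itself failed: keep waiting
    ! grep -q -e cc-operator-daemon-install <<< "$out"
}
```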
Good catch!
Hi @ldoktor! I left one suggestion that you might accept. Everything else looks good.
Thanks @ldoktor
There are several improvements to the e2e pipeline to avoid odd returns as well as to interrupt on hangs. Last but not least, I added extra messages that should improve reading the logs.
I'm definitely open to suggestions to remove/improve individual commits....