test flake in test_mem.sh #1569
Comments
This whole test takes two and a half minutes on Atrium:
So, the current timeout of an hour should be more than enough.
But I believe that time is measured from when the job starts, so it includes waiting for the dependencies to finish. That dependency wait is not part of the script-specific timeout of 1 hour.
If I look at the ps output that is gathered after the test hits the timeout, I don't see a crutest process. When I compare a passing test run to a run that failed on the timeout, I notice that the passing test has results in the crutest output from the third loop, where we test
However, on the failing test, the crutest log shows the results from what I think is the second loop of the test, where we only test 10 GiB:
From this I suspect that in the timeout case, we never loop around to start that third loop.
Looking at the
The next thing the test will do is call
And, we never see the
The test_mem.sh script does not have the same checks that other tests do, where we first verify that all downstairs have started. Without that check, the other way to know you have hit 1498 is by not being able to shut down
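As a rough illustration of the kind of startup check described above, here is a minimal sketch, assuming the script has recorded the PID of each downstairs it launched. The function name `verify_all_running` is hypothetical and is not taken from the crucible test scripts; the check the other tests actually use may look quite different.

```bash
# Hypothetical helper, not from the crucible repo: return non-zero if any
# of the given PIDs is not currently running.
verify_all_running() {
    local pid
    for pid in "$@"; do
        # kill -0 delivers no signal; it only reports whether the
        # process exists and can be signalled by this user.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid is not running" >&2
            return 1
        fi
    done
    return 0
}
```

A script could call this after starting its background processes and exit right away on failure, so that a process that never came up is reported immediately rather than showing up later as a job timeout.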
A few times in the past two weeks, the test_mem.sh test has not finished before hitting a timeout.
Here is an example of the last failure:
https://github.com/oxidecomputer/crucible/pull/1566/checks?check_run_id=33265882305
Updates were made to the GitHub job in #1563 to reduce the timeout and to gather more data in the event of a failure.