travis: high percentage of test failures #434
Comments
Could relate to #83 (races), an implicit change on the travis side that pulls in something broken (e.g. #420), or the re-vendoring of our go dependencies, which we have done a few times lately. We need to isolate the issue. Please copy/paste details of any travis failures into this issue. We need data to start finding a correlation that leads toward a resolution.
https://travis-ci.org/tpepper/ciao/jobs/149830418 looks like a deadlock, and is in: testutil TestReconnects. I've hit this before when a goroutine has leaked, is blocked on one of the test results channels, and suddenly gets a fresh consumer when the server restarts.
https://travis-ci.org/01org/ciao/jobs/149835683 looks like a deadlock, and is in: ciao-scheduler TestReconnects. I've hit this before when a goroutine has leaked, is blocked on one of the test results channels, and suddenly gets a fresh consumer when the server restarts.
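The leak described in these two reports can be illustrated with a minimal sketch. It is not the testutil code; `sendResult`, `results`, and `done` are hypothetical names. The idea is that a sender selects on a `done` channel as well as the results channel, so it gives up when the test ends instead of staying blocked until some later consumer shows up.

```go
package main

import (
	"fmt"
	"time"
)

// sendResult is a hypothetical helper, not part of testutil. It tries to
// deliver r on results, but abandons the send once done is closed, so the
// calling goroutine cannot stay blocked and later hand a stale result to a
// fresh consumer.
func sendResult(results chan<- string, done <-chan struct{}, r string) bool {
	select {
	case results <- r:
		return true
	case <-done:
		return false
	}
}

func main() {
	results := make(chan string) // unbuffered: a send blocks until someone receives
	done := make(chan struct{})
	exited := make(chan bool)

	go func() { exited <- sendResult(results, done, "stale result") }()

	// Nobody reads from results here. Closing done releases the sender.
	close(done)

	select {
	case delivered := <-exited:
		fmt.Println("sender exited, delivered:", delivered)
	case <-time.After(time.Second):
		fmt.Println("sender is still blocked (leaked)")
	}
}
```

Without the `done` case, the goroutine would remain parked on `results <- r` after the test finished, which matches the symptom of a leaked goroutine suddenly getting a consumer after the server restarts.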
https://travis-ci.org/01org/ciao/jobs/149835681 is a straight failure, and is in: ciao-controller TestTenantOutOfBounds and outputs:
PR #437 should address the deadlocks on the testutil channels. It also enables -race in non-controller tests to try to capture more detailed information on what's breaking where.
I'm closing this ticket as we've found a number of small bugs and have also chosen to disable the -race detector in our travis runs. That should put us on a more stable footing as far as travis is concerned, but it does leave genuine issues unresolved. Currently there is at least one real race, which @markdryan sees in the launcher output from the travis failures. We've got a separate ticket, #235, to enable the go -race detector, which I'll leave open so we revisit this again and try to get there eventually.
The code in instance_test.go that waited for the instance loop to close down was incorrect. There was a possibility of deadlock if the instance loop was sending some stats down the overseer channel at the same time as the test was trying to shut down the instance. There is similar code in the overseer which actually shuts down the instance loop correctly. This commit simply ports the good overseer code over to instance_test.go.

Partial fix for ciao-project#434

Signed-off-by: Mark Ryan <[email protected]>
We're having a high percentage of test failures in travis. Maybe as high as one in four in my experience. There is no rhyme or reason to which individual unit test fails, and the test has a good chance of passing if the job is re-run.
I suspect we have a race somewhere...