
travis: high percentage of test failures #434

Closed
tpepper opened this issue Aug 3, 2016 · 6 comments

tpepper commented Aug 3, 2016

We're seeing a high percentage of test failures in travis, maybe as high as one in four in my experience. There is no rhyme or reason to which individual unit test fails, and the tests have a good chance of passing if the job is re-run.

I suspect we have a race somewhere...


tpepper commented Aug 4, 2016

This could relate to #83 (races), to an implicit change on the travis side that pulls in something broken (e.g. #420), or to the fact that we have re-vendored our Go dependencies a few times lately. We need to work to isolate the issue.

Please copy/paste details of any travis failures into this issue. We need data to start finding correlations and working toward a resolution.

@tpepper tpepper self-assigned this Aug 4, 2016

tpepper commented Aug 4, 2016

https://travis-ci.org/tpepper/ciao/jobs/149830418 looks like a deadlock, and is in:

testutil TestReconnects

I've hit this before when a goroutine has leaked, is blocked on one of the test results channels, and suddenly gets a fresh consumer when the server restarts.
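
For illustration, a minimal sketch of that failure mode (hypothetical code, not taken from testutil; the channel and messages are invented):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	results := make(chan string) // unbuffered channel shared across test phases

	// Phase 1: a producer sends a result, but its consumer never reads it,
	// so this goroutine leaks, blocked on the send.
	go func() {
		results <- "stale result from phase 1"
	}()
	time.Sleep(10 * time.Millisecond) // phase 1 "ends" without consuming

	// Phase 2: the "server restart" brings a fresh consumer, which receives
	// the stale phase 1 result instead of the message it was waiting for.
	fmt.Println(<-results)
}
```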


tpepper commented Aug 4, 2016

https://travis-ci.org/01org/ciao/jobs/149835683 looks like a deadlock, and is in:

ciao-scheduler TestReconnects

I've hit this before when a goroutine has leaked, is blocked on one of the test results channels, and suddenly gets a fresh consumer when the server restarts.


tpepper commented Aug 4, 2016

https://travis-ci.org/01org/ciao/jobs/149835681 is a straight failure, and is in:

ciao-controller TestTenantOutOfBounds

and outputs:

controller_test.go:220: Not tracking limits correctly


tpepper commented Aug 5, 2016

PR #437 should address the deadlocks on the testutil channels. It also enables -race in non-controller tests to try to capture more detailed information on what's breaking where.
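
For reference, a minimal, hypothetical example of the kind of bug the -race detector flags (not ciao code); running it under `go run -race` prints a DATA RACE report with both stack traces:

```go
package main

import "time"

func main() {
	counter := 0

	// Concurrent write with no lock or channel ordering it against the
	// write below: a data race.
	go func() {
		counter++
	}()

	counter++ // racing write from the main goroutine

	time.Sleep(10 * time.Millisecond)
	_ = counter
}
```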


tpepper commented Aug 16, 2016

I'm closing this ticket as we've found a number of small bugs and have also chosen to disable the -race detector in our travis runs. That should put us on a more stable footing as far as travis is concerned, but it does leave genuine issues unresolved.

Currently there is at least one real race, which @markdryan sees in the launcher output from the travis failures.

We've got a separate ticket #235 to enable the Go -race detector, which I'll leave open so we revisit this and try to get there eventually.

@tpepper tpepper closed this as completed Aug 16, 2016
markdryan pushed a commit to markdryan/ciao that referenced this issue Aug 17, 2016
The code in instance_test.go that waited for the instance loop to close
down was incorrect.  There was a possibility of deadlock if the instance
loop was sending some stats down the overseer channel at the same time
as the test was trying to shut down the instance.  There is similar code
in the overseer which actually shuts down the instance loop correctly.
This commit simply ports the good overseer code over to instance_test.go.

Partial fix for ciao-project#434

Signed-off-by: Mark Ryan <[email protected]>
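
A minimal sketch of the drain-while-waiting pattern the commit describes (the channel and function names are hypothetical, not the actual overseer code):

```go
package main

import "fmt"

// shutDownInstanceLoop keeps draining the stats channel while waiting for
// the instance loop to signal that it has exited, so a final stats send
// can never deadlock the shutdown path.
func shutDownInstanceLoop(statsCh <-chan int, doneCh <-chan struct{}) {
	for {
		select {
		case s := <-statsCh:
			fmt.Println("discarding stats sent during shutdown:", s)
		case <-doneCh:
			return
		}
	}
}

func main() {
	statsCh := make(chan int)
	doneCh := make(chan struct{})

	// Stand-in for the instance loop: it sends one last stats update and
	// then signals that it has finished.
	go func() {
		statsCh <- 42
		close(doneCh)
	}()

	shutDownInstanceLoop(statsCh, doneCh)
}
```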