Non-deterministic end-to-end test failures. #113

Open
JasonMoho opened this issue Sep 15, 2022 · 0 comments
Labels: bug (Something isn't working)

@JasonMoho (Collaborator)

Describe the bug
The end-to-end tests for training and evaluation occasionally fail or time out, especially when running on GitHub Actions. The behavior is difficult to reproduce locally. The failures occur most often in tests that use async processing together with the buffer, which leads me to believe that a concurrency control bug (e.g. a deadlock) is occurring.

The workaround for this bug is to just re-run the tests.
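Since a deadlock is only suspected at this point, one way to gather evidence is to dump stack traces when a test stalls. The following is a minimal sketch using Python's standard `faulthandler` module; the helper name `dump_stacks_if_hung` and the timeout value are assumptions for illustration and are not part of this project. It only shows Python-level frames, so a hang inside native code would still need a debugger such as gdb or lldb.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_stacks_if_hung(timeout_s, file=None):
    """Dump the tracebacks of all Python threads if the body runs longer than timeout_s.

    Hypothetical helper: the process is NOT killed, it only writes diagnostics
    to `file` (stderr by default) so a stalled run leaves a trace in the CI log.
    """
    target = file if file is not None else sys.stderr
    # Arm a watchdog thread that writes all thread tracebacks after timeout_s seconds.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the body has finished (or raised).
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the body of a suspect test in `with dump_stacks_if_hung(120):` would then print where each Python thread is blocked if the test has not finished after two minutes.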

To Reproduce
The failure can occasionally be reproduced by running the GitHub Actions workflow, e.g. https://github.com/marius-team/marius/actions/runs/3056399004/jobs/4930521831

I have not observed async processing bugs when running on large-scale datasets, only on the tiny-scale datasets used for testing.

The main challenge will be isolating and identifying the issue. My approach will be to run a highly asynchronous configuration on a small dataset, which will hopefully recreate the conditions needed for the concurrency bug to arise.
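Because the failure is intermittent, the approach above likely needs many repeated runs before the bug shows up. Below is a rough sketch of a harness that runs a command repeatedly with a timeout and tallies the outcomes. The function name `run_repeatedly`, the run count, and the timeout are assumptions for illustration; the actual test command for the highly asynchronous configuration would need to be filled in.

```python
import subprocess
from collections import Counter


def run_repeatedly(cmd, runs=20, timeout_s=60.0):
    """Run `cmd` several times and count passes, failures, and timeouts.

    Hypothetical helper: a timeout is counted separately from a failure,
    since a hang is the symptom a deadlock would most likely produce.
    """
    outcomes = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising TimeoutExpired.
            outcomes["timeout"] += 1
            continue
        outcomes["pass" if proc.returncode == 0 else "fail"] += 1
    return outcomes
```

Passing the command for a single async + buffer test as `cmd` (a list of arguments) and comparing the tallies across configurations should help narrow down which settings trigger the hang.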

Environment
Occurs on both Linux and macOS.

@JasonMoho JasonMoho added the bug Something isn't working label Sep 15, 2022
@JasonMoho JasonMoho self-assigned this Sep 15, 2022