Describe the bug
The end-to-end tests for training and evaluation occasionally fail or time out, especially when running on GitHub Actions. It's difficult to reproduce this behavior locally. The failures seem to occur mostly on tests that use async processing + the buffer. This leads me to believe that there is a concurrency control bug (e.g. deadlock) occurring.
The workaround for this bug is to just re-run the tests.
To Reproduce
Occasionally reproducible when running the GitHub Actions workflow, e.g. https://github.com/marius-team/marius/actions/runs/3056399004/jobs/4930521831
I have not observed async processing bugs when running on large-scale datasets, only on the tiny-scale datasets used for testing.
The main challenge will be isolating and identifying the issue. My approach will be to run a highly asynchronous configuration on a small dataset, which will hopefully recreate the conditions needed for the concurrency bug to arise.
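Since the failure is intermittent, one way to hunt for it locally is to run the same test command many times under a hard timeout and count how often it hangs or fails. The snippet below is only a rough sketch of that idea: `run_repeatedly` is a hypothetical helper, not part of the project, and the command in the `__main__` block is a placeholder to swap for whatever invokes the flaky end-to-end test.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs=20, timeout_s=60.0):
    """Run `cmd` `runs` times and classify each run as ok, failed, or timeout."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it does not finish within timeout_s seconds.
            result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # A run that never finishes is the signature of a possible deadlock.
            outcomes["timeout"] += 1
            continue
        outcomes["ok" if result.returncode == 0 else "failed"] += 1
    return outcomes


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real test invocation.
    cmd = [sys.executable, "-c", "print('replace with the real test command')"]
    print(dict(run_repeatedly(cmd, runs=5, timeout_s=30.0)))
```

A nonzero `timeout` count points toward a hang rather than an ordinary assertion failure, which would help separate a deadlock from other sources of flakiness.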
Environment
Occurs on both Linux and macOS