This issue was moved to a discussion.
[CI] azure runners are flaky — remove them? #4686
Comments
The Azure pipeline failures are very recent, and generally Azure is less flaky than GH Actions. As far as I can tell, the RDKit failures are an actual issue that needs to be addressed; it's a Windows (or PyPI) thing, not an Azure thing.
I'm not aware of any such policy. This seems like a brand new thing? If Azure is failing so much that folks are doing this, why hasn't it been reported until now? It feels a bit much that we've jumped straight to "let's remove it", rather than "we'll fix it".
This isn't unique to Azure Pipelines; it happens with GH Actions too.
Every single PR that I have recently glanced over was ultimately merged while Azure was red. Basically, if it fails with something that is clearly not related to the content of the PR then we have been merging. The GH runners for Linux and macOS were green and that was "good enough" — "someone" probably should have raised issues for Windows things but at least I haven't so far as it wasn't clear to me that these were Windows specific issues. On that note, would it make sense to dump Azure and just use GitHub for Windows, too, as this would simplify our CI?
This is true. It would be very nice if PR #4584 improved this situation. I think you wanted to offer some input there?
I opened #4687 for the RDKit test failure.
I don't think I fully understand the rationale or strategy behind all our CI, i.e., at the big-picture level, what we decided was the priority to cover and how we decided to implement this strategy. I started https://github.com/MDAnalysis/mdanalysis/wiki/CI-strategy with some notes, but I am sure that I am missing important details; everyone is welcome to edit.
This is indeed on my to-do list, but it's currently deprioritised. I'm happy to revise this as necessary.
This is another to-do item that I just haven't had the time to deal with. I'm happy to have a call at some point, but unfortunately I won't have time to prioritise this any time soon.
No, dumping Azure means adding work to move what those pipelines do (which is not the same as what our GH Actions pipelines do, even if you ignore the OS differences) to GH Actions, whilst retaining the exact same issues. Long term that might be a strategy, but if we're looking to reduce our workload then this is not the answer. The provider isn't the issue; it's what we are covering.
This seems to be the crux of the issue here: if folks don't want to communicate issues, then we can't prioritise and/or deal with them.
Even when all the GH Actions runners succeed, many or all of the Azure runners fail in some way. I have seen failures related to RDKit and to timeouts with multiprocessing.
Do we know why the Azure runners appear to fail most of the time?
At the moment we seem to have decided to ignore them and rely purely on the GH ones. Unless we figure out why the Azure ones are failing and work on fixing these issues, we might as well disable them, because right now they don't fulfill their purpose of guiding decisions in PR review.