This issue was moved to a discussion.

[CI] azure runners are flaky — remove them? #4686

Closed
orbeckst opened this issue Aug 24, 2024 · 9 comments

Comments

@orbeckst
Member

Even when all the GH Actions runners succeed, many or all of the Azure runners fail in some way. I have seen failures related to RDKit and to timeouts with multiprocessing.

Do we know why the Azure runners appear to fail most of the time somehow?

At the moment we seem to have decided to ignore them and purely rely on GH ones so unless we figure out why the Azure ones are failing and work on fixing these issues, we might as well disable them and not bother because right now they don't seem to fulfill the purpose of guiding decisions on PR review.

@IAlibay
Member

IAlibay commented Aug 24, 2024

The Azure pipeline failures are very recent, and generally Azure is less flaky than GH Actions.

As far as I can tell the RDKit failures are an actual issue that needs to be addressed; it's a Windows (or PyPI) thing, not an Azure thing.

> At the moment we seem to have decided to ignore them and purely rely on GH ones so unless we figure out why the Azure ones are failing and work

I'm not aware of any such policy. This seems like a brand new thing? If Azure is failing so much that folks are doing this, why hasn't it been reported until now? It feels a bit much that we've jumped straight to "let's remove it", rather than "we'll fix it".

@IAlibay
Member

IAlibay commented Aug 24, 2024

> timeouts with multiprocessing

This isn't unique to Azure Pipelines; it happens with GH Actions too.

@orbeckst
Member Author

> I'm not aware of any such policy. This seems like a brand new thing?

Every single PR that I have recently glanced over was ultimately merged while Azure was red. Basically, if it fails with something that is clearly not related to the content of the PR then we have been merging. The GH runners for Linux and macOS were green and that was "good enough" — "someone" probably should have raised issues for Windows things but at least I haven't so far as it wasn't clear to me that these were Windows specific issues.

On that note, would it make sense to dump Azure and just use GitHub for Windows, too, as this would simplify our CI?

> > timeouts with multiprocessing
>
> This isn't unique to Azure Pipelines; it happens with GH Actions too.

This is true. It would be very nice if PR #4584 improved this situation – I think you wanted to offer some input there?

@orbeckst
Member Author

I opened #4687 for the RDKit test failure.

@orbeckst
Member Author

I don't think I fully understand what the rationale or strategy is for all our CI, i.e., at the big picture level what we decided was the priority to cover and how we decided to implement this strategy.

I started https://github.com/MDAnalysis/mdanalysis/wiki/CI-strategy with some notes, but I am sure that I am missing important details; everyone is welcome to edit.

@IAlibay
Member

IAlibay commented Aug 25, 2024

> This is true. It would be very nice if PR #4584 improved this situation – I think you wanted to offer some input there?

This is indeed on my to-do list, but it's deprioritised currently; I'm happy to revise this as necessary.

@IAlibay
Member

IAlibay commented Aug 25, 2024

> I don't think I fully understand what the rationale or strategy is for all our CI, i.e., at the big picture level what we decided was the priority to cover and how we decided to implement this strategy.
>
> I started https://github.com/MDAnalysis/mdanalysis/wiki/CI-strategy with some notes, but I am sure that I am missing important details; everyone is welcome to edit.

This is another to-do item that I just haven't had the time to deal with. Happy to have a call at some point, but I won't have time to prioritise this any time soon, unfortunately.

@IAlibay
Member

IAlibay commented Aug 25, 2024

> On that note, would it make sense to dump Azure and just use GitHub for Windows, too, as this would simplify our CI?

No, dumping Azure means adding work to move what those pipelines do (which is not the same as what our GH Actions pipelines do, even if you ignore the OS differences) to GH Actions, whilst retaining the exact same issues. Long term that might be a strategy, but if we're looking to reduce our workload then this is not the answer.

The provider isn't the issue, it's what we are covering.

@IAlibay
Member

IAlibay commented Aug 25, 2024

> "someone" probably should have raised issues for Windows things but at least I haven't so far as it wasn't clear to me that these were Windows specific issues.

This seems to be the crux of the issue here: if folks don't want to communicate issues, then we can't prioritise and/or deal with them.

@MDAnalysis MDAnalysis locked and limited conversation to collaborators Aug 26, 2024
@orbeckst orbeckst converted this issue into discussion #4689 Aug 26, 2024
