Resubmission to faulty hardware #217

Open
bosterholz opened this issue Aug 9, 2022 · 0 comments
@bosterholz
Collaborator

In some edge cases, otherwise fine jobs failed because of faulty hardware.
Because failed jobs are resubmitted instantly, chances are high that they are retried on the same faulty instance.
It would be nice to ban an instance for a given job if that job has already failed there in the past.
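
To make the ban idea concrete, here is a rough sketch of what a per-job blocklist could look like. This is not existing code from this project: `InstanceBlocklist`, `record_failure`, `eligible` and the job and instance ids are all hypothetical names, and it assumes the scheduler can be handed a filtered list of candidate instances.

```python
from collections import defaultdict


class InstanceBlocklist:
    """Remembers, per job, the instances on which that job already failed.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        # job id -> set of instance ids the job has failed on
        self._failed = defaultdict(set)

    def record_failure(self, job_id, instance_id):
        """Note that job_id failed on instance_id."""
        self._failed[job_id].add(instance_id)

    def eligible(self, job_id, instances):
        """Return the instances this job has not failed on yet.

        If every candidate is banned, fall back to the full list so the
        job is not starved forever.
        """
        banned = self._failed.get(job_id, set())
        allowed = [i for i in instances if i not in banned]
        return allowed or list(instances)
```

The fallback to the full list is a design choice: if a job has failed on every instance, the hardware is probably not the culprit, and blocking the job completely seems worse than retrying it.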

Another, easier fix would be a "sleep" time before each retry, in the hope that the faulty instance is already booked by other jobs once the delay has passed and the retried job is scheduled to a new instance.
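
A minimal sketch of the sleep variant, assuming resubmission can be wrapped in a plain callable. `retry_with_delay`, `submit` and all parameters are hypothetical names, not part of this project. The `factor` parameter is an addition on my side: with the default of `1.0` the delay is fixed as described above, a value above `1.0` makes the wait grow with each attempt.

```python
import time


def retry_with_delay(submit, max_retries=3, delay=60.0, factor=1.0,
                     sleep=time.sleep):
    """Call submit() and wait before every retry instead of resubmitting at once.

    submit      -- hypothetical callable that runs the job and raises on failure
    max_retries -- number of retries after the first attempt
    delay       -- seconds to wait before the first retry
    factor      -- multiplier applied to the delay after each failed attempt
    sleep       -- injectable for testing, defaults to time.sleep
    """
    attempt = 0
    while True:
        try:
            return submit()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries, surface the last error
            # Give the scheduler time to hand the faulty instance to other work
            sleep(delay * factor ** attempt)
            attempt += 1
```

This does not guarantee a different instance; it only lowers the chance of landing on the same one.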

See: Nextflow
