Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of docs: clarify reschedule, migrate, and replacement terminology into release/1.9.x #25145

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #24929 to be assessed for backporting due to the inclusion of the label backport/1.9.x.

The below text is copied from the body of the original PR.


Our vocabulary around scheduler behaviors outside of the reschedule and migrate blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

  • restart: when the tasks of an allocation fail and we try to restart the tasks in place.

  • reschedule: when the restart block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.

  • migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.

  • replacement: when a node is lost, we don't count that against the reschedule tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the migrate machinery here here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the group.count

  • replacement for disconnect.replace = true: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the reschedule, migrate, and disconnect blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: #24918


major preview links:


Overview of commits

@tgross tgross merged commit 39c7d3d into release/1.9.x Feb 18, 2025
26 checks passed
@tgross tgross deleted the backport/docs-replacement-vs-reschedule/utterly-inspired-ox branch February 18, 2025 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants