Backport of docs: clarify reschedule, migrate, and replacement terminology into release/1.8.x #25144

hc-github-team-nomad-core · 2025-02-18T14:31:53Z

Backport

This PR is auto-generated from #24929 to be assessed for backporting due to the inclusion of the label backport/1.8.x.

🚨

Warning: the automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, but not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@tgross
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/nomad/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.

Our vocabulary around scheduler behaviors outside of the reschedule and migrate blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

restart: when the tasks of an allocation fail and we try to restart the tasks in place.
reschedule: when the restart block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.
migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.
replacement: when a node is lost, we don't count that against the reschedule tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the migrate machinery here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the group.count.
replacement for disconnect.replace = true: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.
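
The five behaviors above can be summarized as a small lookup. This is only an illustrative sketch in Go, not Nomad's scheduler code: the `Behavior` and `Outcome` types, the constant names, and the `Describe` function are all hypothetical. It also assumes that a restart creates no new allocation (so tracker propagation does not apply) and that an ordinary reschedule carries the tracker forward to the new allocation.

```go
package main

import "fmt"

// Behavior names one of the five scheduler behaviors listed above.
// These identifiers are hypothetical and do not exist in Nomad.
type Behavior int

const (
	Restart Behavior = iota
	Reschedule
	Migrate
	Replacement
	DisconnectReplacement
)

// Outcome records, for one behavior, whether a new allocation is created
// and whether the reschedule tracker is propagated to it.
type Outcome struct {
	NewAllocation     bool
	PropagatesTracker bool
}

// Describe maps each behavior to its outcome as described in the list above.
func Describe(b Behavior) Outcome {
	switch b {
	case Restart:
		// Tasks restart in place; assumed: no new allocation, nothing to propagate.
		return Outcome{NewAllocation: false, PropagatesTracker: false}
	case Reschedule:
		// Assumed: the failed allocation's tracker follows it to the next node.
		return Outcome{NewAllocation: true, PropagatesTracker: true}
	case Migrate:
		// A node drain is not a failure, so the tracker is not propagated.
		return Outcome{NewAllocation: true, PropagatesTracker: false}
	case Replacement:
		// A lost node is not counted against the allocation.
		return Outcome{NewAllocation: true, PropagatesTracker: false}
	case DisconnectReplacement:
		// The replacement is intended to be temporary, so the tracker is propagated.
		return Outcome{NewAllocation: true, PropagatesTracker: true}
	}
	return Outcome{}
}

func main() {
	names := []string{
		"restart",
		"reschedule",
		"migrate",
		"replacement",
		"disconnect replacement",
	}
	for i, name := range names {
		o := Describe(Behavior(i))
		fmt.Printf("%-24s new allocation: %-5t propagates tracker: %t\n",
			name, o.NewAllocation, o.PropagatesTracker)
	}
}
```

Only the last two columns matter for the terminology question: migrate and plain replacement create a new allocation without the tracker, while reschedule and the `disconnect.replace = true` replacement keep it.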

Add a section to the reschedule, migrate, and disconnect blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: #24918

Overview of commits

dc58f24

hashicorp-cla-app · 2025-02-18T14:32:08Z

All committers have signed the CLA.

hashicorp-cla-app · 2025-02-18T14:32:09Z

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

temp seems not to be a GitHub user.
You need a GitHub account to be able to sign the CLA.
If you already have a GitHub account, please add the email address used for this commit to your account.

Have you signed the CLA already but the status is still pending? Recheck it.

Our vocabulary around scheduler behaviors outside of the `reschedule` and `migrate` blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule` tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the `migrate` machinery here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the `group.count`.
* replacement for `disconnect.replace = true`: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: #24918

Co-authored-by: Aimee Ukasick <[email protected]>

hc-github-team-nomad-core assigned tgross Feb 18, 2025

hc-github-team-nomad-core requested a review from tgross February 18, 2025 14:31

vercel bot deployed to Preview – nomad-ui February 18, 2025 14:35

tgross force-pushed the backport/docs-replacement-vs-reschedule/loosely-prompt-jay branch from 020f697 to 9db9e62 Compare February 18, 2025 14:58

tgross approved these changes Feb 18, 2025

vercel bot deployed to Preview – nomad-ui February 18, 2025 14:59

tgross marked this pull request as ready for review February 18, 2025 14:59

vercel bot deployed to Preview – nomad February 18, 2025 15:04

tgross requested review from pkazmierczak and jrasell February 18, 2025 15:23

pkazmierczak approved these changes Feb 18, 2025

tgross merged commit 9f6a2f6 into release/1.8.x Feb 18, 2025
27 checks passed

tgross deleted the backport/docs-replacement-vs-reschedule/loosely-prompt-jay branch February 18, 2025 15:25
