Add how-to for force rebooting all nodes in a machine config pool
simu committed Nov 2, 2023
1 parent 36658d9 commit bfc51f9
Showing 2 changed files with 78 additions and 0 deletions.
77 changes: 77 additions & 0 deletions docs/modules/ROOT/pages/how-tos/force-reboot.adoc
@@ -0,0 +1,77 @@
= Force reboot of all nodes in a machine config pool

== Starting situation

* You have admin-level access to the OpenShift 4 cluster
* You want to trigger node reboots for a whole machine config pool

== Prerequisites

The following CLI utilities need to be available:

* `kubectl`
* `oc` (The commands assume you have v4.13 or newer)
* `jq`
* `grep`

== Reboot nodes

. Select the machine config pool whose nodes you want to reboot
+
[source,bash]
----
MCP=<name> <1>
----
<1> Replace with the name of the machine config pool for which you want to reboot the nodes

. List all nodes belonging to the pool
+
[source,bash]
----
node_selector=$( \
kubectl get mcp "${MCP}" -ojsonpath='{.spec.nodeSelector.matchLabels}' | \
jq -r '. as $root | [. | keys[] | "\(.)=\($root[.])"] | join(",")' \
)
kubectl get nodes -l "$node_selector"
----
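+
It can be worth checking that the selector isn't empty before passing it to `-l`: if the pool's `nodeSelector` has no `matchLabels` (for example because it only uses `matchExpressions`), the pipeline above most likely yields an empty string. The following is a minimal sketch of such a check. The helper name `require_selector` is hypothetical and not part of `kubectl` or `oc`.
+
```bash
# Hypothetical helper (not part of kubectl or oc): print the selector
# unchanged, or fail with a message if it is empty.
require_selector() {
  local selector="$1"
  if [ -z "$selector" ]; then
    echo "node selector is empty; check .spec.nodeSelector of the pool" >&2
    return 1
  fi
  printf '%s\n' "$selector"
}
```
+
Calling `require_selector "$node_selector"` right after the assignment either prints the selector or returns a non-zero status, so later steps don't run against an unintended set of nodes.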

. Prepare the nodes for a forced machine config resync
+
[source,bash]
----
for node in $(kubectl get nodes -oname -l "$node_selector"); do
  oc --as=cluster-admin debug "$node" -- chroot /host touch /run/machine-config-daemon-force
done
----

. Select an old rendered machine config for the pool
+
[TIP]
====
The command selects the second-newest rendered machine config.
The exact value doesn't matter, but the `currentConfig` annotation must be overwritten with the name of an existing machine config, so that the operator doesn't mark the nodes as degraded.
====
+
[source,bash]
----
old_mc=$(kubectl get mc -o json | \
  jq --arg mcp "rendered-${MCP}" -r \
    '[.items[] | select(.metadata.name | contains($mcp))]
     | sort_by(.metadata.creationTimestamp) | reverse
     | .[1] | .metadata.name' \
)
----
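+
If fewer than two rendered machine configs match, `.[1]` evaluates to `null` and `jq -r` prints the literal string `null`, which would then end up in the annotation. A minimal sketch of a sanity check follows. The helper name `require_rendered_mc` is hypothetical, and the `rendered-<pool>-<suffix>` naming pattern is taken from the filter used above.
+
```bash
# Hypothetical helper: accept only names of the form rendered-<pool>-<suffix>
# and reject anything else (including the string "null" or an empty value).
require_rendered_mc() {
  local mc="$1" pool="$2"
  case "$mc" in
    "rendered-${pool}-"?*)
      printf '%s\n' "$mc"
      ;;
    *)
      echo "unexpected machine config name: '${mc}'" >&2
      return 1
      ;;
  esac
}
```
+
For example, `require_rendered_mc "$old_mc" "$MCP"` returns a non-zero status when `old_mc` is `null`, before any node is annotated.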

. Trigger machine config daemon resync for *one node at a time*
+
[IMPORTANT]
====
Don't do this for multiple nodes at the same time.
Every node for which this step is executed is drained and rebooted immediately.
====
+
[source,bash]
----
kubectl annotate --overwrite node \
  <nodename> "machineconfiguration.openshift.io/currentConfig=${old_mc}" <1>
----
<1> Replace `<nodename>` with one of the nodes that you still need to reboot
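
The one-node-at-a-time rule can be sketched as a loop that annotates a node and then waits before moving on. This is only a sketch under assumptions: it assumes the machine config daemon sets the node's `machineconfiguration.openshift.io/state` annotation to `Done` and `currentConfig` back to `desiredConfig` once the node has finished. The function names and the `POLL_INTERVAL` variable are hypothetical.

```bash
# Read one machineconfiguration.openshift.io/<key> annotation from a node.
# Dots in the annotation key are escaped for kubectl's jsonpath.
mc_annotation() {
  local node="$1" key="$2"
  kubectl get node "$node" \
    -o "jsonpath={.metadata.annotations.machineconfiguration\.openshift\.io/${key}}"
}

# Assumption: a node is finished when state is "Done" and
# currentConfig matches desiredConfig again.
node_is_updated() {
  local node="$1"
  [ "$(mc_annotation "$node" state)" = "Done" ] &&
    [ "$(mc_annotation "$node" currentConfig)" = "$(mc_annotation "$node" desiredConfig)" ]
}

# Usage: reboot_nodes_sequentially <old_mc> <node>...
reboot_nodes_sequentially() {
  local old_mc="$1"
  shift
  local node
  for node in "$@"; do
    node="${node#node/}"  # accept the "node/<name>" form printed by -oname
    kubectl annotate --overwrite node "$node" \
      "machineconfiguration.openshift.io/currentConfig=${old_mc}"
    # Poll until this node is done before touching the next one.
    until node_is_updated "$node"; do
      sleep "${POLL_INTERVAL:-30}"
    done
  done
}
```

A possible invocation is `reboot_nodes_sequentially "$old_mc" $(kubectl get nodes -oname -l "$node_selector")`. If `old_mc` happens to equal a node's `desiredConfig`, the loop moves on immediately and that node isn't rebooted.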
1 change: 1 addition & 0 deletions docs/modules/ROOT/partials/nav.adoc
@@ -188,6 +188,7 @@
* Day two operations
** xref:oc4:ROOT:how-tos/maintenance_troubleshooting.adoc[Maintenance troubleshooting]
** xref:oc4:ROOT:how-tos/debug-nodes.adoc[Debugging Nodes]
** xref:oc4:ROOT:how-tos/force-reboot.adoc[]
** Runbooks
*** xref:oc4:ROOT:how-tos/monitoring/runbooks/maintenance_alerts.adoc[MaintenanceAlertFiring]