Engine healthcheck: deal with empty uuid file
In rare cases (believed to be caused by a non-atomic file creation and
writing operation in containerd), we end up with an empty file at
`/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid`.
This causes `ctr version` (and hence the health check) to fail. See
balena-os/balena-engine#322

This commit addresses this issue in two ways:

1. Before running `ctr version`, we check if the uuid file exists and is
   empty. If so, we remove it. (The subsequent execution of `ctr
   version` by the healthcheck will create the file again.)
2. After running `ctr version`, we check if the uuid file was really
   created and is not empty.

In both cases, when an empty uuid file is detected, we log the event to
help us confirm our hypothesis about the root cause.

Signed-off-by: Leandro Motta Barros <[email protected]>
Change-type: patch
lmbarros committed Dec 19, 2022
1 parent 749dd79 commit 57f0336
Showing 1 changed file with 16 additions and 0 deletions.
@@ -8,5 +8,21 @@ CONTAINERD_SOCKET="/run/balena-engine/containerd/balena-engine-containerd.sock"
# Check if balena-engine-daemon is responding.
curl --fail --unix-socket $BALENAD_SOCKET http:/v1.40/_ping > /dev/null 2>&1

# Due to a non-atomic file creation and writing operation in containerd, we
# sometimes end up with an empty `uuid` file. This causes `ctr version` (and
# hence the health check) to fail. We therefore remove this file if it is
# present and empty. See https://github.com/balena-os/balena-engine/issues/322
UUID_FILE="/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid"
if [ -f "$UUID_FILE" -a ! -s "$UUID_FILE" ]; then
	echo "healthcheck: removing empty $UUID_FILE"
	rm -f "$UUID_FILE"
fi

# Check if balena-engine-containerd is responding.
balena-engine-containerd-ctr --address $CONTAINERD_SOCKET version > /dev/null 2>&1

# The uuid file is expected to exist and be non-empty after `ctr version`. If
# this is not the case, log and record the event.
if [ -f "$UUID_FILE" -a ! -s "$UUID_FILE" ]; then
	echo "healthcheck: $UUID_FILE empty after 'ctr version'"
fi
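
The empty-file test used in both places above combines `-f` (is a regular file) with `-s` (has a size greater than zero). The following is a minimal standalone sketch of that test, not part of the commit: the helper names `is_empty_file` and `uuid_state` are hypothetical, and the demo runs against throwaway files in a temporary directory rather than the real uuid path. It also shows that a missing file is a separate case from an empty one, since `-f` fails when the file does not exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): succeeds only when the
# path is a regular file whose size is zero.
is_empty_file() {
    [ -f "$1" ] && [ ! -s "$1" ]
}

# Hypothetical helper: classify a path as "empty", "ok" (exists and is
# non-empty) or "missing" (anything else).
uuid_state() {
    if is_empty_file "$1"; then
        echo "empty"
    elif [ -s "$1" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

# Demo on throwaway files, never on the real uuid file.
tmpdir=$(mktemp -d)
: > "$tmpdir/empty"                  # zero-byte file
echo "some-uuid" > "$tmpdir/filled"  # non-empty file

uuid_state "$tmpdir/empty"
uuid_state "$tmpdir/filled"
uuid_state "$tmpdir/absent"

rm -rf "$tmpdir"
```

Run as is, the sketch prints `empty`, `ok` and `missing`, one per line. It uses two `[ ... ]` tests joined with `&&` instead of `-a`; the two forms behave the same here, and POSIX marks `-a` inside `test` as obsolescent.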
