Engine healthcheck: deal with empty uuid file
In rare cases (believed to be caused by a non-atomic file creation and
writing operation in containerd), we end up with an empty file at
/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid.
This causes `ctr version` (and hence the health check) to fail. See
balena-os/balena-engine#322
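
To illustrate the suspected failure mode, here is a minimal shell sketch. It is not containerd's code; the scratch directory, the `example-id` value and the variable names are assumptions made for the demonstration. It contrasts a create-then-write sequence, which leaves a window where the file exists but is empty, with the common write-then-rename pattern.

```shell
# Illustrative only: this is not how containerd writes its uuid file.
# All paths live in a scratch directory, not under /mnt/data.
target_dir="$(mktemp -d)"
target="$target_dir/uuid"

# Non-atomic: the file exists (with zero length) as soon as it is created.
# If the writer stops before the content is written, it stays empty.
: > "$target"
if [ -f "$target" ] && [ ! -s "$target" ]; then
    echo "window: $target exists but is empty"
fi

# Write-then-rename: fill a temporary file in the same directory, then
# rename it over the target, so readers see either the old or the new file.
tmp="$(mktemp "$target_dir/uuid.XXXXXX")"
printf '%s\n' "example-id" > "$tmp"
mv -f "$tmp" "$target"
```

The rename only protects readers from seeing a half-written file. Surviving a sudden power loss would additionally require syncing the data to disk, which this sketch does not attempt.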

This PR addresses this issue in two ways:

1. Before running `ctr version`, we check if the uuid file exists and is
   empty. If so, we remove it. (The subsequent execution of `ctr
   version` by the healthcheck will create the file again.)
2. After running `ctr version`, we check if the uuid file was really
   created and is not empty.

In both cases, when an empty uuid file is detected, we log the event and
create a "breadcrumb" file at `/mnt/data/engine-healthcheck/` to help us
confirm our hypothesis about the root cause.
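
The check-and-breadcrumb logic described above can be sketched as follows. This is a hedged illustration rather than the committed script: the helper names `is_empty_file` and `leave_breadcrumb` are hypothetical, and the demo runs in a scratch directory instead of `/mnt/data`. It uses two `[ ]` tests joined with `&&`, which behaves the same as the single test with `-a` for these operands.

```shell
# Hypothetical helper: succeeds only if $1 is a regular file of zero length.
is_empty_file() {
    [ -f "$1" ] && [ ! -s "$1" ]
}

# Hypothetical helper: writes the current date to file $2 in directory $1.
leave_breadcrumb() {
    mkdir -p "$1" && date > "$1/$2"
}

# Demo paths in a scratch directory (assumption, not the real locations).
demo_dir="$(mktemp -d)"
uuid_file="$demo_dir/uuid"
breadcrumbs_dir="$demo_dir/engine-healthcheck"

# Simulate the problem: an existing but empty uuid file.
: > "$uuid_file"

if is_empty_file "$uuid_file"; then
    echo "healthcheck: removing empty $uuid_file"
    rm -f "$uuid_file"
    leave_breadcrumb "$breadcrumbs_dir" "empty-uuid-before"
fi
```

Note that `is_empty_file` returns non-zero for a missing file as well as for a non-empty one, so it detects only the "exists and is empty" state.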

Signed-off-by: Leandro Motta Barros <[email protected]>
Change-type: patch
lmbarros committed Dec 15, 2022
1 parent 66694d0 commit 4293061
Showing 1 changed file with 21 additions and 0 deletions.
@@ -8,5 +8,26 @@ CONTAINERD_SOCKET="/run/balena-engine/containerd/balena-engine-containerd.sock"
# Check if balena-engine-daemon is responding.
curl --fail --unix-socket $BALENAD_SOCKET http:/v1.40/_ping > /dev/null 2>&1

# Due to a non-atomic file creation and writing operation in containerd, we
# sometimes end up with an empty `uuid` file. This causes `ctr version` (and
# hence the health check) to fail. We therefore remove this file if it is
# present and empty. See https://github.com/balena-os/balena-engine/issues/322
BREADCRUMBS_DIR="/mnt/data/engine-healthcheck"
UUID_FILE="/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid"
if [ -f "$UUID_FILE" -a ! -s "$UUID_FILE" ]; then
    echo "healthcheck: removing empty $UUID_FILE"
    rm -f "$UUID_FILE"
    mkdir -p "$BREADCRUMBS_DIR"
    date > "$BREADCRUMBS_DIR/empty-uuid-before"
fi

# Check if balena-engine-containerd is responding.
balena-engine-containerd-ctr --address $CONTAINERD_SOCKET version > /dev/null 2>&1

# The uuid file is expected to exist and be non-empty after `ctr version`. If
# this is not the case, log and record the event.
if [ -f "$UUID_FILE" -a ! -s "$UUID_FILE" ]; then
    echo "healthcheck: $UUID_FILE empty after 'ctr version'"
    mkdir -p "$BREADCRUMBS_DIR"
    date > "$BREADCRUMBS_DIR/empty-uuid-after"
fi
