Readiness probe succeeds even if the "active" label can't be updated #142

Open
faroukbi opened this issue Mar 13, 2024 · 0 comments

faroukbi commented Mar 13, 2024

While operating a Solace cluster provisioned via Helm, we ran into a situation where the readiness probe was unable to update the "active" label of the messaging pods; nevertheless, the readiness_check.sh script returned 0 and the pods continued to be reported as ready. As a consequence, the service kept forwarding traffic to the inactive node, which rejected the connections.

Steps to reproduce the issue:

  1. Provision a Solace cluster. The primary node should have the label "active" set to "true" and the backup node should have the label "active" set to "false".
  2. Remove the RoleBinding created by the chart so that the service account used by the pods no longer has permission to call the pod PATCH API (see the sketch after this list).
  3. Trigger a failover from the primary to the backup.
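For reference, step 2 can be reproduced along these lines; the RoleBinding and namespace names are hypothetical and depend on the Helm release, so adjust them to your deployment:

    # List the RoleBindings in the release namespace to find the one granting pod patch rights
    kubectl -n solace get rolebinding
    # Delete it (hypothetical name); the pods' service account can then no longer PATCH pod labels
    kubectl -n solace delete rolebinding my-release-pubsubplus-podtagupdater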

Observed behavior

The primary pod stays ready and keeps the label "active" set to "true" even though it is now inactive.
The backup pod stays ready and keeps the label "active" set to "false" even though it is now active.
The service keeps forwarding traffic to the primary pod, which is inactive and rejects the connections.

Expected behavior

Both pods should be marked as not ready because the readiness probe cannot call the pod PATCH API. The readiness_check.sh script should return a non-zero exit code.

Probable cause

The following two curl calls return exit code 0 even when the Kubernetes API answers with HTTP 403: without --fail (-f), curl treats any completed HTTP exchange as a success and only reports transport-level errors through its exit code.

solaceConfigMap.yaml

        if ! curl -sS --output /dev/null --cacert $CACERT --connect-timeout 5 \
            --request PATCH --data "$(cat /tmp/patch_label.json)" \
            -H "Authorization: Bearer $KUBE_TOKEN" -H "Content-Type:application/json-patch+json" \
            $K8S/api/v1/namespaces/$NAMESPACE/pods/$HOSTNAME ; then
          # Label update didn't work this way, fall back to alternative legacy method to update label
          if ! curl -sSk --output /dev/null -H "Authorization: Bearer $KUBE_TOKEN" --request PATCH --data "$(cat /tmp/patch_label.json)" \
            -H "Content-Type:application/json-patch+json" \
            https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$STATEFULSET_NAMESPACE/pods/$HOSTNAME ; then
            echo "`date` ERROR: ${APP}-Unable to update pod label, check access from pod to K8s API or RBAC authorization" >&2
            rm -f ${FINAL_ACTIVITY_LOGGED_TRACKING_FILE}; exit 1
          fi
        fi
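This behavior is easy to demonstrate in isolation (the httpbin.org endpoint below is just a stand-in for any server returning 403), and adding --fail to the existing calls would be one way to surface the error. The sketch reuses the variables from the snippet above and only illustrates the idea, it is not the definitive fix:

    # curl exits 0 on an HTTP 403 unless --fail is given, in which case it exits 22
    $ curl -sS --output /dev/null https://httpbin.org/status/403 ; echo $?
    0
    $ curl -sS --fail --output /dev/null https://httpbin.org/status/403 ; echo $?
    22

    # Sketch: same primary call as above, with --fail added so an RBAC 403 takes the error branch
    if ! curl -sS --fail --output /dev/null --cacert $CACERT --connect-timeout 5 \
        --request PATCH --data "$(cat /tmp/patch_label.json)" \
        -H "Authorization: Bearer $KUBE_TOKEN" -H "Content-Type:application/json-patch+json" \
        $K8S/api/v1/namespaces/$NAMESPACE/pods/$HOSTNAME ; then
      echo "`date` ERROR: ${APP}-Unable to update pod label, check access from pod to K8s API or RBAC authorization" >&2
      rm -f ${FINAL_ACTIVITY_LOGGED_TRACKING_FILE}; exit 1
    fi

Note that the legacy fallback call would need the same flag (or an explicit HTTP status check), otherwise a 403 on that path is still swallowed.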