Readiness probe succeeds even if the "active" label can't be updated #142

Open
faroukbi opened this issue Mar 13, 2024 · 0 comments

faroukbi commented Mar 13, 2024

While operating a Solace cluster provisioned via Helm, we ran into a situation where the readiness probe was unable to update the "active" label of the messaging pods; nevertheless, the readiness_check.sh script returned 0 and the pods continued to be reported as ready. As a consequence, the service kept forwarding traffic to the inactive node, which rejected the connections.

Steps to reproduce the issue:

  1. Provision a Solace cluster. The primary node should have the label "active" set to "true" and the backup node should have the label "active" set to "false".
  2. Remove the RoleBinding created by the chart so that the service account used by the pods no longer has permission to call the pod PATCH API (see the sketch after this list).
  3. Trigger a failover from the primary to the backup.
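For reference, step 2 can be reproduced along these lines; the RoleBinding and namespace names are hypothetical and depend on the Helm release, so adjust them to your deployment:

    # List the RoleBindings in the release namespace to find the one granting pod patch rights
    kubectl -n solace get rolebinding
    # Delete it (hypothetical name); the pods' service account can then no longer PATCH pod labels
    kubectl -n solace delete rolebinding my-release-pubsubplus-podtagupdater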

Observed behavior

The primary pod stays ready and keeps the label "active" set to "true" even though it is now inactive.
The backup pod stays ready and keeps the label "active" set to "false" even though it is now active.
The service keeps forwarding traffic to the primary pod, which is inactive and rejects the connections.

Expected behavior

Both pods should be marked as not ready because the readiness probe cannot call the pod PATCH API. The readiness_check.sh script should return a non-zero exit code.

Probable cause

The following two curl calls return exit code 0 even when the Kubernetes API answers with HTTP 403: without --fail (-f), curl treats any completed HTTP exchange as a success and only reports transport-level errors through its exit code.

solaceConfigMap.yaml

        if ! curl -sS --output /dev/null --cacert $CACERT --connect-timeout 5 \
            --request PATCH --data "$(cat /tmp/patch_label.json)" \
            -H "Authorization: Bearer $KUBE_TOKEN" -H "Content-Type:application/json-patch+json" \
            $K8S/api/v1/namespaces/$NAMESPACE/pods/$HOSTNAME ; then
          # Label update didn't work this way, fall back to alternative legacy method to update label
          if ! curl -sSk --output /dev/null -H "Authorization: Bearer $KUBE_TOKEN" --request PATCH --data "$(cat /tmp/patch_label.json)" \
            -H "Content-Type:application/json-patch+json" \
            https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$STATEFULSET_NAMESPACE/pods/$HOSTNAME ; then
            echo "`date` ERROR: ${APP}-Unable to update pod label, check access from pod to K8s API or RBAC authorization" >&2
            rm -f ${FINAL_ACTIVITY_LOGGED_TRACKING_FILE}; exit 1
          fi
        fi
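This behavior is easy to demonstrate in isolation (the httpbin.org endpoint below is just a stand-in for any server returning 403), and adding --fail to the existing calls would be one way to surface the error. The sketch reuses the variables from the snippet above and only illustrates the idea, it is not the definitive fix:

    # curl exits 0 on an HTTP 403 unless --fail is given, in which case it exits 22
    $ curl -sS --output /dev/null https://httpbin.org/status/403 ; echo $?
    0
    $ curl -sS --fail --output /dev/null https://httpbin.org/status/403 ; echo $?
    22

    # Sketch: same primary call as above, with --fail added so an RBAC 403 takes the error branch
    if ! curl -sS --fail --output /dev/null --cacert $CACERT --connect-timeout 5 \
        --request PATCH --data "$(cat /tmp/patch_label.json)" \
        -H "Authorization: Bearer $KUBE_TOKEN" -H "Content-Type:application/json-patch+json" \
        $K8S/api/v1/namespaces/$NAMESPACE/pods/$HOSTNAME ; then
      echo "`date` ERROR: ${APP}-Unable to update pod label, check access from pod to K8s API or RBAC authorization" >&2
      rm -f ${FINAL_ACTIVITY_LOGGED_TRACKING_FILE}; exit 1
    fi

Note that the legacy fallback call would need the same flag (or an explicit HTTP status check), otherwise a 403 on that path is still swallowed.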