Fix `SyncFailed` after `promote-to-primary` #397

eberlep · 2022-07-27T08:39:56Z

When switching from standby to primary, we first send an HTTP PATCH to the Patroni REST API to tell the database to become the leader, then we update the custom resource accordingly.

When the custom resource is updated to replication leader, the operator immediately tries to update the ROLES in the database. If Patroni has not finished the promotion triggered by the REST call before that role update, the database is still read-only, so the ROLES update fails, hence the `SyncFailed` status.

This status remains until the next resync cycle, which happens every 30 minutes.

Brainstorming:

To fix the actual problem, we should introduce a short wait between the REST call and the update of the custom resource. That wait is only necessary when the Patroni state actually changed (i.e. we performed a switch), so we need to detect whether a switch was performed. My idea: instead of always sending the PATCH request, perform a GET and check the actual Patroni status. If it is already correct, continue as before. If it is not, send the PATCH and requeue the postgres reconcile. This also lets us decide how long to wait before the next reconcile (via `Result.RequeueAfter`).

The only problem I see right now is when Patroni is not responding: we would then never update the custom resource and instead requeue the reconcile forever. A reconcile is also performed on the initial creation of the database. So maybe we should simply continue when the GET to Patroni fails?

Update: To remedy the last problem, we requeue the reconcile depending on when and where the update of the Patroni config fails. That way, we can continue updating the custom resource if desired and reconcile again later (to retry updating the Patroni config).


eberlep linked a pull request on Aug 31, 2022 that will close this issue: Check patronic config and only update if neccessary #423 (Open)
