Consider making `tobira worker` deal with loss of DB connection #732

LukasKalbertodt · 2023-03-07T08:56:19Z

In #201 I decided to let our worker command just fail and exit when anything goes wrong. As it should run as a service anyway, configuring it to restart automatically should be easy. However, the standard restart behavior of systemd, for example, is not particularly helpful: it restarts Tobira only a limited number of times (10 or whatever) before giving up. And since the DB usually doesn't come back for some time, all those restarts fail too.

I think we can improve the out-of-the-box situation here.

One simple idea would be to have all processes always establish a new DB connection before they become active (i.e. to have no long-lived DB connections). While the DB is down, every sync attempt (or similar) would fail, but the failure would not bring down the whole process. I don't think this is optimal, though, as it would result in lots of log spam: these processes would retry every 3s (search index) or 15s (sync) or something like that.

Instead, I think I would catch DB connection errors at the top level. If one occurs, I would try to reestablish the connection with some exponential backoff or something similar. Further, at that point we could also find out if the DB is in a special state (e.g. hot standby) and try again until we have a DB connection in a state that works for the worker.
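To make the idea a bit more concrete, here is a minimal sketch of such a top-level retry loop. It is not Tobira's actual code: `DbError`, `Backoff`, and `run_with_reconnect` are hypothetical names, and the error classification is only an assumption about how driver errors might be mapped. The sleep function is passed in so the loop can be tested without waiting.

```rust
use std::time::Duration;

/// Hypothetical error classification; real code would derive this
/// from whatever errors the DB driver reports.
#[derive(Debug, PartialEq)]
enum DbError {
    /// Connection lost or DB unreachable: retrying may help.
    ConnectionLost,
    /// DB reachable, but in a state the worker cannot use (e.g. hot standby).
    NotWritable,
    /// Anything else is treated as fatal, as it is today.
    Other(String),
}

impl DbError {
    fn is_retryable(&self) -> bool {
        matches!(self, DbError::ConnectionLost | DbError::NotWritable)
    }
}

/// Exponential backoff: the delay doubles on every call, capped at `max`.
struct Backoff {
    next: Duration,
    max: Duration,
}

impl Backoff {
    fn new(initial: Duration, max: Duration) -> Self {
        Self { next: initial, max }
    }

    fn next_delay(&mut self) -> Duration {
        let delay = self.next;
        self.next = (self.next * 2).min(self.max);
        delay
    }
}

/// Calls `connect_and_work` until it succeeds or fails with a
/// non-retryable error. Retryable errors only cause a delayed retry.
fn run_with_reconnect<F, S>(
    mut connect_and_work: F,
    mut sleep: S,
    mut backoff: Backoff,
) -> Result<(), DbError>
where
    F: FnMut() -> Result<(), DbError>,
    S: FnMut(Duration),
{
    loop {
        match connect_and_work() {
            Ok(()) => return Ok(()),
            Err(e) if e.is_retryable() => {
                let delay = backoff.next_delay();
                // One log line per attempt, with growing gaps between them.
                eprintln!("DB unavailable ({e:?}), retrying in {delay:?}");
                sleep(delay);
            }
            Err(e) => return Err(e),
        }
    }
}
```

In a real worker, `sleep` would be something like `std::thread::sleep` (or its async equivalent), and the backoff should probably be reset once a connection has worked for a while. With a cap of, say, one minute, a long DB outage produces roughly one log line per minute instead of one every few seconds.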


LukasKalbertodt added the labels area:backend, kind:improvement, area:database, and area:sync on Mar 7, 2023
