Consider making tobira worker deal with loss of DB connection #732

Open
LukasKalbertodt opened this issue Mar 7, 2023 · 0 comments
Labels
area:backend Everything backend related area:database The Tobira database area:sync Syncing with an Opencast instance kind:improvement

Comments

@LukasKalbertodt
Member

In #201 I decided to let our worker command just fail and exit whenever anything goes wrong. As it should run as a service anyway, configuring it to restart automatically should be easy. However, the standard restart behavior of systemd, for example, is not particularly helpful: it only restarts Tobira a limited number of times (10 or so) before giving up. And since the DB usually doesn't come back for some time, all of those restarts fail too.

I think we can improve the out-of-the-box situation here.

One simple idea would be to have every process establish a new DB connection each time before it becomes active (i.e. to have no long-lived DB connections). Every sync attempt (or whatever the process does) would still fail while the DB is down, but the failure would not bring down the whole process. I don't think this is optimal, though, as it would result in lots of log spam: these processes would retry every 3s (search index) or 15s (sync) or something like that.

Instead, I think I would catch DB connection errors at the top level. If one occurs, I would try to reestablish the connection with exponential backoff or something similar. At that point we could also find out if the DB is in a special state (e.g. hot standby) and keep retrying until we have a DB connection in a state that works for the worker.
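
To make the idea a bit more concrete, here is a rough sketch of what such a top-level reconnect loop could look like. It is not taken from Tobira's code: `Backoff`, `ConnectError` and `connect_with_backoff` are hypothetical names, the delays are placeholders, and the actual connection attempt is passed in as a closure so the sketch only needs the standard library.

```rust
use std::time::Duration;

/// Hypothetical backoff policy: the delay doubles after each failed
/// attempt and is capped at `max`.
#[derive(Debug, Clone, Copy)]
struct Backoff {
    initial: Duration,
    max: Duration,
}

impl Backoff {
    /// Delay to wait after failed attempt number `attempt` (0-based).
    fn delay(&self, attempt: u32) -> Duration {
        // 2^attempt, falling back to u32::MAX if the power overflows.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        self.initial.saturating_mul(factor).min(self.max)
    }
}

/// Hypothetical reasons why an attempt did not yield a usable connection.
#[derive(Debug)]
enum ConnectError {
    /// The DB could not be reached at all.
    Unreachable(String),
    /// The DB answered, but is in a state the worker cannot use
    /// (e.g. hot standby).
    NotWritable,
}

/// Calls `connect` until it succeeds, calling `sleep` with an increasing
/// delay between attempts. Never gives up; logs one line per failure.
fn connect_with_backoff<C>(
    backoff: Backoff,
    mut connect: impl FnMut() -> Result<C, ConnectError>,
    mut sleep: impl FnMut(Duration),
) -> C {
    let mut attempt = 0u32;
    loop {
        match connect() {
            Ok(conn) => return conn,
            Err(e) => {
                let delay = backoff.delay(attempt);
                eprintln!("DB not usable ({e:?}), retrying in {delay:?}");
                sleep(delay);
                attempt = attempt.saturating_add(1);
            }
        }
    }
}
```

In the worker, `connect` would open the real connection and could additionally run something like `SELECT pg_is_in_recovery()` to detect a standby and return `NotWritable`, while `sleep` would be the runtime's sleep function. With an initial delay of 1s and a cap of 60s, this logs at most about one line per minute once the cap is reached, instead of one every 3s or 15s.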

@LukasKalbertodt LukasKalbertodt added area:backend Everything backend related kind:improvement area:database The Tobira database area:sync Syncing with an Opencast instance labels Mar 7, 2023