Add check and reconnect method to Database class #466

dale-wahl · 2024-11-19T12:34:11Z

This would allow the Database class to reconnect on SSL errors or recover if the connection was closed. If we run a bad query it will not recover from psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block and require a rollback.

I tested it by shutting down the database, attempting to run some processes (which obviously failed), and then restarting the database. It seemed to recover just fine (well, any processors that failed would not restart until we restart the backend, but the frontend recovered fine).

dale-wahl · 2024-11-27T08:54:55Z

Hmmm. I thought about this some more last night and am not sure this is the best way forward.

Issue:
On a database disconnect both the backend and frontend crash and cannot recover automatically even if the database/connection comes back online. They must be restarted.

Dislike of this fix:
Creates another query (admittedly quick: SELECT 1;) for every db call. This was easy to implement, but it could be better.

Desire:
4CAT waits for a database connection to be re-established.

The backend manager currently dies. At minimum, if this was captured and waited, 4CAT would generally be fine. All currently running workers also crash including the API (the only always running worker) and there is no mechanism to restart them except via a restart (perhaps that could be changed).

The frontend cannot re-establish a connection. It does not actually crash, but even when the database is back up and could accept new connections, the frontend has no way of doing this. The error changes from unable to connect to a stale connection ad infinium and needs to be restarted.

Possible better solutions:
For the frontend, we could wrap to detect the stale connection and attempt reconnect. If it cannot, it will fail again, but will be able to recover whenever the database is reachable.

Backend... We could add to the manager to handle that disconnect. But maybe there is a better want to have any worker with a db disconnect wait. It could also check for self.interrupted to exit if needed.

Logging may need to be reviewed.

dale-wahl · 2024-12-03T11:58:23Z

Some things I learned:

psycopg2 is being maintained, but not updated. Psycopg 3 will be an update to psycopg.
in fancy new psycopg (3), connections will have a check_connection method
connection pools would be useful in our case, but do not in and of themselves solve the disconnect issue (as current implementation in psycopg2 does not check the connection before providing it from the pool).
connection.closed only updates after an attempt to connect fails
some places recommend checking connection via connection.isolation_level, but this seems to no long be the case (our version returns None always)
creating a cursor from a connection does not check the connection (though it will fail if the connection is already closed)

If we desire, we could use a connection pool and, instead of creating a new connection with every worker, share the Database class and then use a connection pool in, for example, the get_cursor function. Not sure how much benefit we would see from that; it may be more useful in the frontend as, if I understand correctly, a single worker can handle multiple requests at the same time and thus would be using a single connection.

…retry them after reconnect

dale-wahl · 2024-12-04T10:02:12Z

Refactored so instead of testing a query, it reconnects on any failed query. I tested disconnecting the database for both front and backend and believe I worked through any issues.

common/lib/database.py

add check and reconnect method to Database class

b8fb1a1

dale-wahl requested a review from stijn-uva November 19, 2024 12:34

dale-wahl mentioned this pull request Nov 19, 2024

Frontend crash due to SSL connection has been closed unexpectedly #465

Open

build in retries and wait time

39f48ef

dale-wahl removed the request for review from stijn-uva November 27, 2024 08:56

dale-wahl marked this pull request as draft November 27, 2024 08:56

reconnect function and instead of test query, use actual queries and …

ded189f

…retry them after reconnect

dale-wahl marked this pull request as ready for review December 4, 2024 10:02

stijn-uva reviewed Dec 11, 2024

View reviewed changes

common/lib/database.py Show resolved Hide resolved

warn with actual error and then attempt reconnect

01af9c3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add check and reconnect method to Database class #466

Add check and reconnect method to Database class #466

dale-wahl commented Nov 19, 2024

dale-wahl commented Nov 27, 2024

dale-wahl commented Dec 3, 2024 •

edited

Loading

dale-wahl commented Dec 4, 2024

Add check and reconnect method to Database class #466

Are you sure you want to change the base?

Add check and reconnect method to Database class #466

Conversation

dale-wahl commented Nov 19, 2024

dale-wahl commented Nov 27, 2024

dale-wahl commented Dec 3, 2024 • edited Loading

dale-wahl commented Dec 4, 2024

dale-wahl commented Dec 3, 2024 •

edited

Loading