Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #800 - Invalid Command Response on Database Outage #1009

12 changes: 11 additions & 1 deletion server/lobbyconnection.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@

import aiohttp
from sqlalchemy import and_, func, select
from sqlalchemy.exc import DBAPIError
from sqlalchemy.exc import DBAPIError, OperationalError

import server.metrics as metrics
from server.db import FAFDatabase
Expand Down Expand Up @@ -216,6 +216,16 @@ async def on_message_received(self, message):
self.get_user_identifier(),
cmd
)
except OperationalError:
# When the database goes down, SqlAlchemy will throw an OperationalError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we still want to log the exception here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go ahead and add logging here. I was avoiding it only because if the DB is truly down, it could cause a lot of logs to the logging service to be ingested due to failed commands. But if we feel like the logging service can handle that volume without issue, I have no qualms about adding it in.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good question. I'm not sure it's every happened before, but I think it's safer to have it log something even if it ends up spamming a lot of messages than to have it error silently and require users to start complaining before we can see that there's an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too easy. I've added that logging in the most recent version.

self._logger.error("Encountered OperationalError on message received. This could indicate DB is down.")
await self.send({
"command": "notice",
"style": "error",
"text": "Unable to connect to database. Please try again later."
})
# Make sure to abort here to avoid a thundering herd problem.
Comment on lines +224 to +227
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to use the kick style to prevent the client from trying to automatically reconnect. @Sheikah45?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that's what https://github.com/FAForever/server/pull/1009/files#diff-de94292bdcec0d8f98b8eb76040f97830866bed0c1fb5b55165a9e54f175ea0bR227 would do for the client, but if that is not the case, we for sure need to send the client a "Stop and don't retry" message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, there's currently no good system for telling the client that it should not attempt to reconnect. I believe the kick style message is all we have. It might be cleaner to add a dedicated message for that but that would be for another PR and also require client changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think https://github.com/FAForever/downlords-faf-client/blob/ca8554d8767e748498dee37caa39cf3f657284a6/src/main/java/com/faforever/client/remote/FafServerAccessor.java#L115 could be an issue since the logic is if we were connected and suddenly were disconnected, retry the login and connect. Net new connections should be okay since it will be going from connecting to disconnected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem with the kick style is that it will also close the client. If this is desired behavior then sure but otherwise it shouldn't be kick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sheikah45 @Askaholic I think this should work for what we want then. This follows the same pattern as if someone were banned where it sends a message and aborts the connection.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, we'll leave it like this then.

await self.abort("Error connecting to database")
except Exception as e: # pragma: no cover
await self.send({"command": "invalid"})
self._logger.exception(e)
Expand Down
23 changes: 23 additions & 0 deletions tests/unit_tests/test_lobbyconnection.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import pytest
from aiohttp import web
from sqlalchemy import and_, select
from sqlalchemy.exc import OperationalError

from server.config import config
from server.db.models import ban, friends_and_foes
Expand Down Expand Up @@ -159,6 +160,28 @@ async def test_bad_command_calls_abort(lobbyconnection):
lobbyconnection.abort.assert_called_once_with("Error processing command")


async def test_database_outage_error_responds_cleanly(lobbyconnection):
lobbyconnection.abort = mock.AsyncMock()
lobbyconnection.check_policy_conformity = mock.AsyncMock(return_value=True)
lobbyconnection.send = mock.AsyncMock()

def mock_ensure_authenticated(cmd):
raise OperationalError(statement="", params=[], orig=None)
lobbyconnection.ensure_authenticated = mock_ensure_authenticated
await lobbyconnection.on_message_received({
"command": "hello",
"login": "test",
"password": sha256(b"test_password").hexdigest(),
"unique_id": "blah"
})
lobbyconnection.send.assert_called_once_with({
"command": "notice",
"style": "error",
"text": "Unable to connect to database. Please try again later."
})
lobbyconnection.abort.assert_called_once_with("Error connecting to database")


async def test_command_pong_does_nothing(lobbyconnection):
lobbyconnection.send = mock.AsyncMock()

Expand Down
Loading