Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stability improvements for multiplayer room interop #273

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

smoogipoo
Copy link
Contributor

As mentioned in ppy/osu#31637 (comment)

  • Delaying adding users to rooms until after the exclusive lock has been acquired. This would prevent races caused by the single other user leaving the room during the whole process.
  • Removing users from rooms if adding fails. This could due to various conditions like the room having ended.

@@ -89,6 +87,8 @@ public async Task<MultiplayerRoom> JoinRoomWithPassword(long roomId, string pass
{
try
{
await sharedInterop.AddUserToRoomAsync(Context.GetUserId(), roomId, password);
Copy link
Member

@peppy peppy Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly concerned that this seems to be holding user and room locks over a web request. One of the more recent stability improvements to bancho saw it having a "way out" when database or external requests failed (and all LIO requests are done on a separate thread / queue).

It's fine to go ahead with this until it (potentially) becomes a problem. Just raising concern that this is creating a true cross-component dependency where both components (osu-web shared io octane and spectator-server) being alive to continue functioning. We do have shared IO in its own octane pool, so there's some degree of separation from main web (although that's only octane level, not reverse proxy / nginx).

And yeah, it's probably not the first case of this with the other recent changes being pushed, just bringing it up here because it crosses my mind.

Copy link
Contributor Author

@smoogipoo smoogipoo Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main concern I have is a situation where...

User 1 -> Creates room
User 2 -> Begins joining the room
User 1 -> Leaves room (closes room)
User 2 -> Added to channel via interop
User 2 -> Joining room failed

Without a cross dependency or otherwise lock on rooms, I'm not sure how to best do this. I'm open to suggestions...

That said, I do see another issue here, which is any osu-web errors are potentially going to leave the server in a bad state. I'll block this temporarily to double check that...

both components (osu-web shared io octane and spectator-server) being alive to continue functioning

The only way I see around this 100% is to completely separate the two. I.e. server-spectator shall do everything related to creating rooms - validating input data, creating chat channel (perhaps that can be async via interop since it's not 'core' component), and adding rows to the DB.

Which isn't a bad direction imo, but I wouldn't go down that path unless everyone was 100% on board with it and any holes in the concept are ironed out.

Copy link
Contributor Author

@smoogipoo smoogipoo Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear if it makes things any easier to digest/think about, the only thing adding/removing users from rooms entails osu-web side is adding them to the chat channel, and likely a mild form of associated validation (that the user exists, etc, and some which is duplicated here - like restrictions).

So with that in mind if you have any suggestions on how to improve this then please do.

Copy link
Contributor Author

@smoogipoo smoogipoo Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the above, and if we want to reduce dependence between services, how do you feel about this:

diff --git a/osu.Server.Spectator/Hubs/Multiplayer/MultiplayerHub.cs b/osu.Server.Spectator/Hubs/Multiplayer/MultiplayerHub.cs
index 8535b5f..081a08b 100644
--- a/osu.Server.Spectator/Hubs/Multiplayer/MultiplayerHub.cs
+++ b/osu.Server.Spectator/Hubs/Multiplayer/MultiplayerHub.cs
@@ -87,8 +87,6 @@ namespace osu.Server.Spectator.Hubs.Multiplayer
                 {
                     try
                     {
-                        await sharedInterop.AddUserToRoomAsync(Context.GetUserId(), roomId, password);
-
                         if (roomUsage.Item == null)
                         {
                             newRoomFetchStarted = true;
@@ -138,8 +136,6 @@ namespace osu.Server.Spectator.Hubs.Multiplayer
                     {
                         try
                         {
-                            await sharedInterop.RemoveUserFromRoomAsync(Context.GetUserId(), roomId);
-
                             if (userUsage.Item != null)
                             {
                                 // the user was joined to the room, so we can run the standard leaveRoom method.
@@ -168,6 +164,15 @@ namespace osu.Server.Spectator.Hubs.Multiplayer
                     }
                 }
 
+                try
+                {
+                    await sharedInterop.AddUserToRoomAsync(Context.GetUserId(), roomId, password);
+                }
+                catch (Exception ex)
+                {
+                    Error("Failed to add user to databased room", ex);
+                }
+
                 var settings = new JsonSerializerSettings
                 {
                     // explicitly use Auto here as we are not interested in the top level type being conveyed to the user.
@@ -919,10 +924,17 @@ namespace osu.Server.Spectator.Hubs.Multiplayer
 
         private async Task leaveRoom(MultiplayerClientState state, bool wasKick)
         {
-            await sharedInterop.RemoveUserFromRoomAsync(state.UserId, state.CurrentRoomID);
-
             using (var roomUsage = await getLocalUserRoom(state))
                 await leaveRoom(state, roomUsage, wasKick);
+
+            try
+            {
+                await sharedInterop.RemoveUserFromRoomAsync(state.UserId, state.CurrentRoomID);
+            }
+            catch (Exception ex)
+            {
+                Error("Failed to leave databased room", ex);
+            }
         }
 
         private async Task leaveRoom(MultiplayerClientState state, ItemUsage<ServerMultiplayerRoom> roomUsage, bool wasKick)

It'll still be logged to sentry but otherwise means users can join rooms without the osu-web part succeeding. There's still dependence on osu-web for creating rooms, but that could be removed in the future if the direction proposed previously sounds like a good path (on a lower priority basis, imo...).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would much prefer that last patch rather than what is in this PR right now. While somewhat dicey, I'd say it's much more acceptable levels dicey given the overall low importance of the web-side operation succeeding, compared to holding locks over web requests - especially locks that aren't specific to a single user, like the room lock here.

@smoogipoo smoogipoo added the blocked Don't merge this. label Feb 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Don't merge this.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants