Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disconnection when calling GetLobbyData() on a deleted lobby. #826

Closed
Tyaap opened this issue Apr 1, 2020 · 37 comments · Fixed by #828
Closed

Disconnection when calling GetLobbyData() on a deleted lobby. #826

Tyaap opened this issue Apr 1, 2020 · 37 comments · Fixed by #828

Comments

@Tyaap
Copy link

Tyaap commented Apr 1, 2020

GetLobbyData() results in a Steam disconnection if called on a lobby that was obtained from GetLobbyList() but has been subsequently deleted. There is currently no way to guarantee that a lobby has not been deleted before calling GetLobbyData(), and therefore avoiding disconnection. Might require major changes to the SteamKit architecture. Further details are discussed in pull request #704.

@Benjamin-Dobell
Copy link
Contributor

@Tyaap Could you please update the description. There is a way to check for lobby deletion i.e. LobbyDataCallback. The issue is there's a race condition we can seemingly do nothing about:

Quoting my comment in #704:

You're being disconnected because you're requesting a deleted lobby, not a full one. Just verified this happens.

However, Steam does send you updates automatically (without you explicitly requesting) for lobbies that you've previously received data for but have since been deleted.

Now, that still seems a bit off to me, because there's potential for a race condition in that you could go to request the data for a lobby, but it's just deleted.

I'm thinking the only way the real Steam client could get around this is that Steam's backend will allow you to make this request, until your client has confirmed receiving the packet that indicates the lobby has been deleted. Then when processing its queue Steam would see a queued outbound message pertains to a deleted resource (lobby), delete the message, and return the data from the lobby cache.

That's actually insanely convoluted, so I'd love to hear I'm wrong and there's a simpler way.

If Steam are doing the above, then we have an huge problem on our hands, as our send and receive logic operate in different threads. Everything is event driven, we really can't process everything serially like that without enormous changes to the architecture of SteamKit 😕

I'd suggest you play around and ensure you're seeing what I am, if you are, you're going to need to open an issue... but ah, if it really is the above, I wouldn't expect it to be fixed anytime soon.

@Benjamin-Dobell
Copy link
Contributor

Example code showing how one can (not so reliably) monitor lobby deletion, with notes about what could go wrong:

Benjamin-Dobell/SteamKitIntegrationTests@bd8fd31

@azuisleet
Copy link
Member

azuisleet commented Apr 1, 2020

Since we hold the lobby cache locally, and we know when a lobby is destructed, we can mark such lobbies in the cache as destructed.

It sounds like we just need to abort requests in SteamMatchmaking methods for unknown or destructed lobbies.

edit:
There's no good reason to have anything more complicated, as it's entirely possible for a packet to be sent by the client to the server referencing a stale lobby to be in flight (latency, packet loss, etc) at the same time that the server tells the client to update its cache. I would expect there's some leeway for these stale lobbies.

Assuming we have a reliable deletion message, our cache will always be up to date. By guarding inside SteamMatchmaking, we won't have any stale reads, as the read is near the actual send.

@Tyaap
Copy link
Author

Tyaap commented Apr 1, 2020

Unfortunately there does not appear to be a reliable deletion message. Benjamin has observed that this message is only sent if we have called GetLobbyData() on the lobby at least once, and it is not safe to do this.

Steam does handle this problem through some mechanism, possibly the one Benjamin suggested.

@Benjamin-Dobell
Copy link
Contributor

Benjamin-Dobell commented Apr 1, 2020

@azuisleet Yeah, that'd be the idea in general. However, to me it looked like there was no leeway (presumably beyond what's already in flight).

Admittedly, I don't know nearly enough about Steam's underlying protocols. However, I'd assume it's reliable, and thus we're confirming (with Steam's backend) each time we receive a packet. If that's the case (i.e. protocol sequencing) then if we've received the packet saying the lobby is deleted, then we would need to handle that packet 100% synchronously and block (lock) any sending (possibly on another thread) whilst we're doing so.

I really don't know enough about SteamKit's internals as to whether NetMsgReceived (and our handlers) are being fired 100% synchronously, what outbound queuing/buffering is going on, and whether we can feasibly synchronise (within) all calls of GetLobbyData.

@Benjamin-Dobell
Copy link
Contributor

Benjamin-Dobell commented Apr 1, 2020

Actually, that's a good point (whoops, my own point!)

Steam doesn't seem to notify us about deleted lobbies until we've made at least one call to get that lobby's data; at least that seemed to be the case in my admittedly hurried testing.

That just doesn't make any sense though, cause the necessary mechanism to do this safely must exist. I assume we're just missing some critical information.

@Tyaap
Copy link
Author

Tyaap commented Apr 1, 2020

@Benjamin-Dobell When I call GetLobbyData on an existing lobby, I only seem to be receiving a single LobbyDataUpdate callback corresponding to that request. I don't receive any further LobbyDataUpdate callbacks, including after the lobby is deleted. In your sample code, I suspect you are receiving this callback after the deleting the lobby because it is triggered in response to the primary Steam account leaving the lobby.

@azuisleet
Copy link
Member

azuisleet commented Apr 2, 2020

Can you confirm with DebugLog/IDebugNetworkListener that this is not a protocol error and that it really is a disconnect from the CM?

@azuisleet
Copy link
Member

azuisleet commented Apr 2, 2020

Can you also confirm if it's the first call to GetLobbyData on a lobby that disconnects, or if a subsequent call to GetLobbyData on the same lobby is disconnecting you?

My hunch is after a GetLobbyList call, you're only meant to call GetLobbyData once on each lobby. You would have to issue another GetLobbyList and then call GetLobbyData on those lobbies.

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

I can call GetLobbyData several times on the same lobby with no disconnect. The disconnections are only occurring after the lobby has been deleted. Not sure if it is protocol error related yet.

@azuisleet
Copy link
Member

azuisleet commented Apr 2, 2020

Also of note, in the docs for RequestLobbyList, there's a notice:

There can only be one active lobby search at a time. The old request will be canceled if a new one is started

Are you certain you're not invalidating the lobbies presented by starting a new search?

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

I am not 100% certain. However the example code given by Benjamin demonstrates that you can call GetLobbyList twice and subsequently run GetLobbyData for a lobby in the list from the first GetLobbyList call, even when that lobby is not in the list from the second GetLobbyList call.

@azuisleet
Copy link
Member

That may be an implementation detail. I suspect the issue is you have a cache of lobbies for your search and invalidations on the cache happen on the next GetLobbyList call

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

The call to GetLobbyData does successfully update the data for this lobby, which is verified by the change in the maximum player count after the call.

@azuisleet
Copy link
Member

Right, but the list of lobbies you are allowed to access changes every time you call GetLobbyList, so trying to call GetLobbyData on a lobby that appeared in a previously invalidated search would be a violation.

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

So you are saying that this is unintended use of the API, even though the operation completes successfully?

Its worth noting that I achieved this using the official Steam client via Steamworks.NET. However I had to call RequestLobbyList after RequestLobbyData in order for the lobby data to be retrieved in a LobbyDataUpdate_t callback. Oddly, the lobby didn't need to be present in the list from RequestLobbyList.

@azuisleet
Copy link
Member

azuisleet commented Apr 2, 2020

Are the lobbies that you receive LobbyDataUpdate_t in subsequent RequestLobbyList calls the lobbies that no longer appear in the new list? It sounds like that's just a convenience meant to keep state in sync for users that only listen to LobbyDataUpdate_t (because those lobbies are now gone in the new search).

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

No they are not in the list, but data can still be successfully retrieved for these lobbies after subsequent RequestLobbyList calls. I uploaded a test project that shows this earlier today, related to tracking full lobbies that disappear from the retrieved list: https://github.com/Tyaap/LobbyTrackingPOC/blob/master/Program.cs
Its not a rigorous test, but after running it for several minutes it becomes clear.

@azuisleet
Copy link
Member

Based on your code, it seems like it is just state synchronization provided by SteamClient. You see those lobbies disappear because they MUST disappear on a new search, regardless of whether they were full or not.

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

Are you referring to the cached data? I am referring to a lobby as "disappearing" if the lobby's steam ID is disappearing from the returned list.

@azuisleet
Copy link
Member

azuisleet commented Apr 2, 2020

I believe we are talking about the same thing. You are only ever meant to see the lobbies available in your lobby search. Since you can only have one lobby search at a time, requesting data about lobbies outside of that search is undefined behavior.

If you never start another search, you can indefinitely query the same set of lobbies.

@Tyaap

This comment has been minimized.

@Tyaap

This comment has been minimized.

@Tyaap

This comment has been minimized.

@Tyaap
Copy link
Author

Tyaap commented Apr 2, 2020

Okay nevermind, I realised I was blocking the callback thread in the test code. I have revised the code, and I am now getting LobbyDataUpdate_t callbacks from RequestLobbyData() before calling RequestLobbyList() again. I can confirm data for lobbies not present in the current list from RequestLobbyList() is being successfully retrieved, provided the lobby was not deleted on Steam servers. If it is deleted, the official Steam client handles this nicely with a return code in LobbyDataUpdate_t.

Edit aside: There appears to be a bug in Steam's API where if a lobby becomes full and the host increases the max player count to make room, the lobby is still irretrievable using RequestLobbyList().

@Benjamin-Dobell
Copy link
Contributor

Benjamin-Dobell commented Apr 2, 2020

I just tried to have a look at this again, however, after writing new tests, I can't get any lobby data updates comes through from Steam; aside from those I explicitly request with GetLobbyData() (unless I'm in the lobby).

Anyway, it looks like there was a protocol update that came in 2 days after the matchmaking functionality I wrote was merged:

https://github.com/SteamRE/SteamKit/blame/master/Resources/SteamLanguage/emsg.steamd#L1610-L1611

There's seemingly a new ClientMMSGetLobbyStatus API call, and strangely a new MMSGetLobbyList API in addition to ClientMMSGetLobbyList 😕

My guess is there's a way to use these APIs in order to keep Steam's back-end happy. The status API in particular is of interest. Mind you, even if it tells us a lobby exists, there's no guarantee it still will when we get around to requesting its data. However, perhaps this is API call necessary to encourage Steam's backend to give us a break if we end up requesting a now deleted lobby!

Anyway. I'm unfortunately really busy at the moment and have already spent more time on this than I intended.

@Tyaap Your POC using Steamworks is probably the perfect tool to get us the information we need. If you get Nethook2 up and running, it should give you a dump of whatever it is Steam are doing behind the scenes. Then we can do our best to replicate that.

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

I have successfully dumped the messages sent and recieved after a RequestLobbyData(), one for a full lobby and one for a deleted lobby. No other messages were dumped by Nethook2, including after the lobby became full/was deleted but before the RequestLobbyData() call.
RequestLobbyData dump.zip

Edit: Please ignore the function names for the folders contained in the zip file - both contain dumps of messages sent/recieved after calling RequestLobbyData().

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

Note: both of these were called after a call of RequestLobbyList() which had returned an empty lobby list.

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

I am going to try do another identical dump of RequestLobbyData() but this time with no prior calls to RequestLobbyList(). I'll do this by making a note of the lobby ID when creating the lobby on the alternate account (which I'm doing on a different machine).

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

I don't see anything special going on here, it seems Steam isn't forcefully disconnecting us for sending a ClientMMSGetLobbyData message for a deleted lobby.

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

Just had a look at the SteamKit debug log when calling GetLobbyData() on a deleted lobby:

MyListener - SteamMatchmaking: Assertion Failed! Got LobbyData without corresponding object in cache.
MyListener - cdfad35d49ff4161a260abb1395ecf5c/SteamClient: Unhandled 'NullReferenceException' exception from 'SteamMatchmaking' handler: 'Object reference not set to an instance of an object.'
Disconnected from Steam, reconnecting in 5...
System.Threading.Tasks.TaskCanceledException: A task was canceled.

Line 3 is printed by a function subscribed to SteamClient.DisconnectedCallback
Line 4 is printed after catching the exception from line:
await steamMatchmaking.GetLobbyData(gameId, steamId)

Edit: appears to be linked to issue #786, resulting from the exception in SteamMatchmaking

@Benjamin-Dobell
Copy link
Contributor

Ugh, this makes so much more sense.

It's because metadata in the returned CMsgClientMMSCreateLobby is null. Will submit a PR fixing the issue.

@Benjamin-Dobell
Copy link
Contributor

Well, this was a bit of a farce, my bad. PR submitted.

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

Thank you, I'll test it out once the pull request goes through. What is the expected result when calling GetLobbyData() on a deleted lobby?

@Benjamin-Dobell
Copy link
Contributor

Screen Shot 2020-04-04 at 5 19 35 am

A lobby with everything zero'd except for the lobby ID itself i.e. You can most easily discern a deleted lobby by the fact it has no owner and/or members.

@Tyaap
Copy link
Author

Tyaap commented Apr 3, 2020

Okay I grabbed the version from your forked repository and the fix is working well for me. Glad we got this sorted at last. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants