Sync issues with missing Yjs steps/structs #4600
Comments
I think we are missing a mechanism for resyncing clients once they get out of sync. y-websocket provides this in https://github.com/yjs/y-websocket/blob/master/src/y-websocket.js#L300-L311, but that only works with a server that understands the Yjs protocol and can answer the sync type 1 messages from the client. All that a shared resync with multiple clients would take is:
At the same time this sync scales fairly well:
For this to scale nicely, it's important that only one client performs 1. Otherwise everything will be multiplied by

Since we already have an autosave request that is sent by one client every 30 seconds (?), I propose the following:

This should be enough to provide the resync mechanism. We could rely on it and simplify other parts of the sync:

c) When autosaving, we don't need to upload the doc state anymore, as that's encoded in the first message in 1. |
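To make the idea of answering a "sync type 1" message more concrete, here is a toy model of a state-vector exchange. It is only a sketch of the general principle: `Replica`, `missing_for` and `resync` are hypothetical names, operations are plain strings, and nothing here uses the real Yjs encoding or the Text app's API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # structs[client_id] holds the ordered operations authored by that client.
    structs: dict = field(default_factory=dict)

    def state_vector(self):
        """Step 1 payload: how many operations we hold per authoring client."""
        return {client: len(ops) for client, ops in self.structs.items()}

    def missing_for(self, remote_vector):
        """Step 2 payload: (start index, operations) the remote side lacks."""
        diff = {}
        for client, ops in self.structs.items():
            known = remote_vector.get(client, 0)
            if len(ops) > known:
                diff[client] = (known, ops[known:])
        return diff

    def apply(self, diff):
        for client, (start, ops) in diff.items():
            have = self.structs.setdefault(client, [])
            if start > len(have):
                raise ValueError(f"gap in operations of client {client}")
            # Skip whatever we already hold, append the rest.
            have.extend(ops[len(have) - start:])


def resync(a, b):
    """Bring two replicas to the same state with one exchange per direction."""
    b.apply(a.missing_for(b.state_vector()))
    a.apply(b.missing_for(a.state_vector()))


# Replica b missed "a3", replica a missed "b2".
a = Replica({1: ["a1", "a2", "a3"], 2: ["b1"]})
b = Replica({1: ["a1", "a2"], 2: ["b1", "b2"]})
resync(a, b)
```

The point of the sketch is that only the missing operations travel over the wire, which is why it matters that a single client starts the exchange rather than every client at once.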
@max-nextcloud: how would 1. be triggered? In other words: when exactly would client (A) send the full current yjs state? When it thinks it's out of sync? Or when it's requested by some other party to do so? As far as I understand, you would make the client do 1. along with the autosave every 30 seconds. How would you ensure that only one client does so? And how do we decide which client is responsible to do it? What if this client becomes unresponsive? Apart from these questions, your approach sounds sensible to me. |
This morning, @juliushaertl and I went through the code and identified two more potential causes for problems:
We're not sure whether either of these two causes the problems people experience, but they should be fixed nevertheless. @max-nextcloud and @juliushaertl, if you agree, I could prepare patches for these two problems. |
Regarding the Yjs-based resyncing mechanism, we were wondering earlier when this would actually be triggered. So far I think that Yjs would only send the SyncStep1 on:
My main question before implementing the above resync handling is whether we can find a reason why the existing methods of getting steps to the clients could miss steps:
|
If I remember correctly, we never used them on the client side. So while I agree that we should fetch all steps, for now this should not matter. Plus, the whole approach of resending the entire history seems inefficient. |
A patch for 1. seems like a good idea to eliminate race conditions. With 2., as I said, I'd check if this is actually being used before investing time in something that I hope will be replaced anyway. |
Maybe we can have a quick logging patch to at least see an indication if someone runs into the issue. |
I had another look at how autosave actually works. The autosave function inside My understanding now is that every client will trigger the autosave and the server will just ignore them for a while to avoid saving all of the time. But I have not looked into the server-side code here yet. This approach seems doable for the resync as well. All clients trigger a resync at most every x seconds and the server only distributes one of them every x seconds. There may be better times to trigger a resync than the |
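The "every client asks, the server lets one through per interval" idea could look roughly like the following. This is a minimal sketch under stated assumptions: `ResyncThrottle` and `accept` are made-up names, the interval is a placeholder for x, and time is passed in explicitly so the behaviour is easy to follow. It does not reflect the actual server-side autosave code, which was not inspected here.

```python
class ResyncThrottle:
    """Accept at most one resync request per document and interval."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        # document_id -> timestamp of the last accepted request
        self._last_accepted = {}

    def accept(self, document_id, now):
        """Return True if this request should be distributed to the session."""
        last = self._last_accepted.get(document_id)
        if last is not None and now - last < self.interval:
            return False  # another client already triggered a resync recently
        self._last_accepted[document_id] = now
        return True


throttle = ResyncThrottle(interval_seconds=30)
decisions = [
    throttle.accept(document_id=1, now=0.0),   # first request wins
    throttle.accept(document_id=1, now=10.0),  # ignored, within the interval
    throttle.accept(document_id=2, now=10.0),  # different document, accepted
    throttle.accept(document_id=1, now=30.0),  # interval elapsed, accepted
]
```

With this shape no client has to be elected as the one responsible for resyncing; an unresponsive client is simply replaced by whichever client asks next.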
Attempts to reproduce this - ideally in a reliable way

Failed

### Oct 27th - two clients against local dev server

I tried to reproduce this problem on a local instance. So far I failed, but I also tried with the server and both clients running on the same machine:
The laptop sleep obviously affected both clients and the server. Will try with different machines next.

### Oct 28th - three clients against local dev and remote server, 12h sleep

Tried some more things to reproduce this. This time with two authenticated users and one guest against the local dev server and cloud.nextcloud.com. Still failing. Either I just did not run into the problem, or disconnecting and reconnecting the network is not enough and neither is sending the laptop to sleep. I had the laptop sleep overnight, but the editing session was still working and in sync afterwards.

Other bugs found / reproduced
|
Prevent a possible race condition when two clients add steps at the same time. See #4600.

Rely on the autoincrementing id in order to provide a canonical order that steps can be retrieved in. When two clients push steps at the same time, the entries receive distinct ids that increment. So if another client fetches steps in between, it will see the smaller id as the version of the fetched step and fetch the other step later on.

Transition: In the future we can drop the version column entirely, but currently there are still steps stored in the database that make use of the old column. So we need to transition away from that. In order to find entries that are newer than version x, we select those that have both a version and an id larger than x.

Entries of the new format are newer than any entry of the old format. So we set their version to the largest possible value. This way they will always fulfill the version condition, and the condition on the id is stricter and therefore effective.

For the old format, the version will be smaller than the id, as it's incremented per document while the id is unique across documents. Therefore the version condition is the stricter one and effective. The only scenario where the version might be larger than the id would be if there are very few documents in the database and they have had a lot of steps stored in single database entries.

Signed-off-by: Max <[email protected]>
Prevent a possible race condition when two clients add steps at the same time. See #4600.

Rely on the autoincrementing id in order to provide a canonical order that steps can be retrieved in. When two clients push steps at the same time, the entries receive distinct ids that increment. So if another client fetches steps in between, it will see the smaller id as the version of the fetched step and fetch the other step later on.

Transition: In the future we can drop the version column entirely, but currently there are still steps stored in the database that make use of the old column. So we need to transition away from that. In order to find entries that are newer than version x, we select those that have both a version and an id larger than x.

Entries of the new format are newer than any entry of the old format. So we set their version to the largest possible value. This way they will always fulfill the version condition, and the condition on the id is stricter and therefore effective.

For the old format, the version will be smaller than the id, as it's incremented per document while the id is unique across documents. Therefore the version condition is the stricter one and effective. The only scenario where the version might be larger than the id would be if there are very few documents in the database and they have had a lot of steps stored in single database entries.

Signed-off-by: Max <[email protected]> Signed-off-by: Jonas <[email protected]>
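The transition rule in the commit message can be tried out against an in-memory SQLite table. This is only an illustration under assumptions: the table layout, the column names, the helper names `add_step` and `fetch_since`, and the sentinel `MAX_VERSION` are hypothetical and not taken from the actual schema or patch.

```python
import sqlite3

# Assumption: stand-in for "the largest possible value" of the version column.
MAX_VERSION = 2**63 - 1


def create_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steps ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " document_id INTEGER NOT NULL,"
        " version INTEGER NOT NULL,"
        " data TEXT NOT NULL)"
    )
    return conn


def add_step(conn, document_id, data, version=MAX_VERSION):
    """Store a step; new-format entries keep the default sentinel version."""
    cur = conn.execute(
        "INSERT INTO steps (document_id, version, data) VALUES (?, ?, ?)",
        (document_id, version, data),
    )
    return cur.lastrowid


def fetch_since(conn, document_id, x):
    """Return (id, data) of entries newer than x, ordered by id."""
    return conn.execute(
        "SELECT id, data FROM steps"
        " WHERE document_id = ? AND version > ? AND id > ?"
        " ORDER BY id",
        (document_id, x, x),
    ).fetchall()


conn = create_db()
for n in range(1, 4):
    add_step(conn, 1, f"other-{n}", version=n)  # ids 1..3, another document
add_step(conn, 2, "old-1", version=1)           # id 4, old format
add_step(conn, 2, "old-2", version=2)           # id 5, old format
first = add_step(conn, 2, "new-a")              # id 6, new format
second = add_step(conn, 2, "new-b")             # id 7, new format

# A client that last saw old-format version 1 gets the rest of the history.
after_old = fetch_since(conn, 2, 1)
# A client that last saw the new-format entry with id 6 only gets id 7.
after_new = fetch_since(conn, 2, first)
```

For old-format rows the version condition does the filtering, for new-format rows the id condition does, which matches the reasoning about which condition is the stricter one in each case.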
I can confirm that there seems to be a way for people connecting to get out of sync. I had one document being edited by 5 people, and a 6th only got an old version of the document, even after refreshing the page (text 3.8.0). Unfortunately I have not found a way to reproduce the issue consistently. |
We have a fix for the one scenario that we could reproduce reliably. It will ship with the next releases, that is 28.0.2, 27.1.6 and 26.0.11. I'm curious to see if it also fixes the harder-to-reproduce cases. I'll keep this issue open until we get some feedback. |
Thanks everyone for your input here and @MrRinkana for trying to further hunt this down. Given the feedback of users who suffered from this issue, we expect this to be fixed with the most recent Nextcloud 27 and 28 releases. @MrRinkana, could you please check whether you're able to create an out-of-sync state by adding tables, and open a new issue about it if so? |
Describe the bug
In a test session with four participants we were able to create a situation where changes of two participants were no longer applied to the document of the two others.
Some first insights after analysing the HAR file of three participants
AAA...
) that's filtered out by the server in the moment the synchronisation stopped for clients (c) and (d).