-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client
header to prevent echoes from server
#72
Comments
I'm not 100% sure if
Consider that if a client has two tabs open, she'll probably want edits to propagate from one tab over to the other through the server. So she probably wants to limit echoes just per tab, and since she has one connection per tab, she could limit them per connection. However, in the long run, it's even better if all tabs in a browser can share a single connection. (We have been implementing this in the lab, using cross-tab postMessage() and/or serviceWorkers.) Then each browser will need only one connection to each server, no matter how many tabs it has open. In this case, we do want block echoes per-client. I'm starting to think that we should just define a new concept here to mean exactly what we want: A Giving each peer an ID is also useful when we move to P2P networking (in the much longer-term future). There are a number of use-cases where you want each peer to have an ID. So it might be appropriate to start with a |
Client
header to prevent patch echoes from serverClient
header to prevent echoes from server
Servers can be a cluster that are state-oblivious to each other, except for the actual document replication. Meaning client is subscribed to server1 and posts PUTs on server2. The servers will sync up on the document via other means, yet the client will see the echo regardless. Same thing in P2P. One has to bake the client identity in the messages themselves, and there's already way to do that via client-specified |
My suggestion would be that:
I think this aligns well with HTTP2 then: server can per-emptively push the URL to the client, too. And client can then abort the push if it is unnecessary. I think it also aligns well with caching. A browser might have those versions available in cache already, so syncing between tabs would not really consume bandwidth multiple times, only once. But semantics exposed to the client (code inside a tab) is simple and does not have to deal with deduplication across tabs, the browser does. I would suggest we do not have a header like |
@mitar The offer/fetch doubles round trips. Header for size threshold? |
Not with HTTP2, but that is abstracted away from the client. |
@mitar It does regardless of protocol.
vs
|
@mitar This has nothing to do with http. The client needs to know first what to request in the first place, hence introducing the latency. |
With HTTP2 the server can proactively push the second response together with the first response. See vulcan which uses that effectively. |
@katuma wrote:
Oh, you're referring to the case where a client or peer implements a version history and thus can deduce that an incoming version is one it has already seen. But the motivating use-case of this spec is for clients without a full version history— that is what I meant in the OP when I said this is for clients "without a good CRDT/OT." Perhaps it would have been clearer to say "without versioned messages and a robust version history." This header's motivation is for implementing simple apps, like if you have an existing web app and want to add Braid support just for Subscriptions (see slide 3 here: https://braid.news/files/http2p2p.pdf), but haven't implemented full support for Mutations yet. Perhaps it would be most clear to say that this
When I said the peer header could be useful for some P2P situations, I was only speculating. We have a prototype of P2P acknowledgements (slide ~14 above) that gives each peer an ID, but there's no specification for it yet. Now to address @mitar's suggestion:
This suggestion doesn't prevent the echoes, it just wraps them up in HTTP/2 server-push semantics, and that doesn't address the goals stated in the OP: (1) to reduce bandwidth, and (2) simplify the implementation of clients. Server-push uses more bandwidth (since the server optimistically sends stuff), and is complex to implement.
Subscriptions are already stateful in Braid. The server has to remember the state of each client subscription in order to send them the right stuff. This is an intrinsic tradeoff in synchronization. In general, Braid changes these dimensions of HTTP:
If you want HTTP to remain stateless, then you can't have subscriptions at all. What we can do in Braid is constrain that state to as little as possible. |
What do you mean? Browsers already have it implemented. For apps on top of browsers it is completely abstracted out. They just simply do
From my experience with Meteor and ecosystem there, I must warn you that you will never ever be able to have only one setting here. People will always have slightly different use cases here where they will want to optimize slightly different things. For example, Meteor syncs state based on top-level field equality. So if anything changes deeper inside, the whole top-level field is modified. For this, it has to keep state based on top-level fields on the server. This reduces some bandwidth use, but not all. Now, I have seen many users wanting a) full diffing, deep fields, to minimize bandwidth usage b) no diffing and always sending whole objects, to minimize state usage on the server. For some cases, where you have large and deeply nested objects a) makes more sense. For cases (like chat apps) where you have small objects but many of them, b) makes more sense (you just have to remember which objects you sent, but not their content at all). So I think we should not pre-decide this. We should have a way that this can be customizable.
Yes, slightly more because the response header goes over the wire, before the client can cancel it (once it figures out from the header that is already has the response). But that is still much less then the whole response in general. Especially with HTTP2 header compression.
That's what I wrote. Let's keep only subscriptions stateful. Otherwise whole our hope of using HTTP and caching goes out of the window. And then we could just simply implement the whole thing over web sockets. So non-subscription Braid requests should be stateless in my opinion. |
We are not pre-deciding anything. We are proposing an optional spec that people can opt into if they want. Other ways of doing things can be written up as well.
The response body gets sent too.
This is already limited to subscriptions. This whole thread is only about subscriptions. Nothing in this proposal affects non-subscription requests. |
Yea, but they might be slightly sub-optimal. But what is easiest than to:
Sure, but 103 Early Hint header can be used as well, not much difference here between that and push here.
But you can terminate that before it really clears the buffer on the server.
Oh, maybe this is the source of misunderstanding. I thought we are optimizing not duplicating between subscriptions and all other calls (GET, PUT, etc.). So that if you made GET and have a version of state and then subscribe, that you do no get that state you already have again. Maybe I generalized the problem too much (to "I have state X already, I do not want to get it again; no matter how I got it, by doing GET or because I authored in the first place"). You are saying the issue is only about PUTs and not wanting to get back your own state only. Even ProseMirror had to have that, this is why you have client ID in there so that it can differentiate between changes coming back which originated by the client itself (it does not have assumptions about the transport, so client's changes might be echoed back from the server). i would suggest that the semantic of peer ID header is then really that that is the author of the change being PUTed. While in subscribe call we would have another header which allows you to filter which changes you want to not receive, so for which list of peer IDs. I think separating those and making it explicit that subscribe header is simply just for filtering and not to identify the peer is important for P2P setting. BTW, what are plan to deduplicate changes you are maybe getting over multiple subscribe connections in P2P setting? Maybe this here is just a special type of that? Because you could see it like that you have a loopback connection from you to yourself, and then from you to the server, and this is why you are getting duplicated changes. |
Ah, that does sound like the misunderstanding. Thanks for finding it. This is not about the general case, but it would be nice if it weren't incompatible with any solutions we might come up with for the general case.
That fits the description of this proposal, which adds a Since we don't have any P2P specs yet, I think we can put off the discussion of how to make this work perfectly in a P2P mesh network.
In general, I suspect it's impossibly to fully deduplicate. Imagine we have this network:
Data from (a) to (d) gets routed through two paths. I imagine those two paths will both end up reaching (d), and it'll get duplicate data. I'm not an expert in routing algorithms, and maybe there's a known way to solve this without requiring a central coordinator to plan the network, but it certainly seems like a hard problem, and one that we don't need to take on yet. So I'm not concerned about deduplicating all network requests in general. One we have complex P2P networks, we'll be using great algorithms with version histories that don't suffer from the two problems I pointed out at the beginning of this issue. The issue we're focusing on here is that dumb clients will see duplicate characters or clobbered text fields. We don't need to solve everything else yet, but we would like any network protocol we're standardizing to be forward compatible with solutions we might come up with for those. |
My proposal is slightly different, that the subscribe does not have So why are we then even trying to optimize this edge case of general deduplication at this time? If our intuition is that it might be hard or impossible to deduplicate on P2P networks at all, so some bandwidth will go to waste, why not simply be OK with some bandwidth going to waste also in this edge case? Is maybe this here now premature optimization?
They just have to remember which versions they have already seen? Do they really have to know also peers? |
I think
It's not about bandwidth. It's about duplicate characters and clobbered text fields. I've said this twice already. C-f on this page for "clobber". Re-read the OP please. These are real issues that came up in our implementations. We need a solution. This is not theoretical. |
Doesn't every patch have an unique version associated with it? I am not sure why would implementation apply same patch for same version twice? |
I already answered this question above. Please C-f for "Oh, you're referring to the case where a client or peer implements a version history". |
But there is big difference between "full version history" and "set of seen versions" in how hard is to implement each. |
Sure. The specific phrasing I used was "without versioned messages and a robust version history," which I think is accurate. |
The later is a list. Which can also be an empty list if you do maintain robust version history. :-) |
Discussion at meeting: Punt for 04. There are some other issues with OT and this proposal. |
I reread this again and I realized that this issue is really just about optimizing that client's own action does not come back to client itself over a subscription. It is interesting because I see that as a feature and this is maybe because both Meteor and 3 factor app (popularized by Hasura) are in fact using this client-own-update to get information that update is persisted on the client. So you push change to the server and when you get it back you know that the change has been accepted, and you stop waiting. It allows you in fact to go further and when you push the change to the server you forget about it (and do not update UI) until you get it back from the server, only then you update UI. So your code to update UI is always the same: with changes from the server. Client just generates changes to send to the server. I know this is a very different use case than Braid's where we primarily talk about peers and not central servers which would approve/deny updates. But it is maybe to think that there are use cases where getting these updates back to the client are beneficial. |
Client
header to prevent echoes from serverClient
header to prevent echoes from server
If a client does:
GET
/foo subscribePUT
/foo "hello"Then a naive server might send the client's PUT (2) back to it, because it subscribed in (1). We call these duplicate versions "echoes" from the server.
In addition to using unnecessary bandwidth, echoes cause problems for any client without a good CRDT/OT:
Thus, it's nicer if we can prevent echoes from happening.
A simple solution is for the client to specify a
Client: "c82ysjd3"
header on every request, and then the server can refrain from sending any version back to the same client that created it.I've implemented this in my client/server, but we haven't discussed it at all in the spec. I think we should add this
Client
header to the braid-http spec.The text was updated successfully, but these errors were encountered: