Choosing between a single packet number space vs. multiple packet number spaces #96

Closed
qdeconinck opened this issue Mar 3, 2022 · 16 comments · Fixed by #103

@qdeconinck
Contributor

This draft originates from an effort to merge previous Multipath proposals. While many points were found to be common between them, one design point still requires consensus: the number of packet number spaces that a Multipath QUIC connection should support (i.e., one for the whole connection vs. one per path).

The current draft enables experimentation with both variants, but the final version will certainly need to choose one of them.

@huitema
Contributor

huitema commented Mar 3, 2022

The main issues mentioned so far:

  • Efficiency. One number space per path should be more efficient, because implementations can directly reuse the loss-recovery logic specified for QUIC
  • ACK size. Single space leads to lots of out of order delivery, which causes ACK sizes to grow
  • Simplicity of code. Single space can be implemented without adding many new code paths to uni-path QUIC
  • Shorter header. Single space works well with NULL length CID

The picoquic implementation shows that "efficiency" and "ack size" issues of single space implementations can be mitigated. However, that required significant improvements in the code:

  • to obtain good loss efficiency, picoquic remembers not just the path on which a packet was sent, but also the order of the packet on this path, i.e. a "virtual sequence number". The loss detection logic then operates on that virtual sequence number.

  • to contain ACK size, picoquic implements a prioritization logic to select the most important ranges in an ACK, avoid acking the same range more than 3 or 4 times, and keep knowledge of already acknowledged ranges so range coalescing works.
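The "virtual sequence number" idea can be sketched roughly as follows. This is an illustrative Python sketch, not picoquic's actual code: the class and field names (`SingleSpaceLossDetector`, `path_seq`, `naive_lost`) are hypothetical, and the threshold of 3 is the packet threshold recommended by RFC 9002. Loss is judged against the send order on each path rather than against the connection-wide packet number.

```python
from dataclasses import dataclass, field

PACKET_THRESHOLD = 3  # packet reordering threshold recommended by RFC 9002


@dataclass
class SentPacket:
    packet_number: int  # number in the single, connection-wide space
    path_id: int        # hypothetical identifier of the sending path
    path_seq: int       # "virtual sequence number": send order on that path
    acked: bool = False


@dataclass
class SingleSpaceLossDetector:
    sent: dict = field(default_factory=dict)
    next_pn: int = 0
    next_path_seq: dict = field(default_factory=dict)

    def on_send(self, path_id):
        """Record a packet sent on path_id and return its packet number."""
        seq = self.next_path_seq.get(path_id, 0)
        self.next_path_seq[path_id] = seq + 1
        pn = self.next_pn
        self.next_pn += 1
        self.sent[pn] = SentPacket(pn, path_id, seq)
        return pn

    def on_ack(self, acked_pns):
        """Mark packets acked, then return packet numbers deemed lost."""
        for pn in acked_pns:
            if pn in self.sent:
                self.sent[pn].acked = True
        # Largest acknowledged virtual sequence number on each path.
        largest = {}
        for p in self.sent.values():
            if p.acked:
                largest[p.path_id] = max(largest.get(p.path_id, -1), p.path_seq)
        # Apply the packet threshold per path, not on absolute packet numbers.
        return sorted(
            p.packet_number
            for p in self.sent.values()
            if not p.acked
            and p.path_id in largest
            and largest[p.path_id] - p.path_seq >= PACKET_THRESHOLD
        )


def naive_lost(unacked, largest_acked):
    """Threshold applied to absolute packet numbers, for comparison only."""
    return [pn for pn in unacked if largest_acked - pn >= PACKET_THRESHOLD]
```

With packets alternating between a fast and a slow path, acknowledging only the fast-path packets makes the naive check flag slow-path packets as lost, while the per-path check reports nothing until evidence arrives on the slow path itself.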

I think these improvements are good in general, and I will keep them in the implementation whether we go for single space or not. The virtual sequence number is, for example, useful if the CID changes for reasons not related to path changes in multiple number space variants. It is also useful in unipath variants to avoid interference between sequence numbers used in probes and the RACK logic. The ACK size improvements do reduce the size of ACKs in the presence of out-of-order delivery, e.g., if the network is doing some kind of internal load balancing. On the other hand, the improvements are somewhat complex, would need to be described in separate drafts, and pretty much contradict the "simplicity of code" argument.

So we are left with the "Null length CID" issue. I see four cases:

| Client CID | Server CID | Support | Priority |
|---|---|---|---|
| long | long | Supported in both variants | Used by many implementations |
| NULL | long | Requires special support in multiple spaces case, but could work | Preferred configuration of many big deployments |
| long | NULL | Requires special support in multiple spaces case, but could work | Rarely used, server load balancing does not work |
| NULL | NULL | Does not work for multiple spaces | Only mentioned in some P2P deployments |

The point here is that it is somewhat hard to deploy a large server with NULL CID and use server load balancing. This configuration is mostly favored by planned P2P deployments.

@huitema
Contributor

huitema commented Mar 3, 2022

The big debate is about the configuration with NULL CID on the client and long CID on the server. The packets from server to client do not carry a CID, and carry only the last bytes of the sequence number. The client will need some logic to infer the full sequence number before decrypting the packet. The client could maybe use the incoming 5-tuple as part of the logic, but it is not obvious. It is much simpler to assume a direct map from destination CID to number space. That means, if a peer uses a NULL CID, all packets sent to that peer are in the same number space.
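To illustrate why the receiver must know the number space before it can recover the full sequence number, here is a Python transcription of the packet number decoding algorithm from RFC 9000, Appendix A.3. The function name `decode_packet_number` and the sample values in the usage note are chosen for this sketch; the point is that the result depends on `largest_pn`, which is per number space state.

```python
def decode_packet_number(largest_pn, truncated_pn, pn_nbits):
    """Recover a full packet number from its truncated wire encoding.

    largest_pn is the largest packet number successfully processed so far
    in the number space the packet belongs to.
    """
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # Combine the high bits of the expected value with the received low bits.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    # Pick the candidate closest to expected_pn, without leaving [0, 2^62).
    if candidate_pn <= expected_pn - pn_hwin and candidate_pn < (1 << 62) - pn_win:
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn
```

For example, a packet numbered 70000 sent with a 16-bit encoding decodes correctly when the receiver uses the state of the right space (largest received 69999), but decodes to 4464 if it mistakenly uses the state of a space whose largest received number is 10.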

@huitema
Contributor

huitema commented Mar 3, 2022

Revised table:

| Client CID | Server CID | What |
|---|---|---|
| long | long | Multiple number spaces |
| NULL | long | Multiple number spaces on client side (one per CID), single space on server side |
| long | NULL | Multiple number spaces on server side (one per CID), single space on client side |
| NULL | NULL | Single number space on each side |

If a node advertises both NULL CID and multipath support, they SHOULD have logic to contain the size of ACKs.
If a node engages in multipath with a NULL CID peer, they SHOULD have special logic to make loss recovery work well.
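The revised table can be read as a small function of the two CID lengths. The sketch below is only one reading of it: it assumes that "client side" refers to packets sent by the client (which carry the server's CID as destination CID), and the function name `number_spaces` and the returned keys are hypothetical.

```python
def number_spaces(client_cid_len, server_cid_len):
    """Return which numbering each sender uses, given both CID lengths.

    A CID length of 0 stands for a NULL (zero-length) CID.
    """
    # Packets sent by the client carry the server's CID as destination CID,
    # so the client can use one number space per CID only if that CID exists.
    client_side = "multiple" if server_cid_len > 0 else "single"
    # Symmetrically, packets sent by the server are addressed to the client's CID.
    server_side = "multiple" if client_cid_len > 0 else "single"
    return {"client_side": client_side, "server_side": server_side}
```

Under this reading, `number_spaces(0, 8)` gives multiple spaces on the client side and a single space on the server side, matching the second row of the table.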

@huitema
Contributor

huitema commented Mar 3, 2022

I think the above points to a possible "unified solution":

  • if multipath negotiated, tie number spaces to destination CID, use default (N=0) if NULL CID.
  • use multipath ack; in multipath ACK, identify number space by sequence number of DCID in received packets. Default that number to zero if received with NULL CID.
  • in ABANDON path, identify path either by DCID of received packets, SCID of sent packet, or "this path" (sending path) if CID is NULL.
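A receiver following the first two bullets might keep its acknowledgement state keyed by number space, as in this minimal Python sketch. The names `number_space_for` and `AckTracker` are hypothetical, a NULL CID is modelled as `None`, and the sketch ignores ACK delay, ECN counts, and wire encoding.

```python
from collections import defaultdict
from typing import Optional

DEFAULT_SPACE = 0  # number space used when packets arrive with a NULL CID


def number_space_for(dcid_seq: Optional[int]) -> int:
    """Map the sequence number of the received DCID to a number space."""
    return DEFAULT_SPACE if dcid_seq is None else dcid_seq


class AckTracker:
    def __init__(self):
        # number space -> set of packet numbers received in that space
        self.received = defaultdict(set)

    def on_packet(self, dcid_seq, pn):
        self.received[number_space_for(dcid_seq)].add(pn)

    def ack_ranges(self, space):
        """Return contiguous (low, high) ranges for one space, highest first."""
        ranges = []
        for pn in sorted(self.received.get(space, ())):
            if ranges and pn == ranges[-1][1] + 1:
                ranges[-1] = (ranges[-1][0], pn)
            else:
                ranges.append((pn, pn))
        return ranges[::-1]
```

Each multipath ACK would then carry the space identifier together with the ranges from `ack_ranges(space)`; with a NULL CID everything lands in space 0, which is the single number space case.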

@yfmascgy
Contributor

yfmascgy commented Mar 3, 2022

I like the proposal of the "unified solution". I think the elegance lies in the fact that it allows us to automatically cover all four cases listed above. The previous dilemma for me was that, on one hand, we have some use cases where we need to support more than two paths and separate PN spaces make the job easier, but on the other hand, I think we should not ignore the NULL CID use cases, as they are also important. Now, with this proposal, a big part of the problem is solved. The remaining challenge is to make sure single PN remains efficient in terms of ACK and loss recovery. On that part, we plan to do an A/B test and would love to share the results when we get them.

There is one more problem, as pointed out in issue #25, when we want to take hardware offloads into account. In such a case, we may still need single PN for long server CID. However, if hardware supports nonce modification, this problem can be addressed with the proposed "unified solution".

@huitema
Contributor

huitema commented Mar 3, 2022

If the API does not support 96 bit sequence numbers, it should always be possible to create an encryption context per number space, using Key=current key and ID = current-ID + CID sequence number. Of course, that forces creation of multiple contexts, and that makes key rotation a bit harder to manage. But still, that should be possible.
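One possible reading of the per-number-space context is sketched below in Python. It is an assumption of this sketch, not something stated in the thread, that the "+" is realised as an XOR of a 32-bit CID sequence number into the top four bytes of a 12-byte value, which then plays the role of the per-context IV; the function names are hypothetical. The nonce step itself follows RFC 9001: the packet number is left-padded to the IV length and XORed with the IV.

```python
def per_space_iv(iv: bytes, cid_seq: int) -> bytes:
    """Fold a 32-bit CID sequence number into the top 4 bytes of a 12-byte IV."""
    if len(iv) != 12:
        raise ValueError("expected a 12-byte IV")
    if not 0 <= cid_seq < 2**32:
        raise ValueError("CID sequence number must fit in 32 bits")
    mask = cid_seq.to_bytes(4, "big") + bytes(8)
    return bytes(a ^ b for a, b in zip(iv, mask))


def nonce(iv: bytes, packet_number: int) -> bytes:
    """QUIC AEAD nonce: the IV XORed with the left-padded packet number."""
    padded_pn = packet_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, padded_pn))
```

Because QUIC packet numbers fit in 62 bits, they only touch the low eight bytes, so two number spaces that reuse the same packet number under the same key still produce distinct nonces. Each context would have to be rebuilt on key update, which is the key rotation cost mentioned above.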

@mirjak
Collaborator

mirjak commented Mar 4, 2022

Thanks for the summary @huitema. I think one point is missing in your list which is related to issue #87. Use of a single packet number space might not support ECN.

Regarding the unified solution: I think what you actually say is that we would specify both solutions and everybody would need to implement both logics. At least regarding the "simplicity of code" argument, that would be the worst choice.

If we can make the multiple packet number spaces solution work with one-sided CIDs, I'm tending toward such an approach. Use of multiple packet number spaces avoids ambiguity/needed "smartness" in ACK handling and packet scheduling which, as you say above, can make the implementation more complex and, moreover, wrong decisions may have a large impact on efficiency (both local processing and on-the-wire). I don't think we want to leave these things to each implementation individually.

@qdeconinck
Contributor Author

The summary Christian made above of the design comparison sounds quite accurate. Besides ECN, my other concern about a single packet number space is that it requires cleverness on the receiver side if you want to be performant in some scenarios. On the sender side, you need to consider a path-relative packet number threshold instead of an absolute one to avoid spurious losses.

One point I think we have not mentioned yet is that there can be some interactions between Multipath out-of-order number delivery and incoming packet number duplicate detection. This requires maintaining some state at the receiver side, as described by https://www.rfc-editor.org/rfc/rfc9000.html#section-12.3-12. With a single packet number space, the receiver should take extra care when updating the "minimum packet number below which all packets are immediately dropped". Otherwise, in the presence of paths with very different latencies, the receiver might end up discarding packets from a (slower) path.

I also prefer the multiple packet number spaces solution for the above reasons. I'm not against thinking about a "unified" solution (the proposal sounds interesting), but I wonder how much complexity this would add compared to requiring end hosts to use one-byte CIDs.
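The interaction with duplicate detection can be illustrated with a small Python sketch. Everything here is hypothetical: `DuplicateFilter` is a simplified stand-in for the receiver state described in RFC 9000, and `conservative_floor` is just one possible rule (never raise the drop threshold past the smallest "largest received" value across active paths, minus a reordering margin), not a rule taken from any draft.

```python
class DuplicateFilter:
    """Remembers received packet numbers above a floor; drops the rest."""

    def __init__(self):
        self.floor = -1   # packets numbered <= floor are dropped outright
        self.seen = set()

    def accept(self, pn):
        """Return True if pn is new and above the floor."""
        if pn <= self.floor or pn in self.seen:
            return False
        self.seen.add(pn)
        return True

    def raise_floor(self, new_floor):
        """Move the floor up and forget state at or below it."""
        self.floor = max(self.floor, new_floor)
        self.seen = {pn for pn in self.seen if pn > self.floor}


def conservative_floor(largest_received_per_path, reorder_margin=0):
    """Floor that stays below what the slowest path has delivered so far."""
    return min(largest_received_per_path.values()) - reorder_margin
```

If a fast path has delivered up to packet 100 and a slow path only up to packet 9, a floor derived from the overall maximum (say 100 minus a window of 32) would drop packet 11 when it finally arrives on the slow path, whereas the per-path rule keeps the floor at or below 9.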

@huitema
Contributor

huitema commented Mar 4, 2022

I think the issue is not really so much "single vs multiple number space" as "support for multipath and NULL CID". As noted by Mirja, there is an implementation cost there. My take is:

  • If a node uses NULL CID and proposes support of multipath, then that node MUST implement code to deal with size of acknowledgements.
  • If a node sees a peer using NULL CID and supporting multipath, then that node MUST either use only one path at a time or implement code to deal with impact on loss detection due to out of order arrivals, and impact on congestion control including ambiguity of ECN signals.

Then add sections on what it means to deal with the size of acknowledgements, out of order arrivals, and congestion control.

I think this approach ends up with the correct compromises:

  • It keeps the main case simple. Nodes that rely on multipath just use long CID and one number space per CID.
  • Nodes that use long CID never have to implement the ACK length minimization algorithms.
  • Nodes that see a peer using NULL CID have a variety of implementation choices, e.g., sending on just one path at a time, sending on mostly one path at a time in "make or break" transitions, or implementing sophisticated algorithms.

If we do that, we can get rid of the single vs multiple space discussion, and end up with a single solution addressing all use cases.

@obonaventure
Contributor

Looking at all the discussions here and in other issues such as ECN, I think that we should try to write two different versions of section 7:

  • one covering all the aspects of single packet number space (RTT estimation, retransmissions, ECN, handling of CIDs, ...)
  • one covering all the aspects of multiple packet number space (RTT estimation, retransmissions, ECN, handling of CIDs, ...)

There would be some overlap between these two sections and also some differences that would become clear as we specify them. At the end of this writing, we'll know whether it is possible to support/unify both or we need to recommend a single one. The other parts of the document are almost independent of that and can evolve in parallel with these two sections.

However, I don't think that such a change would be possible by Monday.

@huitema
Contributor

huitema commented Mar 4, 2022

I don't think we should rush changes before we have agreed on the final vision.

@mirjak
Collaborator

mirjak commented Mar 4, 2022

It might be helpful to have these options as PRs (without merging them), so people can understand all details of each approach.

@yfmascgy
Contributor

yfmascgy commented Mar 4, 2022

I agree with @huitema that supporting multipath with NULL CIDs is a more fundamental issue than the efficiency comparison between single PN and separate PN, as it would ultimately impact the application scope of multipath QUIC. But indeed, we might want to implement the proposed solution first and then decide if we want to adopt such a unified approach.

@huitema
Contributor

huitema commented Mar 4, 2022

Since @mirjak prodded me, we now have a PR for the "unified" proposal.

@Yanmei-Liu
Contributor

I totally agree with Christian that the issue is about "support for multipath and NULL CID", and the solution that Christian suggested looks really great! It both takes advantage of multiple spaces and supports NULL CID users without affecting the efficiency of ACK arrangements.

Besides, the solution is more convenient for implementations: if both endpoints use non-zero-length CIDs, they only need to support multiple spaces, and if one of the endpoints uses a NULL CID, it can use a single PN space in one direction and still support NULL CID and multipath at the same time.

@Yanmei-Liu Yanmei-Liu linked a pull request Mar 10, 2022 that will close this issue
@mirjak
Collaborator

mirjak commented Mar 23, 2022

Very high-level summary of the IETF-113 discussion: there seems to be interest in, and likely support for, the unified solution (review the minutes for further details).
