
Missing acknowledgement mechanism leads to problems in the case of stream closure #103

Closed
martenrichter opened this issue Jan 21, 2024 · 6 comments

Comments

@martenrichter

I am currently writing unit tests for my Node library covering stream-count control.
Now I have run into an issue that is not a bug on my side, but a problem in the HTTP/2 mapping: it is missing an acknowledgment mechanism.

I have the following unit test:

    client = new WebTransport(
      `${process.env.SERVER_URL}/streamlimits_getunidis`,
      { ...wtOptions }
    )
    await client.ready
    const unidistreams = []
    let numunidi = 0
    let numfailed = 0
    for (let i = 0; i < 150; i++) {
      const curstream = client.createUnidirectionalStream()
      unidistreams.push(curstream)
      curstream
        .then(() => {
          numunidi++
        })
        .catch(() => {
          numfailed++
        })
    }
    await Promise.allSettled(unidistreams)
    expect(numunidi).to.equal(100)
    expect(numfailed).to.equal(50)
    numfailed = 0
    for (let i = 0; i < 50 + adjustlimit; i++) {
      const curstream = await unidistreams.shift()
      await curstream.close() // <=== crucial point
    }
    for (let i = 0; i < 50; i++) {
      const curstream = client.createUnidirectionalStream()
      unidistreams.push(curstream)
      curstream
        .then(() => {
          numunidi++
        })
        .catch(() => {
          numfailed++
        })
    }
    await Promise.allSettled(unidistreams)
    expect(numunidi).to.equal(150)
    expect(numfailed).to.equal(0)

    const result = await client.closed
    expect(result).to.have.property('closeCode', 0)
    expect(result).to.have.property('reason', '')

The server side is as follows:

      async () => {
            for await (const session of getReaderStream(
              server.sessionStream('/streamlimits_getunidis')
            )) {
              try {
                await session.ready
                const unidistreams = []
                while (unidistreams.length < 150) {
                  unidistreams.push(
                    await getReaderValue(session.incomingUnidirectionalStreams)
                  )
                }
                await session.close()
              } catch (error) {
                // do not crash server, if a problem occurs...
              }
            }
          },

Basically, the test tries to open 150 streams; only 100 succeed and 50 fail because it hits the limit of 100 streams.
Then it closes 50 streams and can open another 50 streams.

With HTTP/3 it works perfectly: my implementation waits for all pending writes, and Google's quiche library signals Data Recvd, i.e. that the data has been received (see the WebTransport W3C spec, § 6.4 close, step 5.2: "Wait for stream.[[[InternalStream]]](https://www.w3.org/TR/webtransport/#dom-webtransportsendstream-internalstream-slot) to enter the 'Data Recvd' state", and the line marked "crucial point" above). So I can close the stream safely and use that as the trigger for the next 50 streams.
For HTTP/2 I do not have an acknowledgment mechanism at the capsule level, so I resolve the promise immediately, and the remaining 50 streams fail because the test moves on too quickly. Of course, I could add a delay, but that would be unreliable.
Therefore, I think that for this particular scenario an acknowledgment mechanism is necessary if nothing from the underlying HTTP/2 can be mapped to it. But this should be specified.
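
To make the difference concrete, here is a minimal sketch of how the two transports currently resolve the promise returned by closing a stream. The helper names (`nativeStream`, `onDataRecvd`, `sendCloseCapsule`, and so on) are hypothetical and only illustrate the shape of the problem, not the actual library code:

    // Hedged sketch: all names on `nativeStream` are hypothetical.

    // HTTP/3 path: the QUIC stack (quiche) reports "Data Recvd" for the
    // stream, so close() can wait for that signal before resolving.
    function closeOverHttp3(nativeStream) {
      return new Promise((resolve, reject) => {
        nativeStream.onDataRecvd = resolve   // peer has received all data
        nativeStream.onResetOrError = reject // stream failed before that
        nativeStream.sendFin()               // flush pending writes and send FIN
      })
    }

    // HTTP/2 path: there is no capsule-level acknowledgment to hook into,
    // so the only option today is to resolve once the close capsule is queued.
    async function closeOverHttp2(nativeStream) {
      await nativeStream.sendCloseCapsule() // queued locally, nothing heard from the peer
      // Resolves here even though the peer may not have processed the close yet,
      // so a MAX_STREAMS update may still be far away.
    }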

Any advice on how to proceed with the implementation is appreciated, and sorry for spamming your issues.

@martinthomson
Collaborator

This is an example of using acknowledgments to drive application state. In this case, the acknowledgments are being used to drive the close state.

In my view, we should not have specified that close awaits the "Data Recvd" state: while you are doing something completely reasonable here, it does not work in the presence of intermediaries. Or rather, this might, but similar patterns do not. Closing should instead resolve when the stack accepts the request to close, not when the peer acknowledges it.

The effect of that change would be that a request to close would probably just bounce across threads or processes, but it would resolve virtually immediately. There are potentially cases where there is local buffering of writes beyond what flow control would allow (a bad idea in my view, but I keep hearing of it), in which case the close would be answered (almost) immediately by whatever manages that additional buffer, but it would not reach the stack until flow control opened up.¹

Our JavaScript brethren have a rule that says that garbage collection must not be observable. We might reasonably use a similar rule for acknowledgments.

Footnotes

  1. That extra buffering is likely to result in further weirdness in that the provision of more credit for streams could be delayed. It's the very definition of deadlock really. If the sending end can't send a close because it doesn't have enough flow control credit and the other side is waiting on a close on a stream before it can release more flow control credit, things get stuck.
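
As an illustration only (not proposed API text), here is a rough sketch of what "resolve when the stack accepts the request to close" could look like when an extra local write buffer sits in front of the stack, as described in the paragraph and footnote above. All names are hypothetical:

    // Hedged sketch of the suggested semantics: close() settles once the
    // request is accepted locally, even if the FIN cannot reach the stack
    // yet because flow control is exhausted.
    class LocallyBufferedSender {
      constructor(stack) {
        this.stack = stack       // hypothetical: exposes trySendFin() -> boolean
        this.finPending = false
      }

      // Resolves (almost) immediately: the close request has been accepted.
      async close() {
        if (!this.stack.trySendFin()) {
          // Not enough flow-control credit: remember the FIN for later,
          // but do not make the caller wait on the peer.
          this.finPending = true
        }
      }

      // Invoked when flow control opens up again.
      onCreditAvailable() {
        if (this.finPending && this.stack.trySendFin()) {
          this.finPending = false
        }
      }
    }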

@martenrichter
Author

In my view, we should not have specified that close awaits the "Data Recvd" state: while you are doing something completely reasonable here, it does not work in the presence of intermediaries. Or rather, this might, but similar patterns do not. Closing should instead resolve when the stack accepts the request to close, not when the peer acknowledges it.

OK, then I should not write a test that uses this pattern. Rather, should I use something else to signal that enough streams are available (maybe I should just use the variant that waits until more streams are available)? And then, of course, a change to the W3C spec may be a good idea.

It's the very definition of deadlock really.
Well, all my tests so far have been very good at pointing out race conditions or problems in my code or on the browser side (at least two bugs in Chromium and potentially one bug in Firefox, but that is still under investigation).

@vasilvv
Collaborator

vasilvv commented Jan 22, 2024

I don't think that the example in question is guaranteed to work all of the time, since receiving an ACK for the FIN and receiving a new MAX_STREAMS are two independent operations (and besides, the specification does not require the server to send MAX_STREAMS at any specific point); the fact that it works consistently with the Chromium library is mostly an implementation detail.

@martenrichter
Author

I don't think that the example in question is guaranteed to work all of the time, since receiving an ACK for the FIN and receiving a new MAX_STREAMS are two independent operations (and besides, the specification does not require the server to send MAX_STREAMS at any specific point); the fact that it works consistently with the Chromium library is mostly an implementation detail.

Well, in my setup it is guaranteed, as closing 50 streams triggers libquiche to send the MAX_STREAMS capsule with the new limits. However, it is true that these tests need to know the trigger points of the implementation, and I see that using close was not a good idea. Of course, if it takes longer to receive the MAX_STREAMS, the test will fail; in any case, I will rework it.
The question that remains for me is: what does a `close` on the stream actually tell the JavaScript side about the state of the connection? From the conversation so far, I would say an immediate return would serve the same purpose, and no developer would come up with such a silly idea as I did for the unit test.

@ekinnear
Collaborator

Discussed in editor's meeting:

  • The presence of intermediaries complicates things here and also helps explain why we cannot rely on signaling about what's been received, etc.

  • For flow control/stream limit purposes, an intermediary is signing up to either pass through or choose to buffer more than what the next hop is willing to offer.

  • A consumer of the JS API should be able to await more room being available to open more streams; there is no expectation that one can close a stream and then immediately have flow credits to open a new one (a sketch of that pattern follows after this comment).

  • This also helps defend against rapid-reset-style attacks.

We think the on-wire bits are working as expected, but this is a great thing to identify, as we need to clean up the W3C spec a bit to stop relying on any of this signaling.
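
As a hedged illustration of that point about awaiting room for new streams, the test above could be reworked so that stream creation itself waits for credit instead of inferring it from close(). The `waitUntilAvailable` option is assumed from the current W3C draft; if an implementation does not support it, treat it as a hypothetical placeholder for "wait until the peer grants more streams":

    // Sketch only: reopen streams without relying on close() acknowledgments.
    // `waitUntilAvailable` is an assumption taken from the W3C draft.
    async function reopenStreams(client, count) {
      const reopened = []
      for (let i = 0; i < count; i++) {
        // Resolves once the peer has granted enough stream credit (for example
        // after it processed our closes and sent a new MAX_STREAMS), however
        // long that takes on the wire.
        reopened.push(
          await client.createUnidirectionalStream({ waitUntilAvailable: true })
        )
      }
      return reopened
    }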

@DavidSchinazi
Contributor

Chair: based on this, let's continue the conversation in W3C issue 581 since that's where we'll need to write the fix. Closing this issue in favor of that one.
