
For compressed responses, omit Content-Length and Content-Encoding headers #23

Skn0tt opened this issue Dec 11, 2023 · 8 comments
Skn0tt commented Dec 11, 2023

The fetch API is becoming the standard HTTP client for server usage. When proxying back a fetched resource to the user-agent, here's what I'd intuitively do:

async function myEndpoint(request: Request): Promise<Response> {
   return await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
}

The URL in this example responds with a gzip-compressed SVG. As specified in step 16.1.1.1 of 15.6 HTTP-network fetch, the body stream contains an uncompressed version of the SVG, making compression transparent to the user. This works well for consuming the SVG.

However, the spec does not require the headers to mirror this decompression. Although the Response has an uncompressed body, the Content-Length header shows the compressed length and the Content-Encoding header claims the body is still compressed. In nodejs/undici#2514 (the issue I originally opened), I've included a repro of what this means when the Response is proxied to the user-agent:

  1. The body ends up being longer than the headers describe. The HTTP spec doesn't allow this, and it leads clients like cURL to warn.
  2. The user-agent will try to decompress a body that isn't compressed.
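
To see the mismatch first-hand, here's a minimal sketch using the same URL as above (the exact header values depend on what the server sends):

async function showMismatch(): Promise<void> {
   const resp = await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
   const body = await resp.arrayBuffer();
   // The headers still describe the compressed representation...
   console.log(resp.headers.get("content-encoding")); // e.g. "gzip"
   console.log(resp.headers.get("content-length"));   // compressed size
   // ...but fetch has already decompressed the body.
   console.log(body.byteLength); // decompressed size, larger than content-length
}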

The current workaround is to manually alter headers on responses from fetch:

async function myEndpoint(request: Request): Promise<Response> {
   let resp = await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
   if (resp.headers.get("content-encoding")) {
      const headers = new Headers(resp.headers);
      headers.delete("content-encoding");
      headers.delete("content-length");
      // Spreading a Response doesn't copy status/statusText (they're getters),
      // so pass them explicitly alongside the original body.
      resp = new Response(resp.body, {
         status: resp.status,
         statusText: resp.statusText,
         headers,
      });
   }
   return resp;
}

This is cumbersome and should live in library code. It can't live in frameworks, because it's impossible to differentiate a response produced by fetch (content-encoding header, but uncompressed body) from a compressed Response created in user code (same content-encoding header, but compressed body).
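
Here's a sketch of what that library code could look like (sanitizeDecompressedResponse is a hypothetical name, and it's only safe to apply to a Response known to come straight from fetch, where a content-encoding header implies an already-decompressed body):

function sanitizeDecompressedResponse(resp: Response): Response {
   if (!resp.headers.get("content-encoding")) return resp;
   const headers = new Headers(resp.headers);
   headers.delete("content-encoding");
   headers.delete("content-length");
   return new Response(resp.body, {
      status: resp.status,
      statusText: resp.statusText,
      headers,
   });
}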

Instead of this workaround, fetch should require the content-length and content-encoding headers to be deleted when the response body is decompressed.

Some implementors already do this, including Deno and workerd. Netlify might implement the same for Netlify Functions 2.0.

I've checked undici (Node.js), node-fetch, Chrome and Safari - all of them exhibit the behaviour explained above, where the headers don't match the body.

An alternative solution for this would be whatwg#1524 - that way, frameworks can use the decompressed body field to tell compressed responses from uncompressed ones.


Summary:

What's the problem? Compressed responses carrying a content-encoding header but a decompressed body lead to an incoherent Response.
What's the use case? Mostly reverse proxying.
Is it relevant to upstream? Yes, potentially, because of service workers.
What's my suggestion for a fix? Upon decompression, the content-length and content-encoding headers should be deleted.
How do engines deal with this? Most have the bug, but Deno and workerd have implemented the fix I propose.

mnot commented Dec 20, 2023

Removing the headers automatically loses information, and silently changing fetch's behaviour might be a footgun for some.

Exposing the raw content is absolutely necessary for non-browser use cases. Maybe in addition to that, have a flag that when true makes the body coherent (by removing these headers)?
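
Presumably something along these lines, where coherentHeaders is a purely hypothetical option name and the cast exists only because the flag isn't in any type definitions:

// Hypothetical opt-in flag; not part of any spec.
const resp = await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg", {
   coherentHeaders: true, // remove content-encoding/content-length after decompression
} as any);
// With the flag set, resp.headers would match the decompressed resp.body.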

Skn0tt commented Dec 20, 2023

If we're looking to stay as in line with upstream as possible, then adding a new field for the raw content is probably best, since it keeps all other fields the same as upstream.

A flag would still require users to think about compression, because they'd need to remember to enable it. That would complicate things further.

@jimmywarting

I don't think removing the headers is such a good idea.
I would still like to be able to forward those headers as well.

But it would also require being able to access the raw stream.

lucacasonato commented Dec 21, 2023

@mnot and @jimmywarting: I think there are two related but separate use cases that are being mingled here:

  1. Exposing a decompressed stream to the user
  2. Exposing a still-compressed stream to the user

What @Skn0tt accurately points out is that what Fetch upstream does right now is 1), but very poorly. We decompress the stream, but then do not sanitize the rest of the response to reflect what we've done, nor do we provide any indication to the user that we performed decompression. This is actually very bad in the scenario where a new Content-Encoding is added that a browser may not yet understand, a user implements manual decompression logic, and then the browser starts implementing that encoding algorithm natively. In that case, a browser adding a new native compression algorithm could break the web. Very, very bad.
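
To make that breakage concrete, here's a hedged sketch: "zstd" stands in for the newly-added encoding, and the caller-supplied decode function for the userland decompression logic:

async function fetchWithManualDecompression(
   url: string,
   decode: (data: Uint8Array) => Uint8Array, // userland decoder for the new encoding
): Promise<Uint8Array> {
   const resp = await fetch(url);
   const raw = new Uint8Array(await resp.arrayBuffer());
   if (resp.headers.get("content-encoding") === "zstd") {
      // Correct while the browser passes "zstd" bodies through untouched.
      // Once the browser decompresses natively, the header survives but
      // the body is already plain, so this ends up decoding twice.
      return decode(raw);
   }
   return raw;
}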

What I think you two are saying is that in scenario 2), it would be detrimental not to expose Content-Length and Content-Encoding, because that breaks users' ability to decompress themselves. I agree with this strongly - however, that is not the behaviour in question here, because we already do not expose a way to tell an implementation not to decompress. This is something we can discuss in a separate issue, maybe.

I think that this issue (properly sanitizing the headers after automatic decompression) is a bug worth addressing upstream in Fetch. I've opened an issue here: whatwg#1729

mnot commented Dec 22, 2023

> I think that this issue (properly sanitizing the headers after automatic decompression) is a bug worth addressing upstream in Fetch. I've opened an issue here: whatwg#1729

Understand that they're likely to be reluctant to make any behavioural change here if it breaks existing code (which is very likely, somewhere).

Fetch is a browser-focused API that made a reasonable assumption that their users had no need for compressed content. That assumption doesn't hold for intermediaries, which often need to handle the 'raw' stream.

Adding support for access to the 'raw' stream is pretty straightforward; it's "correcting" the current behaviour of Fetch in a backwards-compatible way that's trickier. That's why I suggested a flag -- acknowledging this isn't a great solution, but it is backwards-compatible.

Skn0tt commented Dec 22, 2023

Does it need to be a flag? What if, instead of a flag, a rawBody field were added in addition to body? That would be backwards-compatible, and frameworks could then check for its existence and use it when proxying a response. Same for users decompressing themselves.

This would mean two streams reading from a shared underlying stream - not sure what performance implications that has for implementors. From what I've seen of the undici implementation, it would be fine as long as reading from the streams is mutually exclusive. From my (cursory) understanding of Deno's implementation, this also sounds doable.
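
For the proxy use case, code against that hypothetical rawBody field might look roughly like this (nothing here is specced; the type cast exists only because the field doesn't):

async function proxyEndpoint(request: Request): Promise<Response> {
   const resp = await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
   // Hypothetical field: the still-compressed bytes, if fetch decompressed.
   const rawBody = (resp as Response & { rawBody?: ReadableStream<Uint8Array> }).rawBody;
   if (rawBody) {
      // Forward the compressed stream; the original content-encoding and
      // content-length headers remain accurate for it.
      return new Response(rawBody, {
         status: resp.status,
         statusText: resp.statusText,
         headers: resp.headers,
      });
   }
   return resp;
}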

mnot commented Dec 22, 2023

Adding rawBody (or similar) would allow access to the compressed / pre-processing content of the response (there are other cases where it might differ), but it doesn't address the fact that when you access the body, the headers you get are a mismatch. Effectively, the headers exposed now are rawHeaders, and you want something like processedHeaders -- i.e., headers suitable for what's exposed in body, right?

Skn0tt commented Jan 2, 2024

In a world of perfectly coherent naming, yes, headers would need to be renamed rawHeaders. I don't think that's necessary, though - once there's rawBody, frameworks like Express can use it to correctly handle compressed Responses. I'm happy as long as we can hide this complexity from most developers; it's OK to have frameworks deal with it.
