ipfs · bajtos · Aug 8, 2023
@@ -80,7 +80,8 @@ Below response types SHOULD be supported:
 - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
   - Disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be
     returned, implementations MAY support optional CAR content type parameters
-    (:cite[ipip-0412]) and the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request).
+    (:cite[ipip-0412]), the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request)
+    and the optional [CAR metadata block](#car-meta-content-type-parameter).
 
 - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record)
   - A verifiable :cite[ipns-record] (multicodec `0x0300`).
@@ -301,6 +302,32 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as
 the raw data is already present in the parent block that links to the identity
 CID.
 
+## CAR `meta` (content type parameter)
+
+The `meta=eof` parameter allows clients to ask the server to append additional metadata about the
+CAR at the end of the response body.
+
+This parameter SHOULD only be used with CAR `version=1`.
+Values other than `eof` SHOULD be ignored.
+
+When the parameter is not set, the server MUST NOT add any extra CAR blocks to the response.
+
+The metadata block is a regular CAR block with the following properties:
+
+- CID specifies multicodec `car-metadata` (`0x04ff`), see
+  [multicodec#334](https://github.com/multiformats/multicodec/pull/334).
+
+- The payload contains metadata encoded as DAG-CBOR.
+
+The metadata MUST include the following fields:
+
+- `len` - byte length of the CAR data (excluding the metadata block).
+- `b3h` - BLAKE3 hash (checksum) of the CAR data (excluding the metadata block).
+- `b3h_sig` - A signature over `<len><b3h><request>` using the server's Ed25519 identity.
+  - `len` is encoded as `varint`,
+  - `b3h` is encoded as 32 bytes,
+  - `request` is the effective query as executed by the gateway: the path and query string arguments of the request URL.
+
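The byte layout of the message covered by `b3h_sig` can be sketched as follows. This is a minimal illustration, not part of the proposed spec text: it assumes that `varint` means the multiformats-style unsigned varint (LEB128 encoding) and that the request path and query string are encoded as UTF-8, neither of which the section states explicitly. The names `encode_varint` and `signed_message` are hypothetical. Hashing and signing are left out because they need third-party libraries.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint (assumed encoding)."""
    if value < 0:
        raise ValueError("varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def signed_message(car_len: int, b3h: bytes, request: str) -> bytes:
    """Concatenate <len><b3h><request> in the order listed for `b3h_sig`."""
    if len(b3h) != 32:
        raise ValueError("b3h must be exactly 32 bytes")
    # Assumption: the request (path + query string) is encoded as UTF-8.
    return encode_varint(car_len) + b3h + request.encode("utf-8")
```

A server would pass the result to its Ed25519 signing routine, and a client would rebuild the same bytes from the metadata fields and its own request before verifying the signature.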
 ## CAR format parameters and determinism
 
 The default header and block order in a CAR format is not specified by IPLD specifications.

@@ -0,0 +1,145 @@
+---
+title: "IPIP-0431: Opt-in Extensible CAR Metadata on Trustless Gateway"
+date: 2023-08-08
+ipip: proposal
+editors:
+  - name: Miroslav Bajtoš
+    github: bajtos
+    affiliation:
+        name: Protocol Labs
+        url: https://protocol.ai/
+relatedIssues:
+  - https://github.com/filecoin-project/boost/issues/1597
+order: 431
+tags: ['ipips']
+---
+
+## Summary
+
+Define an optional enhancement of the CARv1 stream that allows a Gateway server to provide
+additional metadata about the CARv1 response. Introduce a new content type parameter that allows the client
+and the server to signal or negotiate the inclusion of extra metadata.
+
+## Motivation
+
+SPARK is a Filecoin Station module that measures the reputation of Storage Providers by periodically
+retrieving a random CID. Since both SPs and SPARK nodes are permissionless, and Proof of Retrieval
+is an unsolved problem, we need a way to verify that a SPARK node retrieved the given CID from the
+given SP. To enable that, we need the Trustless Gateway serving the retrieval request to include a
+retrieval attestation after the entire response was sent to the client.
+
+Aside from this specific use case, the IPFS Ecosystem at large has no reliable
+mechanism to signal that a CAR file transmission over HTTP completed successfully.
+
+However, we need this in order to be able to use CARs as a way of serving streaming
+responses for queries. One way of solving this problem is to append an extra block at the end of the
+CAR stream with information that clients can use to check whether all CAR blocks have been received.
+
+## Detailed design
+
+CAR content type
+([`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
+already supports optional parameters like `version` and `order`, which allow an
+HTTP client to opt in via the `Accept` header and the Gateway to indicate via the
+`Content-Type` header which CAR flavor is returned with the response.
+
+The proposed solution introduces a new parameter for the CAR content type in HTTP requests
+and responses: `meta`.
+
+When the CAR content type parameter `meta` is set to `eof`, the Gateway will write one additional CAR
+block with metadata to the response, after it has sent all other CAR blocks.
+
+The metadata format is DAG-CBOR and open to extension, allowing standardized
+userland experimentation similar to the Extensible Data field from IPNS V2.
+
+See [CAR `meta` (content type parameter)](/http-gateways/trustless-gateway/#car-meta-content-type-parameter)
+in Trustless Gateway specification for more details.
+
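The opt-in negotiation described above could look like this on the client side. This is a sketch under assumptions: the helper names `car_accept_header` and `response_has_meta_eof` are hypothetical, and it assumes a gateway that honors the request echoes `meta=eof` in its `Content-Type` response header, in the same way `version` and `order` are signaled.

```python
from email.message import Message

CAR_TYPE = "application/vnd.ipld.car"


def car_accept_header(meta_eof: bool = True) -> str:
    """Build an Accept header value requesting a CARv1 stream, optionally with meta=eof."""
    params = ["version=1"]
    if meta_eof:
        params.append("meta=eof")
    return CAR_TYPE + "; " + "; ".join(params)


def response_has_meta_eof(content_type: str) -> bool:
    """Check whether a Content-Type value signals a CAR response with meta=eof."""
    msg = Message()
    msg["Content-Type"] = content_type  # reuse the stdlib header parameter parser
    return msg.get_content_type() == CAR_TYPE and msg.get_param("meta") == "eof"
```

A client that sent `meta=eof` but receives a `Content-Type` without it should treat the last CAR block as regular data, since the gateway may have ignored the parameter.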
+## Design rationale
+
+The proposal introduces a minimal change allowing Gateways and retrieval clients to explicitly opt
+into receiving an additional metadata block at the end of the CAR response stream.
+
+The metadata block is designed to be very flexible and able to support new use-cases that may arise
+in the future.
+
+### User benefit
+
+- Clients of trustless gateways can use the fields from the metadata as an attestation that they
+performed the retrieval from the given server.
+
+- The `len` field in the metadata block allows clients to verify whether they received all CAR
+bytes, which provides a backward-compatible solution for the [CARv1 streaming problem](https://github.com/ipfs/specs/pull/332) until a new CAR version is introduced.
+
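The completeness check enabled by `len` can be sketched as follows. It relies on the CARv1 layout, in which the header and every block are each prefixed with a varint length, and it assumes that `len` counts all bytes before the metadata block, header included. Decoding the metadata block's CID and DAG-CBOR payload is out of scope here, so `claimed_len` stands for an already-decoded `len` value; all function names are hypothetical.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return value, pos
        shift += 7


def split_trailing_section(car: bytes) -> tuple[int, bytes]:
    """Return (number of bytes before the last section, body of the last section)."""
    if not car:
        raise ValueError("empty CAR stream")
    pos = 0
    last_start = 0
    last_body = b""
    while pos < len(car):
        start = pos
        size, pos = read_varint(car, pos)
        end = pos + size
        if end > len(car):
            raise ValueError("truncated section")
        last_start, last_body = start, car[pos:end]
        pos = end
    return last_start, last_body


def is_complete(car: bytes, claimed_len: int) -> bool:
    """Compare the bytes received before the metadata block with the claimed length."""
    data_len, _ = split_trailing_section(car)
    return data_len == claimed_len
```

A streaming client would count bytes incrementally instead of buffering the whole response; the buffered version is used here only to keep the sketch short.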
+### Compatibility
+
+The new feature requires clients to explicitly ask the server to include the extra block via the `Accept` header,
+therefore the change is fully backwards-compatible for all existing gateway clients.
+
+Gateways receiving requests for the CAR content type can ignore the `meta` parameter if they don't
+support it and return a response with one of the CAR content types they support. This makes the
+proposed change backwards-compatible for existing gateways too.
+
+
+### Security
+
+The proposed specification change does not introduce any negative security implications.
+
+### Alternatives
+
+#### HTTP Trailers
+
+Instead of adding a new content type argument, we considered sending the additional metadata
+in HTTP response trailers. Unfortunately, HTTP trailers are not widely supported by the ecosystem.
+The Nginx proxy module discards them, the [browser `Fetch API` does not allow JS clients to access trailer
+headers](https://github.com/mdn/browser-compat-data/issues/14703), and neither does the Rust `reqwest` client.
+
+#### New Content-Type
+
+We could introduce a new content type that is not CARv3, but a thin envelope
+around CARv1 with the purpose of streaming over HTTP (e.g. `Content-Type:
+application/vnd.ipld.car-stream`).
+
+It would have three fields:
+- `car-stream-header` (optional DAG-CBOR)
+- `car` (same as `application/vnd.ipld.car;version=1`)
+- `car-stream-end` (optional DAG-CBOR)
+
+This would be enough to append a DAG-CBOR manifest at the end of the stream. It
+would be effectively the same CAR byte stream, but with a different
+`Content-Type`.
+
+Upside of this solution:
+
+- does not require registering new codec, or mixing data plane with control
+  plane, no sniffing the last DAG-CBOR block
+
+Downsides of this solution:
+
+- maintenance cost, requires duplicating all CAR-related tests and features
+- ecosystem opportunity cost, in creating a new content type, we increase
+  cognitive overhead for everyone working with IPFS over HTTP
+- no backward-compatible interop with existing tools and gateways that only
+  speak `application/vnd.ipld.car`
+- distracts us from working on things like large blocks and CARv3
+
+#### Create CARv3
+
+We could admit we've clearly hit the limits of what we can do with HTTP, CARv1 and CARv2, and stop abusing the existing CARv1 by mixing the data plane with the control plane.
+
+Spend energy on creating CARv3 that solves the problems from the "Motivation" section and more:
+- optional index or key-value metadata before or after data
+- native truncation detection and standardized error handling and passing during streaming
+- support for things like [Large Blocks](https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093/)
+
+TODO: link to some public artifact about CARv3
+
+## Test fixtures
+
+TBD
+
+Using one CID, request the CAR data using various combinations of content type parameters.
+
+### Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).