Make ipfs.dag.export built-in feature of HTTP gateways #170
Comments
I like the general idea of being able to get / put data without a dedicated HTTP client, which this would enable. My only (weak) concern is that it could lead to unexpected behavior if hitting any non-unixfs path started a .car download. It might be better to make this more explicit e.g. via Another argument for explicitness is that adding support for another "file"-like codec in the future would then not require a breaking change.
@Gozala good point. I think for non-unixfs DAGs we would return an error by default informing that there is no available preview for CID, but the original DAG can be downloaded if Q: any thoughts regarding Content-Type returned with CAR payload?
I think using custom mime types over |
Comment I made on this elsewhere: The varint at the front of the CAR format messes this up, so the first byte or two don't conform to a tight pattern. We did come up with some ideas to get us out of this annoying mess and fix the header in place, but that's a dream for CARv2. For now we're stuck with this design where even the This byte pattern is likely to catch most CAR files:
Assuming ?? will fit a varint describing the length of the header, which may break down for a very large CID or multiple CIDs (not supported by go-car though, but supported by JS). That string is basically this: "single byte varint + a CBOR encoded A common case will be:
but Filecoin uses blake2b-256, which has a higher multihash number, so its CIDs are longer. So they’re going to start with:
So, no “magic number” sadly, we’re lumped with this design for now. But you could approximate if you were keen. |
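The exact byte pattern from the comment above did not survive in this copy of the thread, so the following is only a sketch of the kind of approximation it describes. It assumes a CARv1 file starts with an unsigned varint giving the header length, followed by a CBOR map with two entries whose first key is `roots` (with `version` second). The function names and the prefix constant are illustrative, not part of any library.

```python
def read_uvarint(buf):
    """Decode an unsigned LEB128 varint; return (value, bytes_consumed)."""
    value = 0
    for i, byte in enumerate(buf[:9]):
        value |= (byte & 0x7F) << (7 * i)
        if byte < 0x80:  # high bit clear marks the last varint byte
            return value, i + 1
    raise ValueError("varint is truncated or too long")


# Assumption: CBOR map of two entries (0xa2), first key is the
# 5-character text string "roots" (0x65 followed by the ASCII bytes).
CAR_V1_HEADER_PREFIX = bytes([0xA2, 0x65]) + b"roots"


def looks_like_car_v1(head):
    """Heuristic sniff of the first bytes of a file; not a full parser."""
    try:
        header_len, offset = read_uvarint(head)
    except ValueError:
        return False
    prefix = head[offset:offset + len(CAR_V1_HEADER_PREFIX)]
    return header_len > 0 and prefix == CAR_V1_HEADER_PREFIX
```

Skipping the varint rather than matching it is what lets the check tolerate longer CIDs or multiple roots, at the cost of being a heuristic rather than a true magic number.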
Part of me would like to see the ipld explorer on the gateway, then a download link/querystring param for either file data or a car file. That'd give a preview of any DAG type and discourage people from using it as a cheap CDN. |
Somewhat related: CAR export at gateways would provide a means for Verifiable HTTP Gateway Responses (#128) without the need to expose DAG metadata as custom HTTP headers. Exporting the full DAG may not always be best performance-wise: in the future we may want to be able to control the depth of CAR export to facilitate parallel streaming from multiple gateways and/or seeking within media files. @rvagg are there prior issues/plans about controlling the depth of CAR export?
I would just like to have dag get -f cbor working (ipfs/kubo#4313). dag export exports only the car format, and dag get exports only json. Either way, the ipfs daemon (or the ipfs embedded implementation) is spending CPU time converting from native cbor to something else, but cbor itself would be fine as an export format, I think.
This is very similar to something now also independently conceptualized and proposed as a project in another planning&tracking repo: protocol/web3-dev-team#1 (I'm not sure if either is a superset of the information contained in the other yet, so this isn't a suggestion to close, but they certainly seem related.) |
Updated thoughts after recent discussions from #182 (comment) and forward.
I feel this is a safer way to support DAG export, but lmk if there are any concerns. |
I think we should specify some format into the gateway semantics, and I think it should be decoupled from I think maybe we have different use cases in mind. My read of your use case is that someone would fetch a DAG from a gateway and then import it into an My perspective is that someone is using the gateway because they're not using an My canonical use case (hypothetical, because currently impossible) is a Web3 dapp that reads IPLD-native data from the IPFS network, via a gateway for the speed, reliability, pinning etc needs that the specific app has. The envelope and format for the data it fetches can be pretty thin, but needs to be well-defined. The body of a DAR is just one or more DAGs, with the nodes in a well-defined order, deduplicated etc. The index is optional, and probably unnecessary here. We could adjust that body format to suit needs here – I designed it with this use case specifically in mind, so if I was off target I'd want to change anyway. |
My pet use case is thin clients like mobile web browsers and IoT devices using gateway as energy-efficient alternative to p2p transports (which would still be used as fallback, to conserve energy). Browsers want to support websites and assets loaded over Received similar comments about IoT using content-addressing for fetching firmware updates in a trustless manner, without draining battery for usual libp2p transports.
If we want to specify format, then we should plan for both import and export from the very start. Decoupling should be fine, as long we ensure that Do you feel this is blocked on DAR replacing CAR, or can we move forward with CAR for now? |
Great, I agree with your points. No, please don't interpret my statements as anything blocking moving forward; but as considerations and options with which you may choose to move forwards, or not. I do think there are significant advantages to the more tightly-defined DAR format over CAR, for these use cases, but I'll be satisfied with a story about how we could upgrade to this in the future (e.g. with content-type headers). I do still think that the behaviour should be fairly defined, so perhaps would tweak your suggestions to make that more explicit, e.g. with |
This makes sense. If we're providing this option, it should be consistent and predictable for all URLs. So even if you hit a UnixFS node with Browser detection through user agent is also worth considering, although controversial and potentially too much magic. If what's hitting you is a browser, and you know you can't do anything useful, the gateway could force the CAR download, whether or not the explicit parameter is passed. |
@anorth Mind elaborating on which parts of DAR you consider to be advantageous over CAR with a depth-first, deduplicated traversal logic? I would imagine the deterministic depth-first nature, and the space efficiency through CID elision, provided that it doesn't make us lose CID roundtripping, which I suspect it does in its current form. I do think that a DAR is costlier to generate on the server side in terms of memory footprint, due to the substitution of CID links with absolute stream offsets, which requires server-side tracking, and that it opens up security risks IMO if used to implement this use case.
I think the advantages are primarily the tight definition of the format. If some service producing a CAR also enforced depth-first logic and no duplicates as additional semantics, that would cover much of the benefit. A consumer could then rely on and validate that property. The stronger guarantees can be exploited by clients. This allows, for example, streaming construction of application data. E.g. a web app that builds a data model as a projection from IPLD could build that model incrementally, while validating CIDs, without ever writing blocks to a blockstore. A direct example, an application could reconstruct a unixfs file raw data in a streaming fashion. (Hmm, except for blocks referred to twice. Maybe there are cases where deduplication is unwanted?). Another, less important, advantage of DAR is that it's a deterministic representation given the ordering of roots. And then there's the space efficiency improvement, which is valuable with small node sizes (which we might see in application data, more than unixfs), even after we add a couple of bytes to fix up the CID roundtripping. It's not CIDs that are substituted, but duplicated block bodies. You're right that the deduplication does require keeping a |
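To make the "rely on and validate that property" point concrete, here is a minimal sketch of a consumer that checks a block stream is depth-first and deduplicated while verifying hashes, without writing anything to a blockstore. Everything in it is a stand-in chosen for illustration: a hex SHA-256 digest plays the role of a CID, and a toy JSON encoding plays the role of an IPLD codec. It is not the CAR or DAR format.

```python
import hashlib
import json


def digest(data):
    # Stand-in for a CID: hex SHA-256 of the block bytes (assumption).
    return hashlib.sha256(data).hexdigest()


def make_block(links, payload):
    # Toy codec for the example only: JSON with a list of child ids.
    data = json.dumps({"links": links, "payload": payload}).encode()
    return digest(data), data


def links_of(data):
    return json.loads(data)["links"]


def validate_dfs_stream(root, blocks, links_of):
    """Yield block bytes in order, raising if the stream is not a
    depth-first, deduplicated traversal starting at `root`."""
    pending = [root]  # stack of ids still expected; top is the last item
    seen = set()
    for block_id, data in blocks:
        while pending and pending[-1] in seen:
            pending.pop()  # already delivered once, so it is not repeated
        if not pending:
            raise ValueError("unexpected extra block")
        if block_id != pending.pop():
            raise ValueError("block out of depth-first order")
        if digest(data) != block_id:
            raise ValueError("block bytes do not match their id")
        seen.add(block_id)
        # Push children reversed so the first link is visited first.
        pending.extend(reversed(links_of(data)))
        yield data
    while pending and pending[-1] in seen:
        pending.pop()
    if pending:
        raise ValueError("stream ended before the DAG was complete")
```

Because blocks are yielded as soon as they validate, an application could build its data model incrementally. The `seen` set is the state the consumer has to keep for deduplication.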
ipfs/kubo#8111 landed to open up the |
Browsers tend to send |
    npx ipfs-get bafkreigh2akiscaildcqabsyg3dfr6chu3fgpregiymsck7e7aqa4s52zy --output room-guardian.jpg

If you prefer to curl it yourself, then you can pipe it to

    curl -X POST "https://ipfs.io/api/v0/dag/export?arg=bafkreigh2akiscaildcqabsyg3dfr6chu3fgpregiymsck7e7aqa4s52zy" \
      | npx ipfs-car -o room-guardian.jpg

or import it to your local

    curl -X POST "https://ipfs.io/api/v0/dag/export?arg=bafkreigh2akiscaildcqabsyg3dfr6chu3fgpregiymsck7e7aqa4s52zy" \
      | ipfs dag import
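For clients that cannot shell out to curl, the same request can be sketched with the Python standard library. The URL and the POST method are taken from the commands above; the helper names are made up for this example, and whether a given public gateway keeps exposing this endpoint is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_dag_export_request(cid, api_base="https://ipfs.io"):
    """Build (but do not send) the POST request used by the curl example."""
    query = urlencode({"arg": cid})
    return Request(f"{api_base}/api/v0/dag/export?{query}", method="POST")


def download_car(cid, out_path, api_base="https://ipfs.io"):
    """Stream the CAR response to a file in 64 KiB chunks."""
    request = build_dag_export_request(cid, api_base)
    with urlopen(request) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(64 * 1024):
            out.write(chunk)
```

The resulting file is a CAR stream, so it still needs to be unpacked or imported, as in the piped commands above.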
The CAR response format for |
Done:
Future work will happen as PR against Gateway specs (ipfs/specs#283):
Today: only unixfs+raw
Right now the /ipfs/ path on gateway only supports dag-pb + raw, everything else fails with "unrecognized object type" error:
Gateway exposes /api/v0/get so one can read blobs with other DAGs that way, but people rarely use it.
Future: export anything
I believe the HTTP gateways should return every DAG type.
Here is initial idea: we recently added support for CAR import/export to go-ipfs (ipfs/kubo#6870) – what if we return non-unixfs/raw DAGs as .car .ipfs.dag files?
immutable hint for everything under /ipfs/
{cid}.ipfs.dag filename and trigger download, so the browser does not try to render the blob
?download=dag everywhere, so even unixfs DAG could be fetched as CAR from any gateway
?format=car is more flexible, as it allows for format=tar or json for dag-cbor (feat: serve CBOR encoded DAG nodes from the gateway kubo#8037)
cc @mikeal @aschmahmann @autonome @Gozala @achingbrain – is this a good idea? any concerns? would PR be accepted?
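One way to read the explicit-parameter variant discussed in this thread is sketched below: an explicit ?format= always wins, dag-pb and raw keep today's behaviour, and any other codec returns an error unless an export format was requested. The function name, the return values, and the set of accepted formats are all hypothetical; none of this describes what a gateway actually implements.

```python
from urllib.parse import parse_qs, urlsplit

# Codecs the /ipfs/ path can already serve, per the issue text.
NATIVE_CODECS = {"dag-pb", "raw"}
# Only "car" here; "tar" or "json" could be added the same way.
EXPORT_FORMATS = {"car"}


def choose_response(url, codec):
    """Return "file", an export format name, or "error" for a request."""
    params = parse_qs(urlsplit(url).query)
    requested = params.get("format", [None])[0]
    if requested is not None:
        # Explicit request: honoured for every codec, including unixfs.
        return requested if requested in EXPORT_FORMATS else "error"
    if codec in NATIVE_CODECS:
        return "file"
    # No preview exists for this codec and no export was requested.
    return "error"
```

Keeping the decision keyed on an explicit parameter means a future "file"-like codec can be added to `NATIVE_CODECS` without changing what existing URLs return.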
Future: import DAGs
If we have export.. could we also add import? This could be safely enabled on localhost, and people could experiment with this on public gateways (there, it could be guarded by reverse proxy or some bearer token):
HTTP PUT /ipfs/{cid}
HTTP PUT /ipns/{libp2p-key}
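As a rough illustration of what a client for the proposed import could look like, the sketch below builds a PUT request carrying CAR bytes. The endpoint is only the proposal from this issue, not an existing gateway feature; the helper name is invented, and the Content-Type is a placeholder because the thread has not settled on one.

```python
from urllib.request import Request


def build_import_request(gateway, cid, car_bytes, token=None):
    """Build (but do not send) a PUT /ipfs/{cid} request with a CAR body."""
    headers = {
        # Placeholder media type: the CAR Content-Type is an open question.
        "Content-Type": "application/octet-stream",
    }
    if token is not None:
        # Optional guard for public gateways, as suggested in the proposal.
        headers["Authorization"] = f"Bearer {token}"
    return Request(
        f"{gateway}/ipfs/{cid}",
        data=car_bytes,
        headers=headers,
        method="PUT",
    )
```

A server accepting this would presumably need to check that the uploaded blocks actually hash to the CID in the path before storing them.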
References
?download=true support was recently added in https://github.com/ipfs/go-ipfs/pull/7677
Content-Location in response to specific Accept in request: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Location#examples