-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gateway): JSON and CBOR response formats (IPIP-328) #9335
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Hey, it might be worth considering just mapping Unless you want pretty-printing for JSON, which is slightly different, but I also think that's probably not wanted out of the gateway. |
@rvagg the idea here was that, if you request |
@hacdias that sounds like a good approach—if something is already in that codec then you don't really need to decode, so presumably you won't even need to touch those codecs since you're just posting back the raw bytes? One major difference worth noting with the JSON codec in Go is that it pretty-prints the output. I'd like to reconcile that with the JS codec which doesn't and I've been asking for feedback whether that's helpful or not. I don't know if it's relevant here if you're going to default to the DAG-JSON codec if the block isn't already plain JSON, because you'll just get what they encoded it as—unless you're planning on doing a round-trip through the codec again?
Shouldn't be a problem for the most part I think, but there may be edges around large integers because iirc JSON doesn't strictly have a limit on int max/min and who knows what people are throwing into IPFS. In Go we limit maximum size at int64 boundaries.
This gets more tricky because "CBOR", like "JSON", isn't as strictly defined in terms of an IPLD codec and CBOR can have all sorts of funky things inside of it that don't fit properly in the data model—non-string map keys, tags for dates, regexes, and all sorts of weird things; plus you have the big integer problem too since there are ways to represent them but the Go implementation uses int64—for standard CBOR (including DAG-CBOR) there's this space between int64 and uint64 where we have an overflow issue in go-ipld-prime so you get an error, but in extended CBOR you can have arbitrary sized integers using tags. There is a history of people throwing weird CBOR into DAG-CBOR but we've really tightened our codecs in the last couple of years to make that uncomfortable. If things work as they should, then you'll just get errors when blocks can't be properly decoded, hopefully those error messages are helpful. |
@rvagg thanks for the comment. I simplified the code to just output the raw bytes in case the data is already encoded in the output codec, reducing possible CPU intensive operations. Note: I can probably simplify it further in order to use the same code as the |
2e5a05d
to
7ffef86
Compare
@rvagg @lidel so I noticed something that may influence whether or not we actually spit out the original bytes iff the requested codec is the same as the CID codec (also ref: ipfs/specs#328 (comment)). Let's take the following json document and add it to IPFS: $ ipfs dag put --input-codec json --store-codec json
{
"foo": {
"bar": "hello, world"
}
}
> bagaaieraev5tujfaql3zxtvubhbuf5tvt33rn32wefgujw6pfa2v3ybiai6a When you use $ ipfs dag get --output-codec json /ipfs/bagaaieraev5tujfaql3zxtvubhbuf5tvt33rn32wefgujw6pfa2v3ybiai6a/foo/bar
"hello, world" This is possible thanks to this little bit of code we have that traverses the node after running Lines 48 to 55 in 0f44f29
However, in the gateway, if we use I think we should do the traversal as it will match the current behaviour of [Whatever decision comes out of this must be added to the IPIP for clarity] |
If we want this to work as if its the DAG API then yeah, we should probably go the whole way on supporting pathing, and therefore re-encoding sub-blocks. Is that the intention here (sorry, not being as familiar with the gateway code myself)? Does it make sense to use the DAG API for all of it, or only when you path? Also note that there's subtle differences in pathing between I'm pretty sure UnixFS pathing is the intended behaviour of the gateway, so this would be the default for dag-pb input blocks. Does that make sense even when you change output codec? I guess so 🤷 but we should probably be clear about it since it's one of those weird edges that history provides us with. |
When you request something normally through the gateway, it uses the UnixFS API. This PR adds (DAG-)JSON and (DAG-)CBOR support. We're always using the DAG API unless the codec in the CID is the same as the requested format. In that case, we just print the block bytes. However, that will make it not support pathing. We could, of course, add more ifs_ to make the logic flow through the DAG API if there's paths but I feel that would make this implementation unnecessarily complicated.
Yes, that's how it works.
I think it makes sense. We should, however, note that there's also an IPLD gateway spec going on ipfs/specs#293. I wonder how much of overlap this would have. I think we should keep the same behaviour as
But this will involve not printing the raw bytes if CID codec = requested codec (basically removing these lines). |
Would it be safer if we supported pathing only for traversable (dag-) codecs? I don't think Gateway should be responsible for modifying plain JSON docs that have nothing to do with DAGs, nor have strictness of DAG-JSON. Feels safer: if I am wrong, we can add it in the future, but its easier to add it later than remove somehting we are not happy with. Q: why can't we return raw bytes if CID codec == requested DAG- codec and there is no sub-path? |
We can. I'll add that to the TODOs.
We can. But what if the CID codec is a DAG- codec, but the raw data inside is invalid? E.g., added via |
You are right (I still have brain fog sometimes). (If someone wants better performance, they can skip validation by requesting it as |
Is this for output codecs? So the dividing line on pathing would be if the codec has dag-* in front of it then it'll path? Or is it the division that we support pathing for IPLD codecs—ones that materialise links. The others, which we've called "serializations", like json and cbor, don't allow pathing. Justifying this is a bit tricky though because it seems a bit arbitrary that we do for some codecs and not others. Maybe it's something like: "we support pathing for codecs that materialise links (CIDs) because the most practical application of pathing is to navigate across block boundaries." But note that we could even take that further and only allow pathing where it ends in a link. Still in this arbitrary place of why we support one thing and not another though. |
+1 – my thinking was that the safe way to specify things for now is DAG traversal over CID links (CBOR tag 42), and nothing more. Rationale is that it makes implementer's work easier: can be done with plain CBOR libs if needed, traversing links between blocks is easy to reason about, and does not require any advanced IPLD knowledge. On Gateways we have a bunch of headers and behaviors are based on the CID of the last path segment, and if we start extracting parts of CBOR or JSON documents and return them via pathing, then we need to specify how headers such as My take is that it is easier to add things in the future, than remove, so better to start small. |
.dag- veriants are not necessary – DAG-JSON should have 'Save As' dialog default to file named .json see "File extension" section of specs below: https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor
Quality of life improvement, allows user agents to render JSON. Passing ?format=json to unixfs directory will show JSON in browser.
Includes human-readable error which encourages people to use CBOR Tag 42 for building DAGs with Links.
This ensures user knows why they got this HTML response (due to specific codec in the specific CID they can inspect), and that they have clear understanding of options – either preview as JSON, or download as one of supported formats.
This is a quick, static placeholder that follows the style of dir-index-html. Future work will refine this, perhaps by reusing parts of https://explore.ipld.io/
Quick update: still in review, I am mostly adding HTTP response tests to safeguard newly added behaviors (headers, caching). I should be done by tomorrow/monday. 🤞 |
This unifies the way we set headers for DAG- responses, adds missing ones for HTML response, and adds bunch of regression tests to ensure desired behaviors are not changed during future refactors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hacdias took this for a spin and looks good. This will have huge impact long term, thank you! 🙏
Pushed some changes around HTTP headers and wrote additional tests:
- JSON response is now inlined, which allows browsers to render preview instead of forcing the "Save As" dialog.
- Unsupported pathing returns more meaningful error + error code 501 Not Implemented
- Unified the way headers are set + added Cache-Control and Etag code
- Added regression tests for some edge cases and common behaviors
- Tweaked HTML response to be more presentable and "good enough" for MVP, Kubo 0.18.0-rc1 and production use on public gateways.
Some additional info comments below.
@@ -142,6 +142,11 @@ test_expect_success "Add CARs for path traversal and DAG-PB representation tests | |||
test_should_contain $DAG_PB_CID import_output | |||
' | |||
|
|||
test_expect_success "GET DAG-JSON with Accept: text/html returns HTML" ' | |||
curl -sD - -H "Accept: text/html" "http://127.0.0.1:$GWAY_PORT/ipfs/$DAG_JSON_TRAVERSAL_CID" > curl_output 2>&1 && | |||
test_should_contain "Content-Type: text/html" curl_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ℹ️ We were missing Cache-Control and Etag header tests – added them in 7e84856
- unified the way we set headers for DAG- responses, and added missing ones for HTML + bunch of tests
- we needed separate tests for HTML responses because it is a different code path AND caching needs to be different: generated HTML may change in future Kubo, so it is not immutable response, even when comes from /ipfs
- the logic for Cache-Control/Etag for HTML response is similar to ones from dir-index-html responses produced by
gateway_handler_unixfs_dir.go
(seet0116-gateway-cache.sh
for prior examples).
- the logic for Cache-Control/Etag for HTML response is similar to ones from dir-index-html responses produced by
Future work:
- potentially cool project would be to create a single view that handles both unixfs dir listings and non-unixfs dags, and shows more useful info about the DAG (I'm thinking "https://explore.ipld.io 2.0" :)) – not urgent, but somehting we would could start designing in cooperation with gui team.
- if we evern change how dir-index-html responses behave, the analog changes should be applied to dag-index-html.
func (i *gatewayHandler) serveCodecHTML(ctx context.Context, w http.ResponseWriter, r *http.Request, resolvedPath ipath.Resolved, contentPath ipath.Path) { | ||
codecName := mc.Code(resolvedPath.Cid().Prefix().Codec).String() | ||
body := fmt.Sprintf(`<!DOCTYPE html> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ℹ️ Pushed some changes in 0531e93 around the HTML generated here:
- Quick win: reused metadata from
<head>
in https://github.com/ipfs/kubo/blob/master/assets/dir-index-html/dir-index.html – allows for nicer preview when link pasted on Twitter / Discourse etc - Replaced HTML string with proper HTML template (prior art:
redirectTemplate
fromgateway_handler.go
) – easier/safer for future work - Reused static, minified CSS from dir-index-html
It looks like this:
We can improve it further in follow-up PRs, but it is good enough for 0.18.0-rc1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @juliaxbow @SgtPooki – FYI this PR adds a stub/placeholder for HTML returned when a web browser asks gateways for things other than files and directories.
A Potential GUI project for 2023 would be to design a unified experience that can render both directory listings (example) and non-unixfs things like DAG-CBOR / DAG-JSON, and provide more information about DAG.
Something similar to https://explore.ipld.io (example), but less intimidating.
What would be the best place for tracking that? Should I create issue in https://github.com/ipfs/ipfs-gui?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely looks much better than the page I had 😃 Thanks for this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied feedback from IPFS Implementers Sync, HTML should be aimed at non-technical people, and UnixFS in bold is too intimidating.
ipfs/kubo#9335 ipfs/specs#328 Co-authored-by: Marcin Rataj <[email protected]> This commit was moved from ipfs/kubo@fdd1965
Supersedes #8037
Closes #8823
IPIP: ipfs/specs#328
The List (mostly borrowed from #8823)
?format=dag-json
,Accept: application/vnd.ipld.dag-json
?format=dag-cbor
,Accept: application/vnd.ipld.dag-cbor
?format=json
,Accept: application/json
?format=cbor
,Accept: application/cbor