Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gateway: remote car gateway backend #587

Merged
merged 19 commits into from
Apr 13, 2024
Merged

gateway: remote car gateway backend #587

merged 19 commits into from
Apr 13, 2024

Conversation

hacdias
Copy link
Member

@hacdias hacdias commented Mar 7, 2024

Closes #576. Generally, this imports the graph gateway, here named CarBackend, from our old bifrost-gateway repository. The following updates were also done:

  • Run Conformance Tests against the Remote Car Gateway
  • Re-use options between CarBackend and BlocksBackend
  • Update CarBackend such that CARs return useful blocks when paths are not not resolvable

Depends on:

@hacdias hacdias self-assigned this Mar 8, 2024
@hacdias hacdias changed the title Gateway implementations gateway: remote car implementation Mar 8, 2024
@hacdias hacdias changed the title gateway: remote car implementation gateway: remote backend implementations Mar 8, 2024
@hacdias hacdias force-pushed the gateway-implementations branch 2 times, most recently from e328b5b to 55f0d3a Compare March 25, 2024 12:40
@hacdias hacdias changed the title gateway: remote backend implementations gateway: car backend Mar 25, 2024
@hacdias hacdias changed the title gateway: car backend gateway: car backend implementation Mar 25, 2024
@hacdias hacdias changed the title gateway: car backend implementation gateway: remote graph backend implementation Mar 25, 2024
@hacdias hacdias changed the base branch from main to gateway-remote-blocks March 25, 2024 16:30
Copy link

codecov bot commented Mar 25, 2024

Codecov Report

Attention: Patch coverage is 52.14541% with 803 lines in your changes are missing coverage. Please review.

Project coverage is 59.65%. Comparing base (63c3ec6) to head (32ddd92).

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
+ Coverage   59.52%   59.65%   +0.13%     
==========================================
  Files         233      238       +5     
  Lines       28276    29810    +1534     
==========================================
+ Hits        16830    17782     +952     
- Misses       9977    10415     +438     
- Partials     1469     1613     +144     
Files Coverage Δ
examples/gateway/car-file/main.go 11.53% <ø> (ø)
examples/gateway/proxy-blocks/main.go 0.00% <0.00%> (ø)
gateway/handler_unixfs_dir.go 64.84% <50.00%> (ø)
gateway/errors.go 84.25% <37.50%> (+0.91%) ⬆️
gateway/value_store.go 0.00% <0.00%> (ø)
gateway/blockstore.go 0.00% <0.00%> (ø)
gateway/backend_car_fetcher.go 78.72% <78.72%> (ø)
gateway/backend_car_traversal.go 72.72% <72.72%> (ø)
examples/gateway/proxy-car/main.go 0.00% <0.00%> (ø)
gateway/backend_blocks.go 44.71% <40.00%> (ø)
... and 3 more

... and 16 files with indirect coverage changes

@hacdias
Copy link
Member Author

hacdias commented Mar 26, 2024

@aschmahmann there are quite some tests failing for Gateway Conformance (Graph Gateway). I've already fixed some tests but maybe you have a better intuition for what might be failing here. I had to update the code quite a bit considering all the updates we've had in Boxo.

gateway/backend_graph.go Outdated Show resolved Hide resolved
Base automatically changed from gateway-remote-blocks to main April 10, 2024 08:19
@hacdias
Copy link
Member Author

hacdias commented Apr 10, 2024

We have now 10 failing conformance tests with the remote CAR gateway backend. I've managed to solve the other ones in the meanwhile.

Cache-Control: only-if-cached

This are currently failing because the Car Gateway does not have a local blockstore and does not cache anything. Therefore, we always return false to IsCached.

I think we should disable this test for this backend.

Range requests for DAG-CBOR converted to HTML

And this is a very interesting one. For the range requests, we initially do a request without a range and then accept either the correct range, the first range, or the whole content as response. But here's where it gets tricky!

When converting to HTML, Boxo does some magic to nicely display DAG-CBOR and DAG-JSON. When we have the whole data, we do it:

image

However, the Car Backend fetches a CAR with only the necessary blocks from the range request. Therefore, it does not have the necessary data to pretty print the DAG into HTML format:

image

And this makes the test fail (content of range request does not match range, nor the whole request).

I don't have a clear solution here: we could potentially try doing resolution first, and then acting depending on the codec of the CID. If the codec is not dag-pb and request accepts text/html (information currently not passed into .Get), then fetch everything without ranges. But this creates another problem: we'd be doing two requests and potentially more data than necessary for exotic formats.

Wdyt? I temporarily solved it with 6fe39c8 by adding a small exception.

cc @aschmahmann @lidel

@hacdias hacdias force-pushed the gateway-implementations branch 2 times, most recently from 5a2b6c8 to 6fe39c8 Compare April 10, 2024 15:07
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for salvaging this work @hacdias ❤️

On only-if-cached

Disabling tests for backend_car" for now sgtm.

💭 Click for Rationale

In theory, we could wire it up to in-memory cache, and pre-warm the cache so it passes the test, but it feels like a time sink not worth the effort.

In practice, Cache-Control: only-if-cached woudl be handled by CDN/cache in front of gateway, so it is reasonable for gateway backed by a remote backend to always return HTTP 412. HTTP 200s will be returned only by the cache that sits in front of it.


On Range requests for DAG-CBOR converted to HTML

Thank you for surfacing this one. Took me a moment to understand what is happening here.

The way I see it, the conformance test of (dag-cbor + Accept with text/html + Range) in path_gateway_dag_test.go#L718-L731 makes no sense conformance-wise, and should be removed from conformance suite itself, and remove any hacks we added to make it pass.

Lmk if below missed the mark, but I think removing this edge case is the best way of handling it.

💭 Click for (long) Rationale why

Making HTTP Range request with Accept that includes text/html make little sense.
The only meaningful Range tests for dag-cbor are ones that fetch subset of CBOR bytes.
So if Accept is missing, */* or application/vnd.ipld.dag-cbor application/vnd.ipld.cbor.
The case where Accept is text/html is not meaningful.

This HTML is generated, getting subset of bytes of generated HTML preview page, while technically possible for HTTP client, is.. meh? Not useful at all, not related to receiving actual content-addressed data.

It would be a-okay regression test in Kubo, but should not be part of conformance.

I think the reason it ended up in conformance tests is that our approach to conformance was to play it safe and "better to test too much than too little". Which is good, and I still think was a good call.

When it comes to conformance, we have two opposite problems:

  1. We test Kubo-specific behaviors. As we have implementations, we identify these, and over time relax, make generic, or remove.
  2. We have no tests for implicit behaviors related to generic HTTP due to always using net/http from Golang

Hopefully this is useful context.
Over time, conformance tests will improve. The more we use them, the better.

examples/gateway/proxy-car/README.md Show resolved Hide resolved
examples/gateway/proxy-car/README.md Outdated Show resolved Hide resolved
examples/gateway/proxy-blocks/README.md Outdated Show resolved Hide resolved
examples/gateway/proxy-car/main.go Outdated Show resolved Hide resolved
examples/gateway/proxy-car/main.go Outdated Show resolved Hide resolved
@hacdias
Copy link
Member Author

hacdias commented Apr 11, 2024

@lidel great, I do agree. I opened a PR to remove the conformance test (ipfs/gateway-conformance#202) and I will revert the change here that I made in order to pass such test. We'll need to make a release of the conformance tests once merged.

@hacdias
Copy link
Member Author

hacdias commented Apr 11, 2024

Besides removing the conformance test (PR here), this PR is ready to be reviewed. I just made a few cleanups, so here's a short summary of the most important:

  • You can now override the http.Client used by remote fetchers where we were always using the default newRemoteHTTPClient.
  • You can customize the GetFetchTimeout in the CarBackend
  • I made it clear that the CarBackend is not strictly remote. Just like the BlocksBackend, we can use it with different CarFetchers.
  • I made the utility NewRemoteCarBackend that creates a CarBackend with a RemoteCarFetcher.
  • I updated te CarFetcher interface to be more generic (path + CarParams) instead of the URL directly, making it more flexible.

@hacdias hacdias marked this pull request as ready for review April 11, 2024 15:14
@hacdias hacdias requested a review from a team as a code owner April 11, 2024 15:14
@hacdias
Copy link
Member Author

hacdias commented Apr 11, 2024

Note that there is one test that is "expected" [by github actions] (gateway-conformance). This needs to be updated in the configuration of the repository after merging this PR, since we updated the name of the test.

@lidel
Copy link
Member

lidel commented Apr 11, 2024

@hacdias thanks!

FYSA I've removed "gateway-conformance" workflow from required list for now.

Since boxo/gateway is de facto reference implementation of what is possible, gateway-conformance.yml workflow from this repo will be read by people who want to see prior art, learn how to integrate conformance tests into their projects.

I took a stab at cleaning it up + adding the missing test for "remote block backend" (where blocks are fetched one by one via ?format=raw) in 2ce7502.

(I've run out of time, will try to finish reviewing this PR tomorrow)

Comment on lines +34 to 35
github.com/ipld/go-car v0.6.2
github.com/ipld/go-car/v2 v2.13.1
Copy link
Member

@lidel lidel Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 i guess we have to live with both now.

iiuc we need v1 for WithErrorOnEmptyRoots which was introduced in ipld/go-car#461, and v2 does not support everything v1 does, and v3 mentioned in ipld/go-car#461 (comment) never happened.

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for porting and cleaning this up @hacdias!

I think it is good enough: we have examples that show how to use it, and we dogfood both backends in conformance tests, which is a neat way to ensure they don't break without notice.

Lgtm, let's merge it.

For anyone following this work: the next step will be to expose support for block and car proxy modes in rainbow (ipfs/rainbow#88).

@hacdias hacdias merged commit a26b503 into main Apr 13, 2024
16 checks passed
@hacdias hacdias deleted the gateway-implementations branch April 13, 2024 08:36
wenyue pushed a commit to wenyue/boxo that referenced this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: 🎉 Done
Development

Successfully merging this pull request may close these issues.

gateway: remote backend implementations from bifrost-gateway
2 participants