Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add blob protocol to infra #353

Merged
merged 4 commits into from
Apr 18, 2024

Conversation

vasco-santos
Copy link
Contributor

@vasco-santos vasco-santos commented Apr 17, 2024

This PR creates stores and wires up new upload-api running blob/add, web3.storage/blob/allocate, web3.storage/blob/accept and ucan/conclude capabilities. Tests are also imported from upload-api implementation and run here.

As agreed on storacha/w3up#1343 , there won't be any deduping between old world and new world. Therefore, we have new allocations table, and use different key schema in carpark. We are writing blobs keyed as base58btc as previously discussed as ${base58btcEncodedMultihash}/${base58btcEncodedMultihash}.blob. I added .blob suffix but I am happy to other suggestions. Depending on how we progress with the reads side, we can consider creating a new bucket to fully isolate new content?

The receipts and tasks storage end up being more complicated as they need to follow https://github.com/web3-storage/w3infra/blob/main/docs/ucan-invocation-stream.md#buckets, and is essentially the same as what happens on https://github.com/web3-storage/w3infra/blob/main/upload-api/ucan-invocation.js#L66 but at a different level as this is a proactive write of tasks and receipts.

Integration tests are in progress in branch https://github.com/web3-storage/w3infra/tree/feat/add-blob-protocol-to-service-integration-test. They currently run in my local dev setup, and have full functionality from blob/add, followed by HTTP PUT to bucket and ucan/conclude containing the receipt of the put. This makes scheduled blob/accept run and a receipt with a location claim being available. You can see the tests in branch, on file test/blob.test.js. I want to only add them here once we can rely on the client to do the manual hack I did here

@storacha storacha deleted a comment from sentry-io bot Apr 17, 2024
Copy link

seed-deploy bot commented Apr 17, 2024

View stack outputs

@vasco-santos vasco-santos force-pushed the feat/add-blob-protocol-to-service branch from a35d822 to ea4fe9a Compare April 17, 2024 14:40
* @param {string} tableName
* @returns {AllocationsStorage}
*/
export function useAllocationsStorage(dynamoDb, tableName) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite similar with storeTable with few modifications given new columns

* @param {string} bucketName
* @returns {BlobsStorage}
*/
export function useBlobsStorage(s3, bucketName) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite similar with buckets/car-store with few modifications given new key schema

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding this as code comment actually.

@vasco-santos vasco-santos changed the title feat: add blob protocol to service feat: add blob protocol to infra Apr 17, 2024
@vasco-santos vasco-santos marked this pull request as ready for review April 17, 2024 14:44
*/
async schedule(invocation) {
const connection = this.getServiceConnection()
const [res] = await connection.execute(invocation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this write a receipt into receipt storage ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not needed. Sadly this looks then a bit confusing and looking forward to have some kind of receipt storage part of ucanto/scheduler later on. Here this does a HTTP request executing the invocation, which goes through the ucan-invocation-router and is hooked up with the receipt storage: https://github.com/web3-storage/w3infra/blob/main/upload-api/functions/ucan-invocation-router.js#L280

I will add a comment to explain this

space,
multihash: multihash58btc,
size: blob.size,
invocation: invocation.toString(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this just implements what has be previously added to upload-api, however sadly I'm only recognizing this here. It is not a blocker, but perhaps a good idea to do in a followup if we agree.

I am realizing that this should probably link to the signed receipt as opposed to invocation. In a way all our stores should simply be indexes over receipts because receipts are signed commitments from us, also they do link back to the corresponding invocation.

Rational here is that anyone with access to table could create valid record like this, however if it was linking to signed receipt they would not be able to without signing key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point and this looks like it make sense. I will create an issue to address this as follow up because I am not sure on how we can do this in practise given this is blob/add invocation CID. And its receipt link would not be possible to compute when we run blob/allocate given it is also dependent on this receipt itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I was not very clear here, I meant that it should be receipt to blob/allocate not to receipt of the blob/add

* @param {string} bucketName
* @returns {BlobsStorage}
*/
export function useBlobsStorage(s3, bucketName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding this as code comment actually.

has: async (multihash) => {
const encodedMultihash = base58btc.encode(multihash)
const cmd = new HeadObjectCommand({
Key: `${encodedMultihash}/${encodedMultihash}.blob`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you recall reason why had same named dir and file within it ? I am pretty sure there was one, but I can't seem to recall what it was. If you do could you please add code comment mentioning it so future us can be grateful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is related to how Bucket rate limiting happens (at least in S3), where rate limits are per path, rather than entire bucket. With this, we protect ourselves from rate limiting by having folder names as hashes with unique content inside

ChecksumSHA256: checksum,
ContentLength: size,
})
const expiresIn = 60 * 60 * 24 // 1 day
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we expose this as an optional argument ? Seems like something we may want to be able to change in some cases.

Copy link
Contributor Author

@vasco-santos vasco-santos Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good call! I will expose this!

vasco-santos added a commit to storacha/w3up that referenced this pull request Apr 18, 2024
Per
storacha/w3infra#353 (comment)
renaming this to cause so that it is transparent for the infra side what
this is: i.e. can be invocation or receipt
@vasco-santos vasco-santos force-pushed the feat/add-blob-protocol-to-service branch from f4b15c6 to dc6f646 Compare April 18, 2024 09:56
@seed-deploy seed-deploy bot temporarily deployed to pr353 April 18, 2024 09:56 Inactive
@vasco-santos vasco-santos merged commit 13566fe into main Apr 18, 2024
3 checks passed
@vasco-santos vasco-santos deleted the feat/add-blob-protocol-to-service branch April 18, 2024 10:12
vasco-santos added a commit that referenced this pull request May 17, 2024
Adds integration tests for blob implementation started on
#353 .

This test aims to test all the main flows that MUST happen once a Blob
is written:
- [x] Blob is written and stored on target
- [x] Blob is written via `w3up-client`
- [x] Blob index is stored on user space
- [x] Blob is retrievable from Roundabout
- [x] Blob blocks are retrievable from Hoverbord
- [x] Blob is retrievable by its root from w3link
- [x] Blob makes its way into Filecoin pipeline after a `filecoin/offer`

In addition, `store-add` tests are added independently to make sure old
clients keep operation for writes + reads!

Follow ups:
- [ ] @vasco-santos to create an issue to create ENV VARs for the
testing names added like `w3s.link` hostname
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants