upload/add mutability challenges #56

Open
vasco-santos opened this issue Mar 6, 2023 · 1 comment

@vasco-santos
Contributor

Background

While implementing the initial version of upload-api, we already ran into a few challenges that increase complexity and that we need to address. They stem from the decision to make the upload/add invocation mutable. The upload-api itself can currently handle multiple upload/add invocations for the same root. However, this might cause unexpected issues on the read side. Our JS client currently ensures that only one upload/add is invoked, so that Freeway is able to serve the content and is not left in an 'unexpected' state where shards are defined in DUDEWHERE but there is not enough data to serve the content. We do not have a way to address this problem yet!

There is a lot of value in what we want to support here, but maybe there are alternatives we could add to the protocol that would make it easier to implement, while also making the UCAN invocations more accurate and self-describing.

In an event-based world powered by UCAN invocations, the flexibility of upload/add will require us to increase the complexity of our event handlers. For instance, upload/add invocations are not self-describing: the invocation alone is not enough to tell us whether an upload was really created or was just mutated. Being able to reconstruct the system history from the events alone would be super helpful.
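To make the "not self-describing" point concrete, here is a minimal sketch of a mutable upload/add handler. It is an in-memory model with hypothetical names (`UploadAdd`, `applyUploadAdd`, the example DIDs and CIDs), not the real upload-api code. Two invocations of the same shape produce different outcomes depending only on prior state:

```typescript
// Hypothetical in-memory model of a mutable upload/add (not the real upload-api).
type UploadAdd = { space: string; root: string; shards: string[] }

// Shards per upload, keyed by `${space}/${root}`.
const uploads = new Map<string, Set<string>>()

// Applies the invocation and reports what happened. The outcome depends on
// prior state, which the invocation itself does not carry.
function applyUploadAdd(inv: UploadAdd): 'created' | 'mutated' {
  const key = `${inv.space}/${inv.root}`
  const existing = uploads.get(key)
  if (existing === undefined) {
    uploads.set(key, new Set(inv.shards))
    return 'created'
  }
  for (const shard of inv.shards) existing.add(shard)
  return 'mutated'
}

const first = applyUploadAdd({ space: 'did:key:space', root: 'bafyroot', shards: ['bagshard1'] })
const second = applyUploadAdd({ space: 'did:key:space', root: 'bafyroot', shards: ['bagshard2'] })
```

An event handler that only sees the two invocations cannot tell `first` from `second` without consulting state.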

Goal

Given the continuing challenges posed by this decision, the goal of this issue is to decide whether we want to keep the same protocol or whether we could make it easier to implement and more self-describing.

Protocol change

The protocol currently enables an Upload to be built over time via multiple upload/add invocations. This enables all kinds of great use cases. But we can consider modifying it in a way that supports the same functionality and makes the whole system coherent.

We can make upload/add executable only once in a space for a given root CID. This allows us to:

  • account for upload/add operations per space and across the entire system
  • make it a validation step: check that each shard provided in upload/add is actually in the bucket, and validate the CAR blocks (we currently do not do this)
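A rough sketch of what a once-only upload/add could look like, assuming an in-memory table and a `storedShards` set standing in for the bucket lookup. All names here (`createUploadTable`, `AddResult`, the error strings) are hypothetical, and CAR block validation is not modeled:

```typescript
// Hypothetical sketch: upload/add succeeds at most once per (space, root).
type UploadAdd = { space: string; root: string; shards: string[] }
type AddResult = { ok: true } | { ok: false; error: string }

// `storedShards` stands in for a lookup against the CAR bucket (an assumption).
function createUploadTable(storedShards: Set<string>) {
  const uploads = new Map<string, string[]>()
  const countBySpace = new Map<string, number>()

  function add(inv: UploadAdd): AddResult {
    const key = `${inv.space}/${inv.root}`
    if (uploads.has(key)) {
      return { ok: false, error: 'upload already exists for this root' }
    }
    // Validation step: every shard must already be stored.
    const missing = inv.shards.filter((shard) => !storedShards.has(shard))
    if (missing.length > 0) {
      return { ok: false, error: `shards not stored: ${missing.join(', ')}` }
    }
    uploads.set(key, inv.shards.slice())
    // Because add can only succeed once per root, this is a count of uploads.
    countBySpace.set(inv.space, (countBySpace.get(inv.space) ?? 0) + 1)
    return { ok: true }
  }

  function count(space: string): number {
    return countBySpace.get(space) ?? 0
  }

  return { add, count }
}

const table = createUploadTable(new Set(['bagshard1']))
const created = table.add({ space: 'did:key:space', root: 'bafyroot', shards: ['bagshard1'] })
const repeated = table.add({ space: 'did:key:space', root: 'bafyroot', shards: ['bagshard1'] })
const unstored = table.add({ space: 'did:key:space', root: 'bafyother', shards: ['bagshard9'] })
```

With this shape, a successful upload/add always means "an upload was created", so per-space accounting needs no extra lookup.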

To allow a root CID to incrementally point to more and more shards, we can introduce upload/update, which allows new shards to be added and validated.
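As a sketch of how upload/update might behave under these assumptions: it only applies to an upload that already exists, and it validates new shards the same way. The function name, result type and error messages below are hypothetical.

```typescript
// Hypothetical sketch of upload/update: extends an existing upload with new shards.
type UploadUpdate = { space: string; root: string; shards: string[] }
type UpdateResult = { ok: true; shards: string[] } | { ok: false; error: string }

// `uploads` is keyed by `${space}/${root}`; `storedShards` models the bucket.
function applyUploadUpdate(
  uploads: Map<string, Set<string>>,
  storedShards: Set<string>,
  inv: UploadUpdate
): UpdateResult {
  const current = uploads.get(`${inv.space}/${inv.root}`)
  if (current === undefined) {
    return { ok: false, error: 'no upload for this root; upload/add must come first' }
  }
  const missing = inv.shards.filter((shard) => !storedShards.has(shard))
  if (missing.length > 0) {
    return { ok: false, error: `shards not stored: ${missing.join(', ')}` }
  }
  inv.shards.forEach((shard) => current.add(shard))
  return { ok: true, shards: Array.from(current).sort() }
}

const existingUploads = new Map<string, Set<string>>([
  ['did:key:space/bafyroot', new Set(['bagshard1'])],
])
const bucket = new Set(['bagshard1', 'bagshard2'])

const updated = applyUploadUpdate(existingUploads, bucket, {
  space: 'did:key:space', root: 'bafyroot', shards: ['bagshard2'],
})
const orphan = applyUploadUpdate(existingUploads, bucket, {
  space: 'did:key:space', root: 'bafyunknown', shards: ['bagshard2'],
})
```

Splitting add from update this way keeps each invocation self-describing: add creates, update mutates.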

Finally, introducing an upload/ready invocation that flags the content as ready to be read would enable the read side to operate safely. Alternatively, we can decide that this is redundant and just warn users that an incomplete upload can be problematic to serve on the read side.
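A minimal sketch of the upload/ready idea, assuming the read side can consult a `ready` flag before serving. The record shape and function names are hypothetical, and this says nothing about how Freeway would actually check it:

```typescript
// Hypothetical sketch: reads are gated on an explicit ready flag.
type UploadRecord = { shards: string[]; ready: boolean }

const records = new Map<string, UploadRecord>()

// upload/add creates the record, not yet ready to serve.
function uploadAdd(root: string, shards: string[]): void {
  records.set(root, { shards: shards.slice(), ready: false })
}

// upload/ready flags the upload as complete. Returns false for unknown roots.
function uploadReady(root: string): boolean {
  const record = records.get(root)
  if (record === undefined) return false
  record.ready = true
  return true
}

// The read side only serves roots that were flagged as ready.
function canServe(root: string): boolean {
  const record = records.get(root)
  return record !== undefined && record.ready
}

uploadAdd('bafyroot', ['bagshard1'])
const beforeReady = canServe('bafyroot')
const flagged = uploadReady('bafyroot')
const afterReady = canServe('bafyroot')
```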

Alternatives

There are alternatives we can explore that could also be part of the spec.

Receipts

We can make it part of the spec that, after an initial upload/add is made in a space, the system generates a receipt of upload creation that both the user and the UCAN Invocations Stream can see and act upon as they like.
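One way to picture this, as a sketch only: the handler emits a creation receipt for the first upload/add of a root, and a stream consumer counts uploads from receipts alone. The receipt shape here is hypothetical and is not the UCAN receipt format.

```typescript
// Hypothetical sketch: a creation receipt is issued only for the initial upload/add.
type UploadAdd = { space: string; root: string; shards: string[] }
type UploadCreatedReceipt = { space: string; root: string; shards: string[] }

const knownUploads = new Set<string>()

// Returns a creation receipt for the first upload/add of a root in a space,
// and null for later invocations that only mutate the upload.
function handleUploadAdd(inv: UploadAdd): UploadCreatedReceipt | null {
  const id = `${inv.space}/${inv.root}`
  if (knownUploads.has(id)) return null
  knownUploads.add(id)
  return { space: inv.space, root: inv.root, shards: inv.shards.slice() }
}

// A stream consumer can count uploads per space without querying any database.
function countCreated(receipts: UploadCreatedReceipt[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const receipt of receipts) {
    counts.set(receipt.space, (counts.get(receipt.space) ?? 0) + 1)
  }
  return counts
}

const invocations: UploadAdd[] = [
  { space: 'did:key:space', root: 'bafyrootA', shards: ['bagshard1'] },
  { space: 'did:key:space', root: 'bafyrootA', shards: ['bagshard2'] },
  { space: 'did:key:space', root: 'bafyrootB', shards: ['bagshard3'] },
]

const receipts: UploadCreatedReceipt[] = []
for (const inv of invocations) {
  const receipt = handleUploadAdd(inv)
  if (receipt !== null) receipts.push(receipt)
}
const counts = countCreated(receipts)
```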

This flow can also be useful in other cases where the UCAN invocation does not carry all the details, such as store/remove not including the size.

Increase system complexity

Or we can just keep extra state, or query the existing databases to check state. Querying existing databases can easily lead to race conditions and should be avoided. Keeping more state is also more expensive, as more DB writes would be needed just to account for unique uploads.
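The race is easy to reproduce on paper. The sketch below models two handlers that each query and then write, with the interleaving written out explicitly rather than relying on real concurrency. The table and counter are hypothetical stand-ins for the existing databases:

```typescript
// Hypothetical table of known uploads, keyed by `${space}/${root}`.
const uploadKeys = new Set<string>()
let uniqueUploads = 0

// Query step: does an upload already exist for this key?
function queryExists(key: string): boolean {
  return uploadKeys.has(key)
}

// Write step: record the upload, and count it if the earlier query said it was new.
function writeUpload(key: string, existedAtQueryTime: boolean): void {
  uploadKeys.add(key)
  if (!existedAtQueryTime) uniqueUploads += 1
}

const uploadKey = 'did:key:space/bafyroot'

// Two handlers process upload/add for the same root.
// Both run their query before either one writes.
const seenByA = queryExists(uploadKey)
const seenByB = queryExists(uploadKey)
writeUpload(uploadKey, seenByA)
writeUpload(uploadKey, seenByB)

// The table holds one upload, but the counter was incremented twice.
```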

In addition, this means a more complex system and more code to maintain.

@alanshaw
Member

alanshaw commented Mar 6, 2023

The response from upload/add is the record with the new data https://github.com/web3-storage/w3infra/blob/main/upload-api/tables/upload.js#L100.

When we implement receipts, we should be able to run the metrics off the receipt for this call. If the input set is the same as the set in the receipt, it's probably a new upload/add. This would be more accurate than what we have atm, but not perfect.
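A sketch of that heuristic, assuming the receipt carries the full shard set of the resulting record (the function name and example shard CIDs are hypothetical):

```typescript
// Hypothetical heuristic: the upload is probably new if the receipt's shard set
// equals the shard set that was passed in.
function isProbablyNewUpload(inputShards: string[], receiptShards: string[]): boolean {
  const input = new Set(inputShards)
  const result = new Set(receiptShards)
  if (input.size !== result.size) return false
  return receiptShards.every((shard) => input.has(shard))
}

// New upload: the record holds exactly what was sent.
const freshUpload = isProbablyNewUpload(['bagshard1'], ['bagshard1'])

// Existing upload: the record already held another shard.
const extendedUpload = isProbablyNewUpload(['bagshard2'], ['bagshard1', 'bagshard2'])

// The imperfect case: re-sending the same shards for an existing upload
// also yields equal sets, so it is misread as new.
const resentUpload = isProbablyNewUpload(['bagshard1'], ['bagshard1'])
```

The last case is one reason the check is not perfect: a repeated upload/add with identical shards is indistinguishable from the first one.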


2 participants