
[v2] Node DownloadBatch method #880

Merged: 1 commit, Nov 13, 2024

Conversation

ian-shim
Contributor

@ian-shim ian-shim commented Nov 9, 2024

Why are these changes needed?

Implements a utility method for downloading a batch from relays.

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

Comment on lines 18 to 20
type BlobShard struct {
-	BlobCertificate
-	Chunks map[core.QuorumID][]*encoding.Frame
+	*BlobCertificate
+	Chunks map[core.QuorumID]core.Bundle
Contributor Author

@mooselumph can you comment if you like this change?

Collaborator

Looks fine to me!
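For reference, the revised shape compiles as sketched below; the placeholder types stand in for `core.QuorumID`, `core.Bundle`, and the v2 `BlobCertificate` and are not the PR's actual definitions:

```go
package main

import "fmt"

// Placeholder types standing in for the eigenda core package types.
type QuorumID uint8
type Bundle []byte
type BlobCertificate struct{ BlobKey string }

// BlobShard after the suggested change: embed a *BlobCertificate and
// key the chunk bundles by quorum ID.
type BlobShard struct {
	*BlobCertificate
	Chunks map[QuorumID]Bundle
}

func main() {
	shard := BlobShard{
		BlobCertificate: &BlobCertificate{BlobKey: "blob0"},
		Chunks:          map[QuorumID]Bundle{0: {1, 2, 3}},
	}
	// Fields of the embedded certificate are promoted onto the shard.
	fmt.Println(shard.BlobKey, len(shard.Chunks[0])) // prints "blob0 3"
}
```

Embedding the pointer avoids copying the certificate into each shard while keeping field promotion (`shard.BlobKey`) intact.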

@ian-shim
Contributor Author

ian-shim commented Nov 9, 2024

This PR depends on #875

@ian-shim ian-shim marked this pull request as ready for review November 9, 2024 03:01
@ian-shim ian-shim requested review from anupsv and jianoaix November 9, 2024 03:05
@ian-shim ian-shim force-pushed the node-v2-store-chunks branch 5 times, most recently from ab183e3 to 3835f68 Compare November 13, 2024 00:45
// GetBatchHeader constructs a core.BatchHeader from a proto of pb.StoreChunksRequest.
// Note the StoreChunksRequest is validated as soon as it enters the node gRPC
// interface, see grpc.Server.validateStoreChunkRequest.
func BatchHeaderFromProtobuf(in *pb.BatchHeader) (*BatchHeader, error) {
Contributor

nit: BatchHeaderFromProto to be consistent (or BlobHeaderFromProtobuf below, if the full spelling is preferred)

core/v2/types.go Outdated
@@ -211,6 +211,77 @@ type Batch struct {
BlobCertificates []*BlobCertificate
}

func (b *Batch) ToProtobuf() (*commonpb.Batch, error) {
if b.BatchHeader == nil {
return nil, fmt.Errorf("batch header is nil")
Contributor

nit: here and below, use errors.New since there is no formatting param
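The nit above reflects standard Go practice: a fixed message needs no formatting, so `errors.New` is preferred over `fmt.Errorf`. A minimal sketch; the `Batch` and `BatchHeader` types here are stand-ins, not the PR's definitions:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the real core/v2 types.
type BatchHeader struct{ ReferenceBlockNumber uint64 }
type Batch struct{ BatchHeader *BatchHeader }

// ToProtobuf illustrates the suggestion: errors.New for fixed messages.
func (b *Batch) ToProtobuf() error {
	if b.BatchHeader == nil {
		return errors.New("batch header is nil")
	}
	// fmt.Errorf remains the right tool when values are interpolated, e.g.
	// fmt.Errorf("unexpected block number %d", b.BatchHeader.ReferenceBlockNumber)
	return nil
}

func main() {
	fmt.Println((&Batch{}).ToProtobuf()) // prints "batch header is nil"
}
```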

-	BlobCertificate
-	Chunks map[core.QuorumID][]*encoding.Frame
+	*BlobCertificate
+	Bundles map[core.QuorumID]core.Bundle
Contributor

Bundles core.Bundles?

node/node_v2.go Outdated
Bundles map[core.QuorumID][]byte
}

func (n *Node) DownloadBatch(ctx context.Context, batch *corev2.Batch) ([]*corev2.BlobShard, []*RawBundles, error) {
Contributor

It looks odd to supply the batch as an input parameter when the goal is to produce (download) a batch, unless the parameter is also used to carry the output, which isn't the case here.

Contributor Author

Agree with the confusion. I'll rename this method to DownloadBundles. Is that more clear?

}
pool.StopWait()

for i := 0; i < len(requests); i++ {
Contributor

The pool may send fewer than len(requests) responses to the channel, since it exits on error (L103), so this loop may hang the server.

Contributor Author

Thanks for catching this. Fixed!
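The hang described above is a classic worker-pool pitfall: if a worker returns early without sending to the results channel, a collector that loops `len(requests)` times blocks forever. A hedged sketch of the pattern and one fix (guarantee exactly one result per request, carrying the error in the result); the names are illustrative, not the PR's code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type result struct {
	data []byte
	err  error
}

// downloadAll sends exactly one result per request, even on failure,
// so the collector can safely receive len(requests) values.
func downloadAll(requests []string) ([][]byte, error) {
	results := make(chan result, len(requests)) // buffered: workers never block
	var wg sync.WaitGroup
	for _, req := range requests {
		wg.Add(1)
		go func(req string) {
			defer wg.Done()
			if req == "" {
				// A bare `return` here (no send) would strand the collector.
				results <- result{err: errors.New("empty request")}
				return
			}
			results <- result{data: []byte(req)}
		}(req)
	}
	wg.Wait()

	out := make([][]byte, 0, len(requests))
	for i := 0; i < len(requests); i++ {
		r := <-results
		if r.err != nil {
			return nil, r.err
		}
		out = append(out, r.data)
	}
	return out, nil
}

func main() {
	data, err := downloadAll([]string{"a", "b"})
	fmt.Println(len(data), err) // prints "2 <nil>"
}
```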

}
type relayRequest struct {
chunkRequests []*clients.ChunkRequestByRange
metadata []*requestMetadata
Contributor

This doesn't look like metadata; it's an integral part of the request itself (without the quorum ID, the request is incomplete). That is, a well-defined request should have the blob key, chunk index range, and quorum ID together.

Contributor Author

This is metadata for keeping track of the requests, i.e. the actual request to relay doesn't require quorum IDs, just blob key and indices. This metadata is used on the client side (node) to put the responses in the right place.
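The split the author describes can be sketched as follows: the wire request carries only the blob key and index range, while per-request metadata stays on the node so each response lands in the right (blob, quorum) slot. Names and fields here are illustrative, not the PR's actual types:

```go
package main

import "fmt"

type QuorumID uint8

// chunkRequest is what the relay actually needs: blob key and index range.
type chunkRequest struct {
	blobKey    string
	start, end uint32
}

// requestMetadata never goes over the wire; the node keeps it alongside
// each request to route the response to the right blob and quorum.
type requestMetadata struct {
	blobIndex int
	quorum    QuorumID
}

func main() {
	reqs := []chunkRequest{
		{blobKey: "blob0", start: 0, end: 8},
		{blobKey: "blob0", start: 0, end: 8},
	}
	meta := []requestMetadata{
		{blobIndex: 0, quorum: 0},
		{blobIndex: 0, quorum: 1},
	}
	// The same wire request can serve two quorums; only the metadata differs.
	for i := range reqs {
		fmt.Printf("%s[%d:%d] -> blob %d, quorum %d\n",
			reqs[i].blobKey, reqs[i].start, reqs[i].end,
			meta[i].blobIndex, meta[i].quorum)
	}
}
```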

node/grpc/server_v2.go (outdated, resolved)
node/node_v2.go Outdated
Bundles map[core.QuorumID][]byte
}

func (n *Node) DownloadBatch(ctx context.Context, batch *corev2.Batch) ([]*corev2.BlobShard, []*RawBundles, error) {
Collaborator

It seems making robust retry logic for this method could become pretty complex. Are you thinking of doing this as a follow-on/post-MVP?

Contributor Author

I was thinking about not retrying at all (at least for the MVP) and returning an error immediately if the download fails. What do you think?

Collaborator

Since this is node software that can't be upgraded as swiftly, I wonder if it's worth building in some kind of robustness logic from the initial release. 🤔 But definitely as a follow-up after we have the system running (maybe that's your point)

node/node_v2.go (3 resolved comment threads)
@ian-shim ian-shim force-pushed the node-v2-store-chunks branch 3 times, most recently from 4a32367 to f043837 Compare November 13, 2024 02:25
@ian-shim ian-shim force-pushed the node-v2-store-chunks branch from f043837 to 1c4d6e7 Compare November 13, 2024 02:28
@ian-shim ian-shim merged commit 72357a9 into Layr-Labs:master Nov 13, 2024
6 checks passed
3 participants