Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: adds support for gRPC streaming and removes dependency on RPC #3915

Open
wants to merge 34 commits into
base: feature/api-breaks
Choose a base branch
from

Conversation

rach-id
Copy link
Member

@rach-id rach-id commented Nov 4, 2024

The Celestia-node part of celestiaorg/celestia-core#1513

@rach-id rach-id self-assigned this Nov 4, 2024
@github-actions github-actions bot added the external Issues created by non node team members label Nov 4, 2024
@rach-id
Copy link
Member Author

rach-id commented Nov 4, 2024

PR still not fully ready, some tests are still failing. but would be good to get initial opinion + if someone can see if it's providing you with any performance increase

Comment on lines +42 to +45
case <-f.doneCh:
return nil
case <-ctx.Done():
return fmt.Errorf("fetcher: unsubscribe from new block events: %w", ctx.Err())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question for reviewers]
not sure if this is the right way to stop the resources in here. I guess we can delete this part. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth keeping. It might look unnecessary, but it will actually prevent flakes for tests, so lets keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about closing the underlying connection? What's responsible for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. I can support closing the connection if we make the Client a struct. would that be something we want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rach-id rach-id marked this pull request as draft November 4, 2024 13:56
Comment on lines 79 to 84
fx.OnStart(func(_ context.Context, client core.Client) error {
return client.Start()
}),
fx.OnStop(func(_ context.Context, client core.Client) error {
return client.Stop()
}),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note for reviewers]
with the new implementation, the gRPC connection is started when creating the client instead of waiting until we call Start()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but we also need to close conn when node stops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go.mod Show resolved Hide resolved
go.mod Outdated
github.com/filecoin-project/dagstore => github.com/celestiaorg/dagstore v0.0.0-20230824094345-537c012aa403
github.com/gogo/protobuf => github.com/regen-network/protobuf v1.3.3-alpha.regen.1
// broken goleveldb needs to be replaced for the cosmos-sdk and celestia-app
github.com/syndtr/goleveldb => github.com/syndtr/goleveldb v1.0.1-0.20210819022825-2ae1ddf74ef7
github.com/tendermint/tendermint => github.com/celestiaorg/celestia-core v1.43.0-tm-v0.34.35
github.com/tendermint/tendermint => github.com/celestiaorg/celestia-core v1.43.0-tm-v0.34.35.0.20241104132041-3d2b32b7ddf2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note for reviewers]
will be updated once we merge the PRs and make release

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here

@codecov-commenter
Copy link

codecov-commenter commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 53.33333% with 119 lines in your changes missing coverage. Please review.

Please upload report for BASE (feature/api-breaks@4a90605). Learn more about missing BASE report.

Files with missing lines Patch % Lines
core/fetcher.go 51.32% 62 Missing and 30 partials ⚠️
core/client.go 54.54% 13 Missing and 2 partials ⚠️
core/exchange.go 71.42% 2 Missing ⚠️
core/listener.go 33.33% 2 Missing ⚠️
nodebuilder/core/config.go 50.00% 2 Missing ⚠️
nodebuilder/core/constructors.go 0.00% 2 Missing ⚠️
nodebuilder/core/module.go 0.00% 2 Missing ⚠️
nodebuilder/core/opts.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@                  Coverage Diff                  @@
##             feature/api-breaks    #3915   +/-   ##
=====================================================
  Coverage                      ?   45.29%           
=====================================================
  Files                         ?      307           
  Lines                         ?    21923           
  Branches                      ?        0           
=====================================================
  Hits                          ?     9929           
  Misses                        ?    10908           
  Partials                      ?     1086           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@rach-id rach-id marked this pull request as ready for review November 5, 2024 09:36
@vgonkivs
Copy link
Member

conflicts cc @rach-id

@rach-id
Copy link
Member Author

rach-id commented Nov 11, 2024

@vgonkivs Fixed 🙏 Would be good to know if this PR is good to go so that we start merging the changes in core/comsos-sdk

@rach-id rach-id changed the base branch from main to feature/api-breaks November 12, 2024 07:06
@vgonkivs vgonkivs added kind:break! Attached to breaking PRs kind:feat Attached to feature PRs labels Nov 12, 2024
@vgonkivs
Copy link
Member

vgonkivs commented Nov 13, 2024

Tested locally. Both grpc clients work

@@ -123,7 +123,7 @@ func (ca *CoreAccessor) Start(ctx context.Context) error {
ca.ctx, ca.cancel = context.WithCancel(context.Background())

// dial given celestia-core endpoint
endpoint := fmt.Sprintf("%s:%s", ca.coreIP, ca.grpcPort)
endpoint := fmt.Sprintf("%s:%s", ca.coreIP, ca.port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core/fetcher.go Outdated

// resolveHeight takes a height pointer and returns its value if it's not nil.
// otherwise, returns the latest height.
func (f *BlockFetcher) resolveHeight(ctx context.Context, height *int64) (int64, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[non-blocking]
Passing a pointer for an int is really an anti-pattern, which existed in the previous client. I guess you kept it for the case where the nil is passed which means get the latest, but we don't need to stick to that pattern, and can use something else, like a separate method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I tried to keep the same API. I don't like passing pointers to numbers too. Do you think this should be updated in this PR? or we create an issue and leave it later on for someone to pick it up at some point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +42 to +45
case <-f.doneCh:
return nil
case <-ctx.Done():
return fmt.Errorf("fetcher: unsubscribe from new block events: %w", ctx.Err())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth keeping. It might look unnecessary, but it will actually prevent flakes for tests, so lets keep it.

@@ -173,12 +173,12 @@ func (ce *Exchange) getExtendedHeaderByHeight(ctx context.Context, height *int64
}
log.Debugw("fetched signed block from core", "height", b.Header.Height)

eds, err := extendBlock(b.Data, b.Header.Version.App)
eds, err := extendBlock(*b.Data, b.Header.Version.App)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we already know, Data is quite big, let's avoid copying it here. It should be easy to change extendBlock to take ptr instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be GC at some point. The thing is extendBlock also makes a couple other calls by value instead of reference. So multiple copies will be done regardless, this will be just one less. What do you think of creating an issue that checks wherever Data is used, it passes it by reference, even in app and maybe other repos so that we have everything fixed?

if !f.client.IsRunning() {
return nil, errors.New("client not running")
}

ctx, cancel := context.WithCancel(ctx)
f.cancel = cancel
f.doneCh = make(chan struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend fool-proving this a bit. Basically, the current logic supports only a single subscription, while the method makes it look like multiple are and calling it multiple times is a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the same existing pattern. What do you mean by fool-proving?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant making erroring or panicking if subsvription is called again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d4243fa like this?

core/fetcher.go Outdated
if height != nil {
return *height, nil
}
status, err := f.client.Status(ctx, &coregrpc.StatusRequest{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to request the latest height through existing methods? Let's avoid additional roundtrip for Head request if possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like 0 on BlockByHeightRequest can represent latest height.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I recommended to just use 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im fine with changing and removing the pointers. Is this something you want to do in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes pls, we would like to leave as minimal tech debt behind as possiblr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core/testing.go Show resolved Hide resolved
core/client.go Outdated
"/websocket",
httpClient.StandardClient(),
)
return coregrpc.StartBlockAPIGRPCClient(fmt.Sprintf("tcp://%s:%s", ip, port))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the client across state and core modules?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'm planning to do it in a subsequent PR

Comment on lines +254 to +260
func receiveBlockByHeight(streamer coregrpc.BlockAPI_BlockByHeightClient) (
*types.Block,
*types.BlockMeta,
*types.Commit,
*types.ValidatorSet,
error,
) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return of 5 values caught my eye. I can't find if BlockMeta returned from this method is used anywhere. If not, perhaps we can return just SignedBlock struct, which contains everything except Blockmeta?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am returning everything we receive in this. If you don't want block meta to be returned, I guess we could even stop sending it from core. What do you think?

core/fetcher.go Outdated
Comment on lines 222 to 226
signedBlock, err := f.GetSignedBlock(ctx, &resp.Height)
if err != nil {
log.Errorw("fetcher: error receiving signed block", "height", resp.Height, "err", err.Error())
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add per-call timeout. Otherwise we might get stuck on one of the calls indefinitely

Also would need to handle canceled context here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for your efforts here @rach-id !

This PR made me think about #3952 which would allow us to only rely on GetSignedBlock endpoint in fetcher and remove all the other unnecessary fluff (like GetBlockInfo etc.

core/fetcher.go Outdated

// resolveHeight takes a height pointer and returns its value if it's not nil.
// otherwise, returns the latest height.
func (f *BlockFetcher) resolveHeight(ctx context.Context, height *int64) (int64, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit]: Wouldn't this be a bit more simple if we just used 0 height to indicate Head instead of nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -173,12 +173,12 @@ func (ce *Exchange) getExtendedHeaderByHeight(ctx context.Context, height *int64
}
log.Debugw("fetched signed block from core", "height", b.Header.Height)

eds, err := extendBlock(b.Data, b.Header.Version.App)
eds, err := extendBlock(*b.Data, b.Header.Version.App)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we passing by value here? Just curious

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because extendBlock expects the parameter to be passed by value. I didn't want to change that behavior since Data is already being copied in a lot of places, one more copy is fine. Related: #3915 (comment)

Comment on lines +145 to +146
if err != nil && ctx.Err() == nil {
require.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just require.NoError? we do not expect one here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a funny behavior happening there that I guess will be fixed if we support closing the connection manually. Related: #3915 (comment)

Comment on lines +42 to +45
case <-f.doneCh:
return nil
case <-ctx.Done():
return fmt.Errorf("fetcher: unsubscribe from new block events: %w", ctx.Err())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about closing the underlying connection? What's responsible for that?

Comment on lines 79 to 84
fx.OnStart(func(_ context.Context, client core.Client) error {
return client.Start()
}),
fx.OnStop(func(_ context.Context, client core.Client) error {
return client.Stop()
}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but we also need to close conn when node stops.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external Issues created by non node team members kind:break! Attached to breaking PRs kind:feat Attached to feature PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants