Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementations of the TPU Vortexor #3258

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

lijunwangs
Copy link

@lijunwangs lijunwangs commented Oct 22, 2024

Problem

Framework for Solana TPU vortexor which allows core validator to offload TPU receiving, sigverify and deduplications.

Summary of Changes

This is the first installment of the change. This is the framework part. It starts the TPU and forward stream based on the CLI configurations. In future PRs we will have the sigverify, forwarder, subscription management. The code has no impact to the existing validator.

Added a unit test for starting the vortexor, send some transactions and stops it.

Fixes #

@lijunwangs lijunwangs marked this pull request as draft October 22, 2024 09:23
@lijunwangs lijunwangs marked this pull request as ready for review October 26, 2024 20:47
@@ -239,3 +245,46 @@ pub async fn make_client_endpoint(
.await
.expect("Test server should be already listening on 'localhost'")
}

pub async fn check_multiple_streams(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from nonblocking::quic::test to here to be reused for other testing

@behzadnouri
Copy link

Is there a design document or anything for this?
like what is the motivation and end goal here? how is the validator going to talk with this new binary? how does the binary obtain stakes to do stake-based QoS etc? does the binary need access to the private key of the validator? etc ...

@lijunwangs
Copy link
Author

lijunwangs commented Dec 2, 2024

Is there a design document or anything for this? like what is the motivation and end goal here? how is the validator going to talk with this new binary? how does the binary obtain stakes to do stake-based QoS etc? does the binary need access to the private key of the validator? etc ...

The motivation is to make TPU streaming, sig verification more scalable. The readme captures this: https://raw.githubusercontent.com/anza-xyz/agave/11908fbbcfab7194f954177c2da0646249ee308a/vortexor/Readme.md. I have set up a review meeting to go over this to give more context. The vortexor will retrieve the stake map via regular RPC service from configured validator -- the RPC can be either pairing validator or any other trusted RPC validator in the network supporting the RPC pub/sub service. The Vortexor does not need the private key to the validator.

@behzadnouri
Copy link

The Vortexor does not need the private key to the validator.

don't we need the private key for QUIC handshake ?

@lijunwangs
Copy link
Author

The Vortexor does not need the private key to the validator.

don't we need the private key for QUIC handshake ?

A private key is used to sign the TLS certificate. But the client does not care about if the key is the same as the one of the validator advertising the TPU address (The vortexor address). The client code does not check that and we have no use case to enforce that check -- this is the same currently using --public-tpu-address. Without requiring using the same private key making it more secure and simplify deployment.

@behzadnouri
Copy link

A private key is used to sign the TLS certificate. But the client does not care about if the key is the same as the one of the validator advertising the TPU address (The vortexor address). The client code does not check that and we have no use case to enforce that check

How does the client prune its connection cache when it becomes too big?
Isn't that by pubkey => stake mapping? and isn't the pubkey from QUIC handshake?

@lijunwangs
Copy link
Author

A private key is used to sign the TLS certificate. But the client does not care about if the key is the same as the one of the validator advertising the TPU address (The vortexor address). The client code does not check that and we have no use case to enforce that check

How does the client prune its connection cache when it becomes too big? Isn't that by pubkey => stake mapping? and isn't the pubkey from QUIC handshake?

We randomly select a connection to prune -- the current connection cache is agnostic of the stake map.

@lijunwangs
Copy link
Author

@pgarg66 @behzadnouri @bw-solana @sakridge -- can you follow up on this PR request? I have updated the readme.md with some of feedback I got in the table top review. Thanks

@pgarg66
Copy link

pgarg66 commented Dec 12, 2024

Hey @lijunwangs, could you post the comments/summary from the table-top review? It'll be good to have them captured here in this PR.

Also, there were some action items based on the review. It'll be great to have tracking issues for those, so that reviewers know what's coming down the line for Vortexor.

@bw-solana
Copy link

I don't have a ton of feedback on the code for now, but a few comments (most of which are just reiterating what came up during live discussion):

  • I think it would be good to map out how Jito relayers and DZ networks might interface with the vortexor
  • It would be good to call out if vortexor and core validator can be run on the same box
  • It would be good to provide some details on how operators can upgrade from current validator to vortexor + core validator configuration.
  • It would be good to specify details on how core validator ingest might work. Is this connection provided all of the bandwidth by default? Can core validator also ingest from other sources? How is throttling handled when dealing with multiple sources of ingress (e.g. multiple vortexors and public TPU connections)?

@lijunwangs
Copy link
Author

I don't have a ton of feedback on the code for now, but a few comments (most of which are just reiterating what came up during live discussion):

  • I think it would be good to map out how Jito relayers and DZ networks might interface with the vortexor
  • It would be good to call out if vortexor and core validator can be run on the same box
  • It would be good to provide some details on how operators can upgrade from current validator to vortexor + core validator configuration.
  • It would be good to specify details on how core validator ingest might work. Is this connection provided all of the bandwidth by default? Can core validator also ingest from other sources? How is throttling handled when dealing with multiple sources of ingress (e.g. multiple vortexors and public TPU connections)?

Thanks for the comments. Updated the Readme.md with this.

@lijunwangs
Copy link
Author

Hey @lijunwangs, could you post the comments/summary from the table-top review? It'll be good to have them captured here in this PR.

Also, there were some action items based on the review. It'll be great to have tracking issues for those, so that reviewers know what's coming down the line for Vortexor.

I have put the discussion point we had into the Readme.md -- as I think it is a design questions. I have created a project to track these efforts:

https://github.com/orgs/anza-xyz/projects/15/views/1

vortexor/Readme.md Outdated Show resolved Hide resolved
Copy link

@pgarg66 pgarg66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions/comments.

In its current form it looks like a wrapper on top of part of existing QUIC and TPU functionality. The follow on PRs would add more specifics for Vortexor.

It looks in the right direction to me, but I would like another person to review it as well.

vortexor/src/vortexor.rs Outdated Show resolved Hide resolved
vortexor/src/vortexor.rs Outdated Show resolved Hide resolved
vortexor/src/vortexor.rs Outdated Show resolved Hide resolved
vortexor/src/vortexor.rs Show resolved Hide resolved
vortexor/src/vortexor.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants