
Add a more performant client to bench-tps #544

Closed

Conversation

@KirillLykov commented Apr 2, 2024

Problem

Motivation is described in #415

I don't like the name HighTpsClient, but I'm using it as a working name for now.

It is also possible to make TpuClient itself more performant by using runtime::spawn instead of runtime::invoke (or rather the equivalent we use there, tokio::task::block_in_place(move || self.rpc_client.runtime().block_on(f))).
But in that case we cannot handle errors. Another option might be to implement a separate method, TpuClient::send_batch_xxx, which uses spawn internally; see discussion.
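For context, a minimal sketch contrasting the two dispatch styles; the Sender type and both method names are hypothetical, and only the block_in_place/block_on pattern is taken from the code quoted above.

use std::future::Future;
use tokio::runtime::Runtime;

// Hypothetical wrapper standing in for the client that owns the tokio runtime.
struct Sender {
    runtime: Runtime,
}

impl Sender {
    // Current TpuClient-style invocation: block the benchmark thread until the
    // async send completes, which is what lets errors be returned to the caller
    // (requires the multi-threaded runtime, as in the quoted code).
    fn invoke<T>(&self, f: impl Future<Output = T>) -> T {
        tokio::task::block_in_place(move || self.runtime.block_on(f))
    }

    // spawn-based alternative: returns immediately and never blocks the
    // benchmark thread, but the send result is only reachable through the
    // returned JoinHandle (or a channel), not as a synchronous Result.
    fn send_detached<T: Send + 'static>(
        &self,
        f: impl Future<Output = T> + Send + 'static,
    ) -> tokio::task::JoinHandle<T> {
        self.runtime.spawn(f)
    }
}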

Summary of Changes

Add a new client which is supposed to be more performant than TpuClient for bench-tps needs.

Test run results

  1. With a private cluster (distributed across 2 data centers, 10 nodes):

Client           TPS +- std
TpuClient        5407 +- 3k
UDP connection   21823 +- 4.8k
HighTpsClient    10292 +- 6.2k

The identity of the node we use has 10% of the total stake.
I would like to check whether we can get closer to the UDP result by using several identities.

TpuClient:
[Screenshot 2024-04-02 at 18 11 36]

HighTpsClient:
[Screenshot 2024-04-02 at 18 12 30]

  2. With testnet:

#415 (comment)

Experiments setup

-u http://$IP:8899 --identity dos-funder.json --read-client-keys keypairs.yaml --duration 600 --tx_count 5000 --thread-batch-sleep-ms 10 --client-node-id invalidator/identity.json --bind-address $IP --block-data-file block.csv --threads 4 --tpu-connection-pool-size 4 --use-high-tps-client --sustained

@codecov-commenter commented Apr 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (55c05c5) to head (4b45bb8).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #544     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         851      851             
  Lines      230165   230172      +7     
=========================================
- Hits       188464   188460      -4     
- Misses      41701    41712     +11     

@KirillLykov force-pushed the klykov/add-high-tps-client branch from a86fb92 to 4b45bb8 on April 5, 2024 at 15:55
@KirillLykov marked this pull request as ready for review on April 5, 2024 at 15:55
@@ -27,6 +27,8 @@ pub enum ExternalClientType {
// Submits transactions directly to leaders using a TpuClient, broadcasting to upcoming leaders
// via TpuClient default configuration
TpuClient,

It would be nice to add a comment here detailing where we send transactions (directly to leaders) and giving a short summary of the differences with respect to TpuClient.
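One possible wording for the requested comment, mirroring the style of the existing TpuClient comment; the exact phrasing and the new variant name are assumptions based on this thread.

// Submits transactions directly to leaders using a TpuClient, broadcasting to upcoming leaders
// via TpuClient default configuration
TpuClient,
// Serializes transactions and sends them straight to the TPU sockets of the
// next fanout_slots leaders, without blocking on each send the way TpuClient does
HighTpsClient,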

@bw-solana left a comment

Overall, LGTM

@KirillLykov (Author)

> Overall, LGTM

I'm not sure whether it is better to:

  1. create a new client (as this PR does)
  2. modify TpuClient to use spawn instead
  3. make bench-tps async so that we can call spawn directly (might be more time-consuming to implement)

@KirillLykov (Author)

@ilya-bobyr any suggestions for the name of the struct? I don't really like HighTpsClient and would prefer BenchTpsClient, but unfortunately that is already the name of the trait, and as far as I understand, using the same name for a trait and a struct is not the best practice.

@gregcusack left a comment

Looks good for the most part! The numbers speak for themselves! Just a couple of questions/nits.

.arg(
Arg::with_name("high_tps_client")
.long("use-high-tps-client")
.conflicts_with("rpc_client")

need a .conflicts_with("tpu_client") here?
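A sketch of the suggested change; the flag and arg names come from the excerpt above, while takes_value and the help text are illustrative.

.arg(
    Arg::with_name("high_tps_client")
        .long("use-high-tps-client")
        .conflicts_with("rpc_client")
        .conflicts_with("tpu_client")
        .takes_value(false)
        .help("Submit transactions with the HighTpsClient"),
)

clap's .conflicts_with_all(&["rpc_client", "tpu_client"]) would also work and scales a bit better if more client flags are added later.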

rpc_client,
websocket_url,
HighTpsClientConfig {
fanout_slots: 1,

Were your experiments run with fanout_slots: 1, and is there any reason you landed on 1? Does it make sense to make this configurable via the CLI? The code in high_tps_client.rs makes it look like it should be configurable.

@KirillLykov (Author)

I wrote my thoughts about that in the other comment. Regarding configuration in general, I would try to reduce the number of configurable parameters to simplify the UX of the tool. If we know some optimal parameters, I would prefer to hardcode them to reduce the mental load of figuring out how to configure them. Currently we have: number of threads, tx count, connection cache size, sustained/not, timeout, ... I think we should just find one optimal set of parameters, because most of these are never touched, and if they are changed it probably means the defaults are not the most performant.
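A minimal sketch of that "hardcode the tuned defaults" idea; the field names follow the excerpts in this thread, and the send_batch_size value is purely illustrative, not taken from the PR.

pub struct HighTpsClientConfig {
    pub fanout_slots: u64,
    pub send_batch_size: usize,
}

impl Default for HighTpsClientConfig {
    fn default() -> Self {
        Self {
            // Send only to the current leader; see the fanout discussion below.
            fanout_slots: 1,
            // Hypothetical tuned value.
            send_batch_size: 64,
        }
    }
}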

use {
crate::bench_tps_client::{BenchTpsClient, BenchTpsError, Result},
solana_client::{
connection_cache::Protocol, nonblocking::tpu_client::LeaderTpuService,

Can you import this directly from solana_tpu_client::nonblocking::tpu_client::LeaderTpuService? It sounds like we're eventually going to get rid of solana_client, according to Tyera.
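The import being suggested would look roughly like this (path taken from the comment above); where Protocol should come from once solana_client goes away is left open here.

use solana_tpu_client::nonblocking::tpu_client::LeaderTpuService;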


fn send_batch(&self, transactions: Vec<Transaction>) -> Result<()> {
let wire_transactions = transactions
.into_iter() //.into_par_iter() any effect of this?

In response to the comment here: probably only if Vec<Transaction> is long. tpu_client uses .into_par_iter(), so it may make sense to use into_par_iter() here too, but I haven't tested it to see the performance difference.
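A sketch of the into_par_iter() variant being discussed, unbenchmarked here; it assumes a rayon dependency, and the helper name is hypothetical.

use rayon::prelude::*;
use solana_sdk::transaction::Transaction;

// Serialize a batch in parallel, as tpu_client does; only worthwhile when the
// batch is long enough to amortize the thread-pool overhead.
fn serialize_batch(transactions: Vec<Transaction>) -> Vec<Vec<u8>> {
    transactions
        .into_par_iter()
        .map(|tx| bincode::serialize(&tx).expect("transaction should be valid"))
        .collect()
}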

.map(|tx| bincode::serialize(&tx).expect("transaction should be valid."))
.collect::<Vec<_>>();
for c in wire_transactions.chunks(self.config.send_batch_size) {
let tpu_addresses = self

Possibly rename tpu_addresses to leader_tpu_addresses?

Actually, the context is probably clear enough.

let tpu_addresses = self
.leader_tpu_service
.leader_tpu_sockets(self.config.fanout_slots);
for tpu_address in &tpu_addresses {

Only sending to one leader here, right?

@KirillLykov (Author)

It should send to fanout_slots leaders.
In the configuration I specify 1 for this parameter because I think we don't care how many txs have timed out; we are only interested in maximizing TPS. Given that, it makes sense to use the NIC bandwidth to send as many datagrams to one leader as possible (although for a particular cluster configuration we might not be able to fully utilize it due to staked congestion control).

@lijunwangs left a comment

I am not sure we want to introduce a new interface to bench-tps. Can you spell out the major differences between tpu-client and this new HighTpsClient? If it is just a performance improvement, why can't we just improve the existing TpuClient?

@gregcusack

> I am not sure we want to introduce a new interface to bench-tps. Can you spell out the major differences between tpu-client and this new HighTpsClient? If it is just a performance improvement, why can't we just improve the existing TpuClient?

FWIW, I also +1 this. I had to modify TpuClient to get it to work properly in LocalCluster, and it seems the changes I've made are a small subset of the changes you've added to HighTpsClient.

@KirillLykov (Author)

Dropping this one because there is tpu-client-next, with which I managed to achieve 12k +- 8k txs per block (using transaction-bench instead).
