Add a more performant client to bench-tps #544
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff            @@
##           master     #544     +/-  ##
=========================================
- Coverage    81.8%    81.8%    -0.1%
=========================================
  Files         851      851
  Lines      230165   230172       +7
=========================================
- Hits       188464   188460       -4
- Misses      41701    41712      +11
Force-pushed from b62e9fa to a86fb92, then from a86fb92 to 4b45bb8.
@@ -27,6 +27,8 @@ pub enum ExternalClientType {
    // Submits transactions directly to leaders using a TpuClient, broadcasting to upcoming leaders
    // via TpuClient default configuration
    TpuClient,
Would be nice to add a comment here detailing where we send transactions (directly to leaders) and a short summary of the differences with respect to TpuClient.
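A possible shape for such a comment on the new variant; the behavioral summary below is my reading of the PR description, not the author's wording:

pub enum ExternalClientType {
    // Submits transactions directly to leaders using a TpuClient, broadcasting to upcoming leaders
    // via TpuClient default configuration
    TpuClient,
    // Submits transactions directly to the upcoming leaders' TPU ports, like TpuClient, but
    // fires the serialized batches without waiting on per-transaction results, trading error
    // handling for higher send throughput.
    HighTpsClient,
}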
Overall, LGTM.
I'm not sure if it is better to:
@ilya-bobyr, any suggestions about the name of the class? I don't really like HighTpsClient.
Looks good for the most part! The numbers speak for themselves! Just a couple of questions/nits.
.arg(
    Arg::with_name("high_tps_client")
        .long("use-high-tps-client")
        .conflicts_with("rpc_client")
Need a .conflicts_with("tpu_client") here?
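A minimal sketch of what that would look like, assuming clap 2.x as used by bench-tps; only the argument names "rpc_client" and "tpu_client" come from this thread, the long flags for the other two clients and the app name are illustrative:

use clap::{App, Arg};

fn main() {
    let _matches = App::new("bench-tps-sketch")
        // Placeholder declarations for the two existing client flags.
        .arg(Arg::with_name("rpc_client").long("use-rpc-client"))
        .arg(Arg::with_name("tpu_client").long("use-tpu-client"))
        .arg(
            Arg::with_name("high_tps_client")
                .long("use-high-tps-client")
                // Declare the new flag as mutually exclusive with both existing client flags.
                .conflicts_with("rpc_client")
                .conflicts_with("tpu_client"),
        )
        .get_matches();
}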
rpc_client,
websocket_url,
HighTpsClientConfig {
    fanout_slots: 1,
Were your experiments run with fanout_slots: 1, and is there any reason you landed on 1? Does it make sense to make it configurable via the CLI? The code in high_tps_client.rs makes it seem like it should be configurable.
Wrote my thoughts about that in the other comment. Regarding configuration in general, I would try to decrease the number of configurable parameters to simplify the UX of the tool. If we know some optimal parameters, I would prefer to hardcode them to reduce the mental load of figuring out how to configure them. Currently we have: number of threads, tx count, size of CC, sustained/not, timeout, ... I think we should just find one optimal set of parameters, because most of these are never touched. And if they are changed, it might mean that the defaults are not the most performant.
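One way to express that, sketched under the assumption that the config struct looks roughly like the snippets quoted above (field names mirror them, the batch-size value is illustrative): keep the tuned values as hardcoded defaults rather than CLI knobs.

// Hypothetical sketch: tuned values live in a Default impl instead of CLI arguments.
pub struct HighTpsClientConfig {
    pub fanout_slots: u64,
    pub send_batch_size: usize,
}

impl Default for HighTpsClientConfig {
    fn default() -> Self {
        Self {
            // Send everything to the single upcoming leader to maximize TPS per leader.
            fanout_slots: 1,
            // Illustrative value; the actual tuned batch size is defined in the PR itself.
            send_batch_size: 64,
        }
    }
}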
use {
    crate::bench_tps_client::{BenchTpsClient, BenchTpsError, Result},
    solana_client::{
        connection_cache::Protocol, nonblocking::tpu_client::LeaderTpuService,
Can you import this directly from solana_tpu_client::nonblocking::tpu_client::LeaderTpuService? Sounds like we're eventually going to get rid of solana_client, according to Tyera.
fn send_batch(&self, transactions: Vec<Transaction>) -> Result<()> {
    let wire_transactions = transactions
        .into_iter() //.into_par_iter() any effect of this?
In response to the comment here: probably, if the Vec<Transaction> is long. tpu_client uses .into_par_iter(), so it may make sense to use into_par_iter(), but I haven't tested it to see the performance difference.
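For reference, a self-contained sketch of the parallel variant (assuming the rayon, bincode, and solana_sdk crates are available, as they are elsewhere in the repo); whether it actually beats the sequential version for bench-tps batch sizes is untested:

use rayon::prelude::*;
use solana_sdk::transaction::Transaction;

// Serialize a batch of transactions in parallel, mirroring what TpuClient does with
// into_par_iter(); for small batches the thread-pool overhead may outweigh the gain.
fn serialize_batch(transactions: Vec<Transaction>) -> Vec<Vec<u8>> {
    transactions
        .into_par_iter()
        .map(|tx| bincode::serialize(&tx).expect("transaction should be valid."))
        .collect()
}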
        .map(|tx| bincode::serialize(&tx).expect("transaction should be valid."))
        .collect::<Vec<_>>();
    for c in wire_transactions.chunks(self.config.send_batch_size) {
        let tpu_addresses = self
Possibly rename tpu_addresses to leader_tpu_addresses?
Actually, the context may be clear enough.
        let tpu_addresses = self
            .leader_tpu_service
            .leader_tpu_sockets(self.config.fanout_slots);
        for tpu_address in &tpu_addresses {
Only sending to one leader here, right?
It should be to fanout_slots leaders. In the configuration I specify 1 for this parameter, because I think we don't care how many txs have timed out; we are only interested in maximizing TPS. Hence it makes sense to use the NIC bandwidth to send as many datagrams to one leader as possible (although for a particular cluster configuration we might not be able to fully utilize it due to staked congestion control).
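To make the fan-out explicit, here is a minimal sketch of the loop's behavior using a plain UdpSocket instead of the PR's connection cache (the connection-cache details are deliberately omitted): every serialized transaction goes to the TPU address of each leader covered by fanout_slots, so fanout_slots == 1 directs all bandwidth at the single upcoming leader.

use std::net::{SocketAddr, UdpSocket};

// Send every serialized transaction to the TPU address of each targeted leader.
// With a single address in `leader_tpu_addresses` (fanout_slots == 1), the whole
// batch is pushed to one leader.
fn fan_out(
    socket: &UdpSocket,
    leader_tpu_addresses: &[SocketAddr],
    wire_transactions: &[Vec<u8>],
) -> std::io::Result<()> {
    for tpu_address in leader_tpu_addresses {
        for wire_tx in wire_transactions {
            socket.send_to(wire_tx, tpu_address)?;
        }
    }
    Ok(())
}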
I am not sure we want to introduce a new interface to bench-tps. Can you spell out the major difference between tpu-client and this new HighTpsClient? If it is just a performance improvement, why can't we just improve the existing TpuClient?
FWIW, I also +1 this. I have to modify
Dropping this one because there is tpu-client-next, with which I manage to achieve 12k±8k txs per block (using transaction-bench instead).
Problem
Motivation is described in #415
Don't like the name HighTpsClient, but for now use it as a working name. It is also possible to make TpuClient more performant by using runtime::spawn instead of runtime::invoke (or the equivalent we use there, which is tokio::task::block_in_place(move || self.rpc_client.runtime().block_on(f))). But in that case we cannot handle errors. Another way might be to implement another method, TpuClient::send_batch_xxx, which would use spawn internally. See discussion.
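To illustrate the trade-off described above, here is a minimal sketch (the function names and the Handle-based shape are mine, not TpuClient's code): blocking on the send lets the caller see the error, while spawning it returns immediately and loses that error path.

use tokio::runtime::Handle;

// Blocking submission: the synchronous caller waits for the async send to finish
// and can propagate its Result.
fn send_blocking(handle: &Handle) -> std::io::Result<()> {
    handle.block_on(async {
        // Await the actual send here and return its Result.
        Ok(())
    })
}

// Fire-and-forget submission: spawn the send onto the runtime and return at once.
// Throughput is higher because the sender never waits, but errors inside the task
// can only be logged, not returned to the caller.
fn send_spawned(handle: &Handle) {
    handle.spawn(async {
        // Perform the send; any error must be handled (e.g. logged) inside the task.
    });
}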
Summary of Changes
Add a new client that is supposed to be more performant than TpuClient for bench-tps needs.
Test run results
The identity of the node we use has 10% of the total stake. I would like to check whether we can achieve a better result with several identities, to get closer to the UDP case.
TpuClient:
HighTpsClient:
#415 (comment)
Experiments setup