Add analysis for bench-tps transactions #92

Merged
merged 25 commits into from
Mar 26, 2024

Conversation


@KirillLykov KirillLykov commented Mar 5, 2024

Problem

There is no analysis of what happens with transactions submitted by bench-tps.
It doesn't submit metrics because it looks like we first need to rework metrics in bench-tps and only later add new ones if needed. Hence, I decided to start by saving everything in CSV files, which has value by itself.

Summary of Changes

This PR adds optional analysis of the transactions submitted by bench-tps.
To do that, it creates a thread that requests all the blocks created during the bench-tps run and filters their transactions.
This thread may optionally create two CSV files: one for all the transactions and the other for the block data only.
The first is useful for debugging problems but too big to be generated for each run, while the second is generally good to have to understand how many transactions have been confirmed, along with some other stats.
Similar analysis was previously implemented in the mango market-making simulation client.
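As an illustration of the block-data CSV output described above, here is a minimal sketch using the csv and serde crates (which CsvFileWriter appears to be based on); the record fields are illustrative and not necessarily the exact columns this PR writes:

use std::fs::File;

use serde::Serialize;

// Hypothetical per-block record; the real PR defines its own columns.
#[derive(Serialize)]
struct BlockRecord {
    slot: u64,
    block_time: Option<i64>,
    num_transactions: usize,
    num_bench_tps_transactions: usize,
}

fn write_block_stats(path: &str, records: &[BlockRecord]) -> csv::Result<()> {
    let mut writer = csv::Writer::from_writer(File::create(path)?);
    for record in records {
        writer.serialize(record)?;
    }
    writer.flush()?;
    Ok(())
}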

@KirillLykov
Author

Updated PR to address the comments of @ilya-bobyr

client: &Arc<Client>,
block_data_file: Option<&str>,
transaction_data_file: Option<&str>,
) -> (Option<LogTransactionService>, Option<SignatureBatchSender>)
Author

I return a pair of options instead of an option of a pair because I will need to pass these values independently to different functions.
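A toy illustration of that pattern (hypothetical types, not the PR's): the sender half is moved into one consumer while the join handle stays with the caller for shutdown, which an option of a pair would make awkward.

use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Returns both halves as independent Options so each can be moved separately.
fn create_service_and_sender(enabled: bool) -> (Option<JoinHandle<()>>, Option<Sender<u64>>) {
    if !enabled {
        return (None, None);
    }
    let (sender, receiver) = channel::<u64>();
    let handle = thread::spawn(move || {
        for signature in receiver {
            println!("logging signature {signature}");
        }
    });
    (Some(handle), Some(sender))
}

fn main() {
    let (service, sender) = create_service_and_sender(true);
    // The sender goes to the sending side (e.g. the bench loop)...
    if let Some(sender) = sender {
        sender.send(1).unwrap();
    } // ...and is dropped here, letting the service thread finish.
    // ...while the join handle is kept by the caller for shutdown.
    if let Some(service) = service {
        service.join().unwrap();
    }
}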

@KirillLykov KirillLykov force-pushed the klykov/bench-tps-txs-analysis branch 2 times, most recently from 65ebb59 to 0787a57 Compare March 8, 2024 13:48
@KirillLykov KirillLykov marked this pull request as ready for review March 8, 2024 13:48
@codecov-commenter

codecov-commenter commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.9%. Comparing base (6024712) to head (54adc0c).

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #92   +/-   ##
=======================================
  Coverage    81.9%    81.9%           
=======================================
  Files         840      840           
  Lines      228068   228068           
=======================================
+ Hits       186828   186834    +6     
+ Misses      41240    41234    -6     

let mut min_timestamp = u64::MAX;
let mut transactions = Vec::<_>::with_capacity(txs0.len());


could use the local tx_len variable for these instead

}

// How often process blocks.
const PROCESS_BLOCKS_EVERY_MS: u64 = 16 * DEFAULT_MS_PER_SLOT;


Why 16? Would be nice to drop a comment for why we chose this


// How often process blocks.
const PROCESS_BLOCKS_EVERY_MS: u64 = 16 * DEFAULT_MS_PER_SLOT;
// Max age for transaction in the transaction map.


age here means time we will wait for a tx before giving up on it?

Author

yeah, probably better to be more explicit in the comment

const NUM_RETRY: u64 = 5;
const RETRY_EVERY_MS: u64 = 4 * DEFAULT_MS_PER_SLOT;

fn call_rpc_with_retry<Func, Data>(f: Func, retry_warning: &str) -> Result<Data>


This seems a little funky to have buried in log_transaction_service. Is there anywhere else this function might make sense to live?
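For reference, a rough sketch of what such a retry helper might look like; this version is generic over the error type, whereas the PR's version presumably uses the crate's Result and DEFAULT_MS_PER_SLOT.

use std::{thread::sleep, time::Duration};

const NUM_RETRY: u64 = 5;
const RETRY_EVERY_MS: u64 = 4 * 400; // stand-in for 4 * DEFAULT_MS_PER_SLOT

fn call_rpc_with_retry<Func, Data, Err>(f: Func, retry_warning: &str) -> Result<Data, Err>
where
    Func: Fn() -> Result<Data, Err>,
{
    let mut attempt = 0;
    loop {
        match f() {
            Ok(data) => return Ok(data),
            Err(_) if attempt < NUM_RETRY => {
                attempt += 1;
                eprintln!("{retry_warning}, retry {attempt}/{NUM_RETRY}");
                sleep(Duration::from_millis(RETRY_EVERY_MS));
            }
            Err(err) => return Err(err),
        }
    }
}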

}
}

fn verify_data_files(block_data_file: Option<&str>, transaction_data_file: Option<&str>) -> bool {


verify here is a little misleading. Maybe data_file_provided ?
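The suggested predicate-style helper might read something like this; the body is a guess at what the function checks.

fn data_file_provided(block_data_file: Option<&str>, transaction_data_file: Option<&str>) -> bool {
    // True when at least one of the optional output files was requested.
    block_data_file.is_some() || transaction_data_file.is_some()
}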


@bw-solana bw-solana left a comment


Looks pretty good to me - left a few minor comments.

Do you have a sense for how large a block/transaction file will be generated for a given run time (assuming a standard transaction load)?

@KirillLykov
Author

@bw-solana the transactions file might be as big as 100MB for a 10-minute run of bench-tps, so I want to use it only for debugging purposes. The block file is quite small, around 1.4MB for one hour of simulation.

@bw-solana

@bw-solana the transactions file might be as big as 100MB for a 10-minute run of bench-tps, so I want to use it only for debugging purposes. The block file is quite small, around 1.4MB for one hour of simulation.

Nice, that's actually smaller than I was expecting


@gregcusack gregcusack left a comment


looks pretty good to me. just some small nits

fn new(transaction_data_file: Option<&str>) -> Self {
    let transaction_log_writer = transaction_data_file.map(|transaction_data_file| {
        CsvFileWriter::from_writer(
            File::create(transaction_data_file).expect("File can be created."),


nit: File cannot be created.

Author

I found in the docs: "We recommend that expect messages are used to describe the reason you expect the Result should be Ok." Following this logic, maybe something like "application should be able to create a file"?


oh this is good to know! and ya that sounds good to me!
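A small sketch of the convention agreed on here, with the expect message describing why the Ok case is expected rather than reporting the failure:

use std::fs::File;

fn open_data_file(path: &str) -> File {
    // Per the std docs quoted above, the message states the expectation.
    File::create(path).expect("application should be able to create the data file")
}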

impl BlockLogWriter {
    fn new(block_data_file: Option<&str>) -> Self {
        let block_log_writer = block_data_file.map(|block_data_file| {
            CsvFileWriter::from_writer(File::create(block_data_file).expect("File can be created."))


nit: File cannot be created.

Comment on lines +118 to +132
let commitment: CommitmentConfig = CommitmentConfig {
    commitment: CommitmentLevel::Confirmed,
};


worth pulling this in from the cli using --commitment-config?

Author

@KirillLykov KirillLykov Mar 13, 2024


It is used to get block data, and from the discussion in the previous PR it seems that this works only with confirmed. It probably makes sense to add a corresponding comment there.
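For reference, the solana-sdk shorthand for the same config (a sketch; per the discussion, block fetching here is expected to work only at confirmed or higher):

use solana_sdk::commitment_config::CommitmentConfig;

// Equivalent to the struct literal quoted above.
fn block_fetch_commitment() -> CommitmentConfig {
    CommitmentConfig::confirmed()
}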

.spawn(move || {
    Self::run(client, signature_receiver, tx_log_writer, block_log_writer);
})
.expect("LogTransactionService is up.");


ok general comment: doesn't the .expect(<message>) trigger when the unwrap on the object fails? So shouldn't the <message> reflect why/what failed rather than the success path? or am i confused?

Author

looks like I misunderstood the expect phrasing, I will redo it everywhere as described in #92 (comment)

Comment on lines 296 to 298
let is_not_timeout_tx =
    duration_since_past_time.num_seconds() < REMOVE_TIMEOUT_TX_EVERY_SEC;
if !is_not_timeout_tx {


i'd maybe change the logic. having !is_not_timeout_tx {...} is a double negative and can be weird to read. I'd change to using something like:

let is_timeout_tx =
    duration_since_past_time.num_seconds() >= REMOVE_TIMEOUT_TX_EVERY_SEC;

and then return !is_timeout_tx.

Author

agree

@KirillLykov
Author

KirillLykov commented Mar 14, 2024

There is a bug in the loop with select: processing blocks takes longer than the specified interval, so once the sender has been dropped we keep taking the timer branch, processing blocks that are irrelevant and never stopping the service. Possible solutions:

  • Create two threads: one for handling transactions and the other for blocks. This way we can wait until all the incoming signatures have been handled and the block stats have been updated. It would require more complicated communication between the two threads (a DashMap instead of a HashMap).
  • Implement this part with tokio. Performance-wise this would probably be best, but I don't want to introduce tokio as part of this already complicated PR, and it would also lead to more sophisticated communication between threads.
  • Check the size of the receiver; if it is not 0, skip writing block information for now. In combination with a limited number of blocks to download, this might work (see the sketch after this list).
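A rough sketch of what option (3) could look like inside the service loop, assuming a crossbeam_channel select with a tick-based timer; the names are illustrative and not necessarily what was landed.

use std::time::Duration;

use crossbeam_channel::{select, tick, Receiver};

fn run(signature_receiver: Receiver<Vec<String>>, process_blocks_every_ms: u64) {
    let block_processing_timer = tick(Duration::from_millis(process_blocks_every_ms));
    loop {
        select! {
            recv(signature_receiver) -> msg => {
                match msg {
                    Ok(_signatures) => { /* insert signatures into the pending map */ }
                    Err(_) => break, // sender dropped: the bench run has finished
                }
            }
            recv(block_processing_timer) -> _ => {
                // Option (3): process blocks only when no signatures are queued,
                // so the receiver arm is not starved.
                if signature_receiver.is_empty() {
                    // process_blocks(...);
                }
            }
        }
    }
}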

@CriesofCarrots CriesofCarrots removed their request for review March 15, 2024 01:25
@CriesofCarrots

I'll defer to the other reviewers you have on this one.

@gregcusack

gregcusack commented Mar 15, 2024

There is a bug in the loop with select: processing blocks takes longer than the specified interval, so once the sender has been dropped we keep taking the timer branch, processing blocks that are irrelevant and never stopping the service.

So, when bench-tps stops sending transactions, the receiver continues to process blocks and write them to the CSV, but we don't actually care about these "empty" blocks, right? And we need a way to basically stop the receiver once the sender stops. Is that right?

@gregcusack

  • Create two threads: one for handling transactions and the other for blocks. This way we can wait until all the incoming signatures have been handled and the block stats have been updated. It would require more complicated communication between the two threads (a DashMap instead of a HashMap).
  • Implement this part with tokio. Performance-wise this would probably be best, but I don't want to introduce tokio as part of this already complicated PR, and it would also lead to more sophisticated communication between threads.
  • Check the size of the receiver; if it is not 0, skip writing block information for now. In combination with a limited number of blocks to download, this might work.

Simpler is better, so would vote for (3) if you can get it to work. Seems doable.

@KirillLykov
Author

  • Create two threads: one for handling transactions and the other for blocks. This way we can wait until all the incoming signatures have been handled and the block stats have been updated. It would require more complicated communication between the two threads (a DashMap instead of a HashMap).
  • Implement this part with tokio. Performance-wise this would probably be best, but I don't want to introduce tokio as part of this already complicated PR, and it would also lead to more sophisticated communication between threads.
  • Check the size of the receiver; if it is not 0, skip writing block information for now. In combination with a limited number of blocks to download, this might work.

Simpler is better, so would vote for (3) if you can get it to work. Seems doable.

I first implemented (1), but the code is cumbersome. I'm now trying (3); I'm using it against a validator and testnet and will see how well it works.

@KirillLykov KirillLykov force-pushed the klykov/bench-tps-txs-analysis branch from 0787a57 to 5742a4c Compare March 22, 2024 17:16
@KirillLykov KirillLykov force-pushed the klykov/bench-tps-txs-analysis branch from 29a466f to 4374f5a Compare March 24, 2024 13:54
@KirillLykov
Author

@gregcusack please let me know if this change 0f93cd3 seems to be clear

@KirillLykov KirillLykov force-pushed the klykov/bench-tps-txs-analysis branch from 0f93cd3 to 7cd0d67 Compare March 25, 2024 13:58
@gregcusack

@gregcusack please let me know if this change 0f93cd3 seems to be clear

much more clear! i appreciate the change


@gregcusack gregcusack left a comment


lgtm! i like how you dealt with the endless block-processing bug. seems simple yet effective. appreciate the clarity on is_timeout_tx

@KirillLykov KirillLykov merged commit 1261f1f into anza-xyz:master Mar 26, 2024
48 checks passed