Enhancements for tpu-client-next #2991

Open
2 of 10 tasks
KirillLykov opened this issue Sep 26, 2024 · 4 comments

KirillLykov commented Sep 26, 2024

Problem

tpu-client-next was introduced in #2905.
This issue tracks feature requests.

Proposed by me:

Proposed by @godmodegalactus:

  • Order the transactions to send according to their priority fees. Currently, the priority fee is not taken into account. This might be an interesting feature for users (see the sketch after this list).
  • Use a timestamp per transaction, not per batch, because transactions are typically not generated in batches at a single moment in time, as is done in the benchmark code.
  • Change the mechanism of the age check for transactions.
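
A minimal sketch of the priority-fee ordering idea, assuming a hypothetical queue keyed by the already-parsed priority fee (extracting the fee from the compute-budget instructions is out of scope here):

use std::collections::BinaryHeap;

// Tuples order by their first element, so pop() returns the entry with the
// highest priority fee first. The (fee, wire_transaction) pairing is an
// assumption, not the crate's actual data model.
pub fn drain_by_priority(queue: &mut BinaryHeap<(u64, Vec<u8>)>) -> Vec<Vec<u8>> {
    let mut ordered = Vec::with_capacity(queue.len());
    while let Some((_priority_fee, wire_transaction)) = queue.pop() {
        ordered.push(wire_transaction);
    }
    ordered
}

The scheduler would push (priority_fee, wire_transaction) pairs as batches arrive and drain them in this order when sending.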

Proposed by Alessandro:

  • ConnectionWorker::run should return Result.

With the API as it is now, it's impossible to tell whether ConnectionWorker::run() succeeded or failed (other than looking at stats, which is not a good API). I think that run() should probably return Result, and whenever there's a fatal, un-retriable error it should return the actual error.
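
A minimal sketch of what a Result-returning worker loop could look like; the error variants, the struct fields, and the thiserror/tokio usage are assumptions, not the crate's actual API:

use thiserror::Error;
use tokio::sync::mpsc;

// Hypothetical fatal, non-retriable worker errors.
#[derive(Debug, Error)]
pub enum ConnectionWorkerError {
    #[error("connection to the peer failed permanently")]
    ConnectionFailed,
    #[error("failed to send transaction batch over the QUIC stream")]
    SendFailed,
}

// Simplified stand-in for the real ConnectionWorker; only the receiver is shown.
pub struct ConnectionWorker {
    transactions_receiver: mpsc::Receiver<Vec<u8>>,
}

impl ConnectionWorker {
    // Returns Ok(()) on orderly shutdown (channel closed) and an error on any
    // fatal failure, so callers no longer have to infer success from stats.
    pub async fn run(&mut self) -> Result<(), ConnectionWorkerError> {
        while let Some(batch) = self.transactions_receiver.recv().await {
            self.send_batch(batch).await?;
        }
        Ok(())
    }

    async fn send_batch(&mut self, _batch: Vec<u8>) -> Result<(), ConnectionWorkerError> {
        // The real implementation would write to the QUIC stream and map
        // un-retriable transport errors to ConnectionWorkerError.
        Ok(())
    }
}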

The way you connect to the leaders can be improved. I think as part of setup you should connect to all the leaders; then the case !workers.contains(current_leader) or worker_for_leader.is_disconnected() should print an error!() - it should never happen outside of initial setup, and if it happens something is seriously borked and is going to impact perf, so it should be visible in the logs.
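
A sketch of that check, with WorkerHandle as a placeholder for whatever per-leader state the scheduler actually keeps:

use log::error;
use std::{collections::HashMap, net::SocketAddr};

// Placeholder handle for a per-leader connection worker.
pub struct WorkerHandle {
    disconnected: bool,
}

impl WorkerHandle {
    pub fn is_disconnected(&self) -> bool {
        self.disconnected
    }
}

// After initial setup every upcoming leader should already have a live worker;
// anything else is logged loudly because it will hurt performance.
pub fn worker_for_leader<'a>(
    workers: &'a HashMap<SocketAddr, WorkerHandle>,
    leader: &SocketAddr,
) -> Option<&'a WorkerHandle> {
    match workers.get(leader) {
        Some(worker) if !worker.is_disconnected() => Some(worker),
        Some(_) => {
            error!("connection worker for leader {leader} is disconnected");
            None
        }
        None => {
            error!("no connection worker for leader {leader}");
            None
        }
    }
}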

I think you should shutdown in a background task, so as not to impact the send
path at all. But these stats currently preclude you from doing so, since you
need to modify self.send_stats_per_addr. Maybe these stats should just be held
in the background task, and you send them over to the scheduler or something.
#3478
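
A sketch of moving shutdown off the send path; WorkerStats and the channel wiring are placeholders for however the stats end up being reported:

use tokio::{sync::mpsc, task::JoinHandle};

// Placeholder for the per-connection stats the worker accumulates.
#[derive(Debug, Default)]
pub struct WorkerStats {
    pub successfully_sent: u64,
}

// Awaiting the old worker and reporting its stats happens in a background
// task, so replacing a connection never blocks the send path.
pub fn shutdown_in_background(
    worker_task: JoinHandle<WorkerStats>,
    stats_sender: mpsc::UnboundedSender<WorkerStats>,
) {
    tokio::spawn(async move {
        if let Ok(stats) = worker_task.await {
            // Hand the final stats to the scheduler (or whoever aggregates them).
            let _ = stats_sender.send(stats);
        }
    });
}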

Having a max idle timeout of only 2 * KEEP_ALIVE strikes me as wrong in case there's packet loss. This is something we should measure.

  • Before shutting down the workers, save the key material to use later for 0-RTT. Alternatively, it could be saved when the worker is spawned, after the handshake.
@KirillLykov KirillLykov self-assigned this Sep 26, 2024
@KirillLykov KirillLykov changed the title Features to add to tpu-client-next Enhancements for tpu-client-next Oct 14, 2024
KirillLykov (Author) commented

Discussed with Alex how sending transactions could be organized inside tpu-client-next:

  • use a channel of size 1, or no channel at all,
  • inside, try_send to the workers,
  • there might be a task that checks the status of the worker channels for the current slot and uses this information to slow down sending to the ingress channel in advance (see the sketch after this list),
  • alternatively, we could use "random early detection".
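
A sketch of such a monitoring task, assuming bounded tokio channels of size 1 towards the workers; all names are placeholders:

use std::{
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    },
    time::Duration,
};
use tokio::sync::mpsc::Sender;

// Periodically checks how much room the size-1 worker channels have and
// exposes a shared flag the transaction producer can consult before pushing
// more batches into the ingress channel.
pub fn spawn_congestion_monitor(
    workers: Vec<Sender<Vec<u8>>>,
    congested: Arc<AtomicBool>,
) -> tokio::task::JoinHandle<()> {
    // Returning the handle lets the caller abort the monitor on shutdown.
    tokio::spawn(async move {
        loop {
            // capacity() == 0 means a bounded channel is currently full.
            let all_full = !workers.is_empty() && workers.iter().all(|w| w.capacity() == 0);
            congested.store(all_full, Ordering::Relaxed);
            tokio::time::sleep(Duration::from_millis(5)).await;
        }
    })
}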


KirillLykov commented Dec 11, 2024

Extend tpu-client-next to support different sending strategies

tpu-client-next is currently used in the following places:

  1. Send transactions received by RPC to a set of nodes. Part of the validator code.
  2. Forward transactions to the next leaders. Part of the validator code.
  3. Send transactions generated by the transaction-bench client to stress test the validator.

These three use cases create different requirements for the client code.

1. Send in RPC

Transactions received by RPC should be sent to a set of nodes. This set is composed of the current node, the next leaders, and a predefined set of custom nodes, and hence might contain many nodes. There is retry logic, so if a transaction does not land in a block, it will be sent again.

With this in mind, the optimum is to minimize transaction latency by reducing the time a transaction spends going through the call stack, while keeping the throughput aligned with the maximum achievable with the current RPC/STS implementation (30% lower than for TPU).

Hence, from the client's perspective, the optimal strategy is to try_send to each of the connection worker channels. The expectation is that at least some nodes will receive the transaction. Even if all the channels are full, we can drop the transaction, because it will be retried later. This is how it is currently implemented (sketched below).
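
A sketch of that strategy with placeholder types (the real code lives in connection_workers_scheduler.rs):

use tokio::sync::mpsc::Sender;

// Placeholder for the crate's transaction batch type.
type WireBatch = Vec<Vec<u8>>;

// Fan the batch out without ever blocking: workers whose channel is full
// simply miss this batch, which is acceptable because RPC retries it later.
pub fn try_send_to_all(workers: &[Sender<WireBatch>], batch: &WireBatch) -> usize {
    let mut delivered = 0;
    for worker in workers {
        if worker.try_send(batch.clone()).is_ok() {
            delivered += 1;
        }
    }
    // Number of workers that accepted the batch; 0 means it was dropped.
    delivered
}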

2. Send in Forwarder

Some transactions (non-vote, not forwarded, staked, etc.) received by the current node are forwarded to the next leader. The differences from "Send in RPC" are, hence, that we send to only one node and that there is no retry logic.

A good option could be send_timeout, because we want to deliver as many transactions as possible while the target node is the leader. Using a plain send would create back pressure on the side of the Scheduler's user.
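
A sketch of the send_timeout variant; the batch type and the timeout value are placeholders:

use std::time::Duration;
use tokio::sync::mpsc::{error::SendTimeoutError, Sender};

// Block for a bounded amount of time (back pressure), then give up, so the
// forwarder keeps up while the target node is still the leader.
pub async fn forward_with_timeout(
    worker: &Sender<Vec<Vec<u8>>>,
    batch: Vec<Vec<u8>>,
    timeout: Duration,
) -> bool {
    match worker.send_timeout(batch, timeout).await {
        Ok(()) => true,
        Err(SendTimeoutError::Timeout(_)) | Err(SendTimeoutError::Closed(_)) => false,
    }
}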

3. Send transactions for stress testing

In the transaction-bench client, we generate transactions and send them to the Scheduler, which later sends them to the workers.
Here, the goal is to maximize throughput while keeping latency reasonable (so that transactions do not expire).

There might be more than one estimated leader, so I don't want to send at the speed of the slowest connection; instead, I would like to send to at least one node from the set.

It would be nice to have back pressure to slow down transaction generation when the network cannot send that many transactions. The current behavior is that we generate more than we are able to send, so most of the transactions are dropped.
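
A sketch of the "at least one leader" behavior with back pressure, again with placeholder types:

use tokio::sync::mpsc::Sender;

// First try every estimated leader without blocking; only if all channels are
// full do we block on one of them, which slows the transaction generator down
// instead of silently dropping most of what it produces.
pub async fn send_to_at_least_one(workers: &[Sender<Vec<Vec<u8>>>], batch: Vec<Vec<u8>>) {
    for worker in workers {
        if worker.try_send(batch.clone()).is_ok() {
            return;
        }
    }
    if let Some(worker) = workers.first() {
        // Back pressure: wait until at least this worker has room again.
        let _ = worker.send(batch).await;
    }
}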

How this could be implemented

These possible strategies do not overlap for a single client, so there should be a way to configure the client to use exactly one of them.

Where sending happens: https://github.com/anza-xyz/agave/blob/master/tpu-client-next/src/connection_workers_scheduler.rs#L169

Schematically the logic is the following:

loop {
    // Take the next batch from the ingress channel.
    let transaction_batch = transaction_receiver.recv().await;
    // Refresh the set of connections for the upcoming leaders.
    update_connections().await;
    // Fan the batch out to the workers of the future leaders.
    for leader_worker in future_leaders_workers {
        leader_worker.sender.send(transaction_batch.clone()).await;
    }
}

So the part where we send to a particular leader might be abstracted using some structure selected in the config.
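
An illustrative sketch of such a structure, modeled here as a config-selected enum dispatching the per-leader send (the names and batch type are assumptions, not an existing API):

use std::time::Duration;
use tokio::sync::mpsc::Sender;

type WireBatch = Vec<Vec<u8>>;

// Chosen once in the scheduler config; the send loop stays generic.
pub enum SendStrategy {
    // RPC/STS: never block, drop when a worker channel is full.
    TrySendAndDrop,
    // Forwarder: block briefly, then give up.
    SendWithTimeout(Duration),
}

impl SendStrategy {
    pub async fn send(&self, workers: &[Sender<WireBatch>], batch: WireBatch) {
        match self {
            SendStrategy::TrySendAndDrop => {
                for worker in workers {
                    let _ = worker.try_send(batch.clone());
                }
            }
            SendStrategy::SendWithTimeout(timeout) => {
                for worker in workers {
                    let _ = worker.send_timeout(batch.clone(), *timeout).await;
                }
            }
        }
    }
}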

KirillLykov (Author) commented

How to minimize allocations

The problem is that we need to distribute a transaction batch over a set of leaders.

For small packets, it makes sense to copy the packets into pre-allocated buffers (which the allocator should use internally).
But for larger ones, it makes sense to instead pass a reference to the TransactionBatch object.
For example, ZeroMQ implements both of these strategies.

In our domain, for RPC/STS we almost always send transactions in very small batches, while for transaction-bench we use larger batches by design. So the solution could be to use a generic type for TransactionBatch. But first I would benchmark these two options for the two use cases; if there is no difference, I would lean towards using Arc everywhere for the sake of a simpler implementation. A sketch of the Arc option follows.
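
A minimal sketch of the Arc option, with TransactionBatch as a stand-in for the crate's real type:

use std::sync::Arc;
use tokio::sync::mpsc::Sender;

// Stand-in for the crate's batch type; allocated once per batch.
pub struct TransactionBatch {
    pub wired_transactions: Vec<Vec<u8>>,
}

// Each worker channel receives a reference-count bump instead of a deep copy,
// so the cost of fanning out is independent of the batch size.
pub fn fan_out(workers: &[Sender<Arc<TransactionBatch>>], batch: TransactionBatch) {
    let shared = Arc::new(batch);
    for worker in workers {
        let _ = worker.try_send(Arc::clone(&shared));
    }
}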
