Enhancements for tpu-client-next #2991

Open
2 of 10 tasks
KirillLykov opened this issue Sep 26, 2024 · 4 comments

KirillLykov commented Sep 26, 2024

Problem

tpu-client-next was introduced in #2905.
This issue tracks feature requests.

Proposed by me:

Proposed by @godmodegalactus:

  • Order the transactions to send according to their priority fees. Currently, the priority fee is not taken into account. This might be an interesting feature for users (see the sketch after this list).
  • Use a timestamp per transaction, not per batch, because transactions are typically not generated in batches at a single moment in time, as is done in the benchmark code.
  • Change the mechanism of the age check for transactions.
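
A minimal sketch of the priority-fee ordering idea, assuming a hypothetical queue keyed by the already-parsed priority fee (extracting the fee from the compute-budget instructions is out of scope here):

use std::collections::BinaryHeap;

// Tuples order by their first element, so pop() returns the entry with the
// highest priority fee first. The (fee, wire_transaction) pairing is an
// assumption, not the crate's actual data model.
pub fn drain_by_priority(queue: &mut BinaryHeap<(u64, Vec<u8>)>) -> Vec<Vec<u8>> {
    let mut ordered = Vec::with_capacity(queue.len());
    while let Some((_priority_fee, wire_transaction)) = queue.pop() {
        ordered.push(wire_transaction);
    }
    ordered
}

The scheduler would push (priority_fee, wire_transaction) pairs as batches arrive and drain them in this order when sending.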

Proposed by Alessandro:

  • ConnectionWorker::run should return Result.

With the API as it is now, it's impossible to tell whether ConnectionWorker::run() succeeded or failed (other than looking at stats, which is not a good API). I think that run() should probably return Result, and whenever there's a fatal, un-retriable error it should return the actual error.
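
A minimal sketch of what a Result-returning worker loop could look like; the error variants, the struct fields, and the thiserror/tokio usage are assumptions, not the crate's actual API:

use thiserror::Error;
use tokio::sync::mpsc;

// Hypothetical fatal, non-retriable worker errors.
#[derive(Debug, Error)]
pub enum ConnectionWorkerError {
    #[error("connection to the peer failed permanently")]
    ConnectionFailed,
    #[error("failed to send transaction batch over the QUIC stream")]
    SendFailed,
}

// Simplified stand-in for the real ConnectionWorker; only the receiver is shown.
pub struct ConnectionWorker {
    transactions_receiver: mpsc::Receiver<Vec<u8>>,
}

impl ConnectionWorker {
    // Returns Ok(()) on orderly shutdown (channel closed) and an error on any
    // fatal failure, so callers no longer have to infer success from stats.
    pub async fn run(&mut self) -> Result<(), ConnectionWorkerError> {
        while let Some(batch) = self.transactions_receiver.recv().await {
            self.send_batch(batch).await?;
        }
        Ok(())
    }

    async fn send_batch(&mut self, _batch: Vec<u8>) -> Result<(), ConnectionWorkerError> {
        // The real implementation would write to the QUIC stream and map
        // un-retriable transport errors to ConnectionWorkerError.
        Ok(())
    }
}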

The way you connect to the leaders can be improved. I think as part of setup you should connect to all the leaders; then the case !workers.contains(current_leader) or worker_for_leader.is_disconnected() should print an error!() - it should never happen outside of initial setup, and if it happens something is seriously borked and is going to impact perf, so it should be visible in the logs.
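
A sketch of that check, with WorkerHandle as a placeholder for whatever per-leader state the scheduler actually keeps:

use log::error;
use std::{collections::HashMap, net::SocketAddr};

// Placeholder handle for a per-leader connection worker.
pub struct WorkerHandle {
    disconnected: bool,
}

impl WorkerHandle {
    pub fn is_disconnected(&self) -> bool {
        self.disconnected
    }
}

// After initial setup every upcoming leader should already have a live worker;
// anything else is logged loudly because it will hurt performance.
pub fn worker_for_leader<'a>(
    workers: &'a HashMap<SocketAddr, WorkerHandle>,
    leader: &SocketAddr,
) -> Option<&'a WorkerHandle> {
    match workers.get(leader) {
        Some(worker) if !worker.is_disconnected() => Some(worker),
        Some(_) => {
            error!("connection worker for leader {leader} is disconnected");
            None
        }
        None => {
            error!("no connection worker for leader {leader}");
            None
        }
    }
}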

I think you should shutdown in a background task, so as not to impact the send
path at all. But these stats currently preclude you from doing so, since you
need to modify self.send_stats_per_addr. Maybe these stats should just be held
in the background task, and you send them over to the scheduler or something.
#3478
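
A sketch of moving shutdown off the send path; WorkerStats and the channel wiring are placeholders for however the stats end up being reported:

use tokio::{sync::mpsc, task::JoinHandle};

// Placeholder for the per-connection stats the worker accumulates.
#[derive(Debug, Default)]
pub struct WorkerStats {
    pub successfully_sent: u64,
}

// Awaiting the old worker and reporting its stats happens in a background
// task, so replacing a connection never blocks the send path.
pub fn shutdown_in_background(
    worker_task: JoinHandle<WorkerStats>,
    stats_sender: mpsc::UnboundedSender<WorkerStats>,
) {
    tokio::spawn(async move {
        if let Ok(stats) = worker_task.await {
            // Hand the final stats to the scheduler (or whoever aggregates them).
            let _ = stats_sender.send(stats);
        }
    });
}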

Having a max idle timeout of only 2 * KEEP_ALIVE strikes me as wrong in case there's packet loss. This is something we should measure.

  • Before shutting down the workers, save the key material to use later for 0-RTT. Alternatively, it could be saved when the worker is spawned, after the handshake.
@KirillLykov KirillLykov self-assigned this Sep 26, 2024
@KirillLykov KirillLykov changed the title Features to add to tpu-client-next Enhancements for tpu-client-next Oct 14, 2024
KirillLykov (Author) commented

Discussed with Alex how sending transactions could be organized inside tpu-client-next:

  • use a channel of size 1, or no channel at all,
  • inside, try_send to the workers,
  • there might be a task that checks the status of the worker channels for the current slot and uses this information to slow down sending to the ingress channel in advance (see the sketch after this list),
  • alternatively, we could use "random early detection".
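
A sketch of such a monitoring task, assuming bounded tokio channels of size 1 towards the workers; all names are placeholders:

use std::{
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    },
    time::Duration,
};
use tokio::sync::mpsc::Sender;

// Periodically checks how much room the size-1 worker channels have and
// exposes a shared flag the transaction producer can consult before pushing
// more batches into the ingress channel.
pub fn spawn_congestion_monitor(
    workers: Vec<Sender<Vec<u8>>>,
    congested: Arc<AtomicBool>,
) -> tokio::task::JoinHandle<()> {
    // Returning the handle lets the caller abort the monitor on shutdown.
    tokio::spawn(async move {
        loop {
            // capacity() == 0 means a bounded channel is currently full.
            let all_full = !workers.is_empty() && workers.iter().all(|w| w.capacity() == 0);
            congested.store(all_full, Ordering::Relaxed);
            tokio::time::sleep(Duration::from_millis(5)).await;
        }
    })
}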


KirillLykov commented Dec 11, 2024

Extend tpu-client-next to support different sending strategies

tpu-client-next is currently used in the following places:

  1. Send transactions received by RPC to a set of nodes. Part of the validator code.
  2. Forward transactions to the next leaders. Part of the validator code.
  3. Send transactions generated by the transaction-bench client to stress test the validator.

These three use cases create different requirements for the client code.

1. Send in RPC

Transactions received by RPC should be sent to a set of nodes. This set is composed of the current node, the next leaders, and a predefined set of custom nodes, and hence might contain many nodes. There is retry logic, so if a transaction does not land in a block, it will be sent again.

With this in mind, the optimum is to minimize transaction latency by reducing the time a transaction spends going through the call stack, while keeping the throughput aligned with the maximum achievable with the current RPC/STS implementation (30% lower than for TPU).

Hence, from the client's perspective, the optimal strategy is to try_send to each of the connection worker channels. The expectation is that at least some nodes will receive the transaction. Even if all the channels are full, we can drop the transaction, because it will be retried later. This is how it is currently implemented (sketched below).
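
A sketch of that strategy with placeholder types (the real code lives in connection_workers_scheduler.rs):

use tokio::sync::mpsc::Sender;

// Placeholder for the crate's transaction batch type.
type WireBatch = Vec<Vec<u8>>;

// Fan the batch out without ever blocking: workers whose channel is full
// simply miss this batch, which is acceptable because RPC retries it later.
pub fn try_send_to_all(workers: &[Sender<WireBatch>], batch: &WireBatch) -> usize {
    let mut delivered = 0;
    for worker in workers {
        if worker.try_send(batch.clone()).is_ok() {
            delivered += 1;
        }
    }
    // Number of workers that accepted the batch; 0 means it was dropped.
    delivered
}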

2. Send in Forwarder

Some transactions (non-vote, not forwarded, staked, etc.) received by the current node are forwarded to the next leader. The differences from "Send in RPC" are, hence, that we send to only one node and that there is no retry logic.

A good option could be send_timeout, because we want to deliver as many transactions as possible while the target node is the leader. Using a plain send would create back pressure on the side of the Scheduler's user.
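
A sketch of the send_timeout variant; the batch type and the timeout value are placeholders:

use std::time::Duration;
use tokio::sync::mpsc::{error::SendTimeoutError, Sender};

// Block for a bounded amount of time (back pressure), then give up, so the
// forwarder keeps up while the target node is still the leader.
pub async fn forward_with_timeout(
    worker: &Sender<Vec<Vec<u8>>>,
    batch: Vec<Vec<u8>>,
    timeout: Duration,
) -> bool {
    match worker.send_timeout(batch, timeout).await {
        Ok(()) => true,
        Err(SendTimeoutError::Timeout(_)) | Err(SendTimeoutError::Closed(_)) => false,
    }
}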

3. Send transactions for stress testing

In the transaction-bench client, we generate transactions and send them to the Scheduler, which later sends them to the workers.
Here, the goal is to maximize throughput while keeping latency reasonable (so that transactions do not expire).

There might be more than one estimated leader, so I don't want to send at the speed of the slowest connection; instead, I would like to send to at least one node from the set.

It would be nice to have back pressure to slow down transaction generation when the network cannot send that many transactions. The current behavior is that we generate more than we are able to send, so most of the transactions are dropped.
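
A sketch of the "at least one leader" behavior with back pressure, again with placeholder types:

use tokio::sync::mpsc::Sender;

// First try every estimated leader without blocking; only if all channels are
// full do we block on one of them, which slows the transaction generator down
// instead of silently dropping most of what it produces.
pub async fn send_to_at_least_one(workers: &[Sender<Vec<Vec<u8>>>], batch: Vec<Vec<u8>>) {
    for worker in workers {
        if worker.try_send(batch.clone()).is_ok() {
            return;
        }
    }
    if let Some(worker) = workers.first() {
        // Back pressure: wait until at least this worker has room again.
        let _ = worker.send(batch).await;
    }
}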

How this could be implemented

These possible strategies do not overlap for a single client, so there should be a way to configure the client to use exactly one of them.

Where sending happens: https://github.com/anza-xyz/agave/blob/master/tpu-client-next/src/connection_workers_scheduler.rs#L169

Schematically the logic is the following:

loop {
    // Take the next batch from the ingress channel.
    let transaction_batch = transaction_receiver.recv().await;
    // Refresh the set of connections for the upcoming leaders.
    update_connections().await;
    // Fan the batch out to the workers of the future leaders.
    for leader_worker in future_leaders_workers {
        leader_worker.sender.send(transaction_batch.clone()).await;
    }
}

So the part where we send to a particular leader might be abstracted using some structure selected in the config.
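
An illustrative sketch of such a structure, modeled here as a config-selected enum dispatching the per-leader send (the names and batch type are assumptions, not an existing API):

use std::time::Duration;
use tokio::sync::mpsc::Sender;

type WireBatch = Vec<Vec<u8>>;

// Chosen once in the scheduler config; the send loop stays generic.
pub enum SendStrategy {
    // RPC/STS: never block, drop when a worker channel is full.
    TrySendAndDrop,
    // Forwarder: block briefly, then give up.
    SendWithTimeout(Duration),
}

impl SendStrategy {
    pub async fn send(&self, workers: &[Sender<WireBatch>], batch: WireBatch) {
        match self {
            SendStrategy::TrySendAndDrop => {
                for worker in workers {
                    let _ = worker.try_send(batch.clone());
                }
            }
            SendStrategy::SendWithTimeout(timeout) => {
                for worker in workers {
                    let _ = worker.send_timeout(batch.clone(), *timeout).await;
                }
            }
        }
    }
}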

KirillLykov (Author) commented

How to minimize allocations

The problem is that we need to distribute a transaction batch over a set of leaders.

For small packets, it makes sense to copy the packets into pre-allocated buffers (which the allocator should use internally).
But for larger ones, it makes sense to instead pass a reference to the TransactionBatch object.
For example, ZeroMQ implements both of these strategies.

In our domain, for RPC/STS we almost always send transactions in very small batches, while for transaction-bench we use larger batches by design. So the solution could be to use a generic type for TransactionBatch. But first I would benchmark these two options for the two use cases; if there is no difference, I would lean towards using Arc everywhere for the sake of a simpler implementation. A sketch of the Arc option follows.
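
A minimal sketch of the Arc option, with TransactionBatch as a stand-in for the crate's real type:

use std::sync::Arc;
use tokio::sync::mpsc::Sender;

// Stand-in for the crate's batch type; allocated once per batch.
pub struct TransactionBatch {
    pub wired_transactions: Vec<Vec<u8>>,
}

// Each worker channel receives a reference-count bump instead of a deep copy,
// so the cost of fanning out is independent of the batch size.
pub fn fan_out(workers: &[Sender<Arc<TransactionBatch>>], batch: TransactionBatch) {
    let shared = Arc::new(batch);
    for worker in workers {
        let _ = worker.try_send(Arc::clone(&shared));
    }
}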
