Support tx poh recording in unified scheduler #4150
@@ -140,7 +140,7 @@ pub struct RecordTransactionsSummary {
     pub starting_transaction_index: Option<usize>,
 }

-#[derive(Clone)]
+#[derive(Clone, Debug)]
Do we really need Debug on all of this? I can't imagine anyone has ever used it for the massive scheduling and bank structures. I'd sooner remove it from all of that than let it spread further, wdyt?
hmm, I think Debug is handy for quick debugging, and it is actually required by assert_matches!()...
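As a rough illustration of the point about assert_matches!(): macros of this kind print the offending value with {:?} when the pattern does not match, so the type has to implement Debug. The sketch below uses a hypothetical, simplified macro named check_matches! and a stand-in Summary struct; neither is the real assert_matches!() implementation nor the real agave type.

```rust
// Hypothetical stand-in for a summary struct; not the real agave type.
#[derive(Clone, Debug)]
struct Summary {
    starting_transaction_index: Option<usize>,
}

// Simplified, hypothetical macro in the spirit of assert_matches!: on a
// mismatch it formats the value with {:?}, which is why Debug is needed.
macro_rules! check_matches {
    ($value:expr, $pattern:pat) => {
        match $value {
            $pattern => {}
            ref other => panic!(
                "value {:?} does not match pattern {}",
                other,
                stringify!($pattern)
            ),
        }
    };
}

// Small constructor so the macro can be exercised on a temporary value.
fn summary_with_index(index: usize) -> Summary {
    Summary {
        starting_transaction_index: Some(index),
    }
}
```

Removing Debug from the derive list makes the panic! arm fail to compile, which is the constraint being described.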
sdk/transaction-error/src/lib.rs
Outdated
    /// Commit failed internally.
    CommitFailed,
Why are we now able to fail on commit while it was not previously possible?
It seems like we are failing the pre-commit check (recording), not commit itself.
If that's the case, why can we not eat the error in the same way we do in normal block-production, without adding a new variant here?
> Why are we now able to fail on commit while it was not previously possible?
> ...
> It seems like we are failing the pre-commit check (recording), not commit itself.
hope this rename helps to reduce the confusion: 23159ff
> why can we not eat the error in the same way we do in normal block-production, without adding a new variant here?
The normal block-production code path can handle the error condition without being confined to TransactionError: it decides directly what to commit by looking at RecordTransactionsSummary in core/src/banking_stage/consumer.rs.
However, I opted not to use the block-production code path for SchedulingMode::BlockProduction because it has some functionality that is unwanted for the unified scheduler. Also, I didn't want to introduce yet another code path with copied code for this transaction execution, because these code paths are important to keep well maintained.
While it's not ideal to add a variant here, there's some precedent for internal-use variants: ResanitizationNeeded and ProgramCacheHitMaxLimit. They were added for ease of introduction rather than by correctly adjusting all the types around the code base.
unified-scheduler-pool/src/lib.rs
Outdated
BlockProduction => {
    let mut vec = vec![];
    if handler_context.transaction_status_sender.is_some() {
I don't understand this part, could you explain what the thought process here is?
What does block-production have to do with transaction_status_sender; in nearly all cases it should be None for block-producers since it is only used for RPCs.
Given the code below, I'm thinking it may have been a typo and was meant to be transaction_recorder?
Even so, it seems better to guarantee the recorder is present if we have block-production mode selected?
Okay; I see now it's RPC-only stuff for the status batches.
thanks for pointing out unclear code...: a59e39a
ledger/src/blockstore_processor.rs
Outdated
) -> Result<()> {
    let TransactionBatchWithIndexes {
        batch,
        transaction_indexes,
    } = batch;
    let record_token_balances = transaction_status_sender.is_some();
    let mut transaction_indexes = transaction_indexes.to_vec();
nit: adding a clone here
Wondering if we can remove this clone and simplify the callback interface.
What if the callback itself did an allocation (only if necessary), returning the transaction index in that case?
I also really don't like the Option<Option<usize>> that is there now because it hides the meaning. It's not clear from just this code that the outer option means that recording/pre-commit failed.
If I were to do this, I'd take one of these two approaches:
1. Make the outer option of the return value a Result<...,()>
2. Simple enum type to more clearly represent meanings
Lean towards 1 since it's simpler, and not sure an additional enum would benefit too much here.
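To make option 1 concrete, here is a minimal sketch of a pre-commit step returning Result<Option<usize>, ()> instead of Option<Option<usize>>. The function names and parameters (pre_commit, describe, recorder_alive, wants_index) are hypothetical and only illustrate the shape of the return type, not the code in this PR.

```rust
// Hypothetical pre-commit (recording) step; all names are illustrative.
// Ok(Some(i)): recording succeeded and produced a starting transaction index.
// Ok(None):    recording succeeded but there is no index to report.
// Err(()):     recording (the pre-commit check) failed; nothing may be committed.
fn pre_commit(recorder_alive: bool, wants_index: bool) -> Result<Option<usize>, ()> {
    if !recorder_alive {
        return Err(());
    }
    Ok(if wants_index { Some(0) } else { None })
}

// The caller tells "failed" apart from "no index" by type, not by convention.
fn describe(outcome: Result<Option<usize>, ()>) -> &'static str {
    match outcome {
        Ok(Some(_)) => "recorded with index",
        Ok(None) => "recorded without index",
        Err(()) => "pre-commit failed",
    }
}
```

With the nested Option, the first and third arms would both be spelled with Some/None, and a reader would have to know which layer means failure.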
> nit: adding a clone here
> Wondering if we can remove this clone and simplify the callback interface.
> What if the callback itself did an allocation (only if necessary) returning the transaction index in that case.
nice catch.. done: 08536f0
I think Cow is enough. I'd like to keep the closure agnostic of allocation entirely, for separation of concerns.
> I also really don't like the Option<Option<usize>> that is there now because it hides the meaning. It's not clear from just this code that the outer option means that recording/pre-commit failed.
> If I were to do this, I'd take one of these two approaches:
> 1. Make the outer option of return value a Result<...,()>
> 2. Simple enum type to more clearly represent meanings
> Lean towards 1 since it's simpler, and not sure an additional enum would benefit too much here.
this is done: 3b852f6
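The Cow idea mentioned in this reply can be sketched as follows: the indexes stay borrowed on the common path and are only cloned when they actually need to be rewritten. The helper name resolve_indexes and its parameters are hypothetical; the real change in 08536f0 may be shaped differently.

```rust
use std::borrow::Cow;

// Hypothetical helper: fill in transaction indexes only when a starting index
// is provided; otherwise the borrowed slice is passed through untouched.
fn resolve_indexes<'a>(
    indexes: &'a [usize],
    starting_index: Option<usize>,
) -> Cow<'a, [usize]> {
    let mut indexes = Cow::Borrowed(indexes);
    if let Some(start) = starting_index {
        // to_mut() clones the slice into a Vec on first mutable access only.
        for (offset, slot) in indexes.to_mut().iter_mut().enumerate() {
            *slot = start + offset;
        }
    }
    indexes
}
```

The caller never allocates unless an index is assigned, and the closure that produces the starting index does not need to know about allocation at all.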
runtime/src/bank.rs
Outdated
recording_config: ExecutionRecordingConfig,
timings: &mut ExecuteTimings,
log_messages_bytes_limit: Option<usize>,
pre_commit_callback: Option<impl FnOnce() -> bool>,
is it simpler to just not make this optional?
just have impl FnOnce() -> bool. Likely the compiler will optimize the simple case of || true out completely, and it lets us simplify the code by not having conditional calls.
hehe, I made them Option-ed on purpose. wrote about it a bit: 36f8537
indeed, || true will be completely optimized out. but I prefer to retain Option here: this is very security-sensitive code, and the Option gives extra safety to ensure the block-verification code path isn't affected.
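A small sketch of the trade-off discussed here, using a hypothetical and heavily simplified commit function (not the real Bank API): with Option, the block-verification path passes None and no callback code exists on that path, while block production passes Some(closure). The names commit_with_callback, verify_path, and produce_path are assumptions for illustration.

```rust
// Hypothetical, heavily simplified commit step; not the real Bank API.
// With None, no callback runs at all; with Some(cb), the commit only
// proceeds when the callback (e.g. poh recording) reports success.
fn commit_with_callback(
    results: Vec<u64>,
    pre_commit_callback: Option<impl FnOnce() -> bool>,
) -> Option<Vec<u64>> {
    if let Some(callback) = pre_commit_callback {
        if !callback() {
            return None;
        }
    }
    Some(results)
}

// Block verification: no callback. The generic parameter still needs a
// concrete type, hence the turbofish on None.
fn verify_path(results: Vec<u64>) -> Option<Vec<u64>> {
    commit_with_callback(results, None::<fn() -> bool>)
}

// Block production: the callback decides whether committing may proceed.
fn produce_path(results: Vec<u64>, recorded: bool) -> Option<Vec<u64>> {
    commit_with_callback(results, Some(move || recorded))
}
```

One cost of the Option form is visible in verify_path: a bare None does not compile because the closure type cannot be inferred, so callers must name some function type.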
runtime/src/bank.rs
Outdated
timings: &mut ExecuteTimings,
log_messages_bytes_limit: Option<usize>,
pre_commit_callback: Option<impl FnOnce() -> bool>,
) -> Option<(Vec<TransactionCommitResult>, TransactionBalancesSet)> {
This also seems like it should be a Result instead of an Option, because this should succeed and only fails when something goes wrong.
this is done: 3b852f6
Problem
Currently, the unified scheduler can't record transactions into the poh recorder, because its subsystem called TaskHandler and its underlying code path (solana-ledger => solana-runtime) don't support such a committing operation.
Summary of Changes
Support it in a relatively unintrusive way by introducing a callback mechanism along the code path.
Extracted from #3946; see that PR for the general overview.