Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

svm: allow conflicting transactions in entries #3453

Merged
merged 6 commits into from
Nov 20, 2024

Conversation

2501babe
Copy link
Member

@2501babe 2501babe commented Nov 4, 2024

Problem

simd83 intends to relax entry-level constraints, namely that a transaction which takes a write lock on an account cannot be batched with any other transaction which takes a read or write lock on it

before the locking code can be changed, however, svm must be able to support such batches. presently, if one transaction were to write to an account, and a subsequent transaction tried to read from or write to that account, svm would not pass the updated account state to the second transaction; it gets them from accounts-db, which is not updated until after all transaction execution finishes

Summary of Changes

in #3404, we refactored transaction processing to perform all steps for each transaction in sequence, and added the AccountLoader type to enable the future addition of an intermediate account cache for changed states. this pr adds that cache

all code changes are in the first commit, and the commit is delightfully small

im still minorly displeased at how much code we have to copy from account saver collect_accounts_to_store because those functions also need the pre-execution transactions, the post-execution transactions, and accumulate multiple vectors of data. maybe we could make the logic in update_accounts_for_executed_tx et al runtime utility functions and have account saver do two passes through everything, but i think this kind of refactor is out of scope here

all other commits are changes to tests. the vast majority of new lines of code in this pr are new integration tests, but they are basically unchanged from #3146. the only important change to the integration tests are making the framework emulate how zero lamport accounts are deallocated. we dont reuse the code on AccountLoader to guard against refactors introducing bugs in its own logic

@2501babe 2501babe force-pushed the 20241103_svmconflicts_attempt6 branch 8 times, most recently from e5f77c3 to 79e6c3c Compare November 9, 2024 20:08
@2501babe 2501babe force-pushed the 20241103_svmconflicts_attempt6 branch 4 times, most recently from f9d86a9 to 1bf6a38 Compare November 14, 2024 00:06
Comment on lines +153 to +160
if let Some(program) = (use_program_cache && is_invisible_read)
.then_some(())
.and_then(|_| self.program_cache.find(account_key))
{
// Optimization to skip loading of accounts which are only used as
// programs in top-level instructions and not passed as instruction accounts.
Some(LoadedTransactionAccount {
return Some(LoadedTransactionAccount {
loaded_size: program.account_size,
account: account_shared_data_from_program(account_key, &self.program_accounts)
.ok()?,
rent_collected: 0,
})
});
}
Copy link
Member Author

@2501babe 2501babe Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made this a separate block because once disable_account_loader_special_case activates it is simply deleted. we will also be able to remove program_cache and program_accounts from AccountLoader and they become free-floating variables in transaction_processor.rs again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also an idea i have been toying with after we do that is to simply implement TransactionProcessingCallback for AccountLoader itself so we can pass it to program cache setup, which will happen after account loading. this depends on how feasible it is to restrict callbacks to a concrete type (unless we just make two traits)

Comment on lines 181 to 174
// If lamports is 0, a previous transaction deallocated this account.
// Return None without inspecting, so it can be recreated.
if cache_item.account.lamports() == 0 {
None
} else {
Some(cache_item.account.clone())
}
Copy link
Member Author

@2501babe 2501babe Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while i was getting this pr ready, i realized we dont need to replace zero lamport accounts in the account cache with AccountSharedData::default() at all, so that saves us some Arc::new() calls. we just need something in there that stops us from going to accounts-db, and we can replace it when reading. account_cache is private and load_account() is the only function that reads from it, so we dont need to handle them special anywhere else

Comment on lines 194 to 200
self.account_cache.insert(
*account_key,
AccountCacheItem {
account: account.clone(),
inspected_as_writable: is_writable,
},
);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left the cache insert unguarded by disable_account_loader_special_case which means it will be used by 2.2 the moment this pr lands. this is safe (@jstarry should confirm):

  • there is no change to pre-simd83 consensus if disable_account_loader_special_case is not active, because the account cache comes after the program cache, so behavior is identical to if these accounts came from accounts-db
  • disable_account_loader_special_case is in 2.0, so it is almost guaranteed to activate before the simd83 consensus gate which will relax account locks. but if, for some startling reason, simd83 consensus activates first, it is still fine. there is an edge case where the program cache would shadow the account cache if you close and reopen a loaderv3 buffer. but this only applies if the account is loaded invisible, so it only affects transaction data size, and this is fine because simd83 consensus necessarily can only be activated by its own feature gate

Comment on lines 120 to 126
if let Some(slot_history) =
account_overrides.and_then(|overrides| overrides.get(&slot_history::id()))
{
// We set `inspected_as_writable` to `true` because this never needs to be inspected.
account_cache.insert(
slot_history::id(),
AccountCacheItem {
account: slot_history.clone(),
inspected_as_writable: true,
},
);
}
Copy link
Member Author

@2501babe 2501babe Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after this pr lands, we can contemplate a more restricted interface for passing only this account through. maybe slot_history: Option<AccountSharedData> on the transaction processing environment

bit of an aside, but the sol_get_sysvar syscall we added in 2.0 for core bpf program conversion does not support SlotHistory by design, we only needed StakeHistory and SlotHashes. so the good news is it doesnt introduce a mechanism to bypass this override

the possibly better news is... if neither me nor joe needed to add SlotHistory, do any important bpf programs or builtins actually use it? i have no idea. maybe we dont need to let programs see this sysvar at all if its only used by the validator itself. if so, we could add some #[deprecated] where needed and specialcase it like the instructions sysvar in load_transaction_account to return an account with no data (would require a feature gate, perhaps we could avoid a simd)

Comment on lines +156 to +157
account_data.set_rent_epoch(u64::MAX);
account_data.set_lamports(1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test account is read-only, but this is now necessary because svm would put it in the account cache the first time it loads it in an entry and then assume it had been deallocated the second time it loads it. the svm behavior is correct, because there should be no way to bring an account to zero lamports without it being writable, therefore we should never be able to see one in a read-only context in the first place. our builtins and sysvars have dust lamports

@2501babe 2501babe self-assigned this Nov 14, 2024
@2501babe 2501babe marked this pull request as ready for review November 14, 2024 09:37
@2501babe 2501babe requested review from apfitzge and jstarry November 14, 2024 09:57
slot_history::id(),
AccountCacheItem {
account: slot_history.clone(),
inspected_as_writable: true,
Copy link

@apfitzge apfitzge Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought there was a discussion on previous PR about this inspected_as_writable stuff.
The function that is called to inspect accounts already has guards against doing it twice.

Is this still necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we never got confirmation on that. ill check with brooks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright i have clarity on this now, the thing i was missing is that the idea svm has of "what things to inspect" and the idea bank has of "what results of inspection do i actually care about" are disjoint concepts. we can get rid of the state tracking and inspect every time we get an account from accounts-db or account cache. bank has its own dedupe logic and other implementers of inspect can do whatever they want

@2501babe 2501babe force-pushed the 20241103_svmconflicts_attempt6 branch from 1bf6a38 to 96ef843 Compare November 15, 2024 01:38
@2501babe
Copy link
Member Author

2501babe commented Nov 15, 2024

removed the AccountCacheItem wrapper type. also i marked AccountOverrides::set_account() as private, so it isnt possible to construct invalid instances of it anymore. other than the unit test i removed, nothing in our codebase actually used it other than the public method on AccountOverrides for setting slot history specifically

we can get rid of AccountOverrides in a future pr after we settle on a mechanism to accomplish its purpose. i think the reason its Like That is because its author envisioned other uses that never materialized

// Determine a capacity for the internal account cache. This
// over-allocates but avoids ever reallocating, and spares us from
// deduplicating the account keys lists.
let account_keys_in_batch = sanitized_txs.iter().map(|tx| tx.account_keys().len()).sum();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid allocating every time, we can change to use thread local variable instead, wdyt.
example:

thread_local! {
    static HAS_DUPLICATES_SET: RefCell<AHashSet<Pubkey>> = RefCell::new(AHashSet::with_capacity(MAX_TX_ACCOUNT_LOCKS));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is just getting a reasonable upper bound on account cache size. the mechanism in your pr to prevent duplicate transactions in svm isnt required because svm has no concept of duplicate transactions. my understanding is theyre presently implicitly deduped at the account locks level by relying on the fee-payer lock and we just need to do this explicitly with message hashes instead

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not suggesting here to de-duplicate transactions within a batch. The purpose of this example is to point out a thread_local variable use case to avoid the allocation over-head for account cache in every batch.

Regarding the de-duplication of transactions, when conflicting batches will be allowed the account locking function won't implicitly de-duplicate such transactions. Then as you said we need to explicitly check a batch does not have any duplicate transactions using message hashes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, ty for the suggestion but we dont want to reuse the cache between batches

@Huzaifa696
Copy link

Since unified-scheduler on the replay side has been made default scheduler in 2.1 and unified-scheduler is able to support SIMD#83.
Do we still need to add support for SIMD#83 in the old replay path or it will be deprecated by that time?
@apfitzge @ryoqun

@Huzaifa696
Copy link

Just for clarification, according to the previous conversation the second PR containing the account cache was envisioned to be feature-gated.
But I think, after some major refactoring of program cache there is no need to feature-gate SIMD#83 SVM changes, is that right?
Only the account locking is feature gated?

@2501babe
Copy link
Member Author

Only the account locking is feature gated?

thats correct, the decision to feature gate svm was made because the necessary logic to first go to the account cache and then possibly fall back to the program cache was extremely brittle and had an enormous number of edge cases. but for unrelated reasons we decided to remove (via feature gate) the program cache from account loading entirely and backported it to 2.0, so we were able to simplify the logic here to be much more sound

@Huzaifa696
Copy link

we decided to remove (via feature gate) the program cache from account loading entirely and backported it to 2.0, so we were able to simplify the logic here to be much more sound

Sounds good, so if I understand correctly, the disable_account_loader_special_case will de-couple account cache and program cache within some 2.0.x release and then there is no blocker for the SVM changes of SIMD#83 except account locking.

@2501babe 2501babe merged commit 9116142 into anza-xyz:master Nov 20, 2024
52 checks passed
if let Some(slot_history) =
account_overrides.and_then(|overrides| overrides.get(&slot_history::id()))
{
account_cache.insert(slot_history::id(), slot_history.clone());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a debug assert that the size of account_overrides is equal to 1 if sysvar is present and 0 otherwise? Or is it basically impossible now that we decreased the vis of the account overrides set_account function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isnt possible, the only exposed setter is set_slot_history now (which accepts AccountSharedData and hardcodes the Pubkey as sysvar::slot_history::id(). we can remove AccountOverrides entirely now as a cleanup. im thinking ill look into it when i start on simd186 since id like to change something about simulation results, and this is related to simulation only

Comment on lines +165 to +166
self.callbacks
.inspect_account(account_key, AccountState::Alive(account), is_writable);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check if lamports is 0 before saying the account is alive? Otherwise you could read a dead account into the cache and later write to it and even tho it's still dead, we say it's alive.

Copy link
Member Author

@2501babe 2501babe Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

youre right, ill open a pr for this on monday. thankfully it doesnt affect bank because to get here we would have already had to inspect as writable (which is why this was missed; before removing AccountCacheItem, inspecting a previously-alive now-dead account never happened at all)

edit: actually there is a slight edge case of taking a dead account read-only, leaving it dead, and making it alive, but this is only possible post-simd83 and the fix is the same

edit2: nevermind there actually is no way to get a bad first inspect because to bring the account to zero lamports in the first place it must necessarily be inspected as writable

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm isn't there a bug post simd-83 where you read a dead account in one tx and then write to it in another? The inspect for the write tx would say it's alive when it's actually dead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i made my first edit thinking of that, but made the second edit thinking of the reason why we didnt need to check for equality with AccountSharedData::default(). but in this case the account doesnt need to exist

(i have a pr for this now but have been blocked by ci build issues, will add you once its green)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants