Don't load executable accounts for transactions #180

seanyoung · 2024-03-11T16:00:37Z

Problem

There is no need for loading executable accounts for transactions, if the account is already present in the program cache.

Summary of Changes

The program cache needs to track the size, rent epoch and lamports for the programs.
If an account is not loaded, then we pad the account size with 0s. This is because otherwise transactions use less compute units, and then transactions which fail on mainnet may suddenly pass in our validator. This makes checking for divergence possible.
There are a few programs running on mainnet which read the contents of the executable accounts. There is a an exception list for these accounts. Note that all executable accounts in that instruction needs to be loaded, since we do not know ahead of time which account will be read.
All programs currently in this list are upgradable
This code has been running for months in a mainnet validator without divergence. However, the code has been re-written a number of times - I completely changed the approach a number of times. The current version of the code has been running for about 3 weeks now.
I intend to let the validator with these code changes run continuously until the feature is activated

Future:

Remove the need for zero-padding the non-loaded executable accounts
Remove the need to pass executable accounts down the CPI chain Don't require program_id to passed down cpi call chain solana-labs/solana#34191
Fixes #

codecov-commenter · 2024-03-14T20:19:54Z

Codecov Report

Attention: Patch coverage is 93.81107% with 19 lines in your changes are missing coverage. Please review.

Project coverage is 81.9%. Comparing base (cc3afa5) to head (2016e3a).

Additional details and impacted files

@@           Coverage Diff            @@
##           master     #180    +/-   ##
========================================
  Coverage    81.9%    81.9%            
========================================
  Files         837      837            
  Lines      226823   227064   +241     
========================================
+ Hits       185828   186091   +263     
+ Misses      40995    40973    -22

Lichtso · 2024-03-20T22:01:57Z

Can you rebase it onto master and resolve the conflicts?

seanyoung · 2024-03-21T12:54:48Z

Can you rebase it onto master and resolve the conflicts?

It's been rebased

alessandrod · 2024-03-26T08:02:06Z

I only skimmed through the changes haven't done a proper review, but here's some initial comments since this ties to another perf issue I've been investigating.

There are a few programs running on mainnet which read the contents of the executable accounts. There is a an exception list for these accounts. Note that all executable accounts in that instruction needs to be loaded, since we do not know ahead of time which account will be read.

I don't think we can do this. It is acceptable to break those programs, but I don't think it's acceptable to have an exception list, because it's exploitable: there could be a dormant malicious program, not in the exception list, waiting for this feature to be enabled to trigger a consensus issue by accessing an executable program. EDIT: the PR adds a new feature gate, so this is fine.

Anyway, what brought me here is that I've been investigating account_matches_owners interacting with the accounts-db read cache, and causing a lot of major page faults (hitting disk).

I think the original approach of having separate filter_executable_program_accounts/load_accounts is not ideal. We introduced account_matches_owners() to "peek" into the accounts-db before we load the non-executable accounts, which effectively forces us to do two round trips into the accountsdb. We first make accounts-db iterate over all accounts in a message, then do another round trip where we make it iterate over all the non-executable accounts.

account_matches_owners() interacts with the internal caches in the accountsdb, pages in parts of the accounts index, pages in parts of appendvecs, etc, so this is not great perf wise.

Instead, I think we could have a load_accounts_with_filter(account_keys, |...| { ... }) and in the filter build the list of program accounts, and return a value to signal to the accounts-db to not load nor insert an account into its caches. What do you think?

alessandrod · 2024-03-26T23:08:35Z

With an upcoming fix for the accounts-db to avoid thrashing the readonly cache, account_matches_owners becomes the largest offender in tx execution, causing a lot of pagecache faults (and sleeps):

t-nelson

which SIMDs do these feature gates implement?

t-nelson · 2024-03-27T01:24:02Z

sdk/src/feature_set.rs

@@ -785,6 +785,14 @@ pub mod deprecate_unused_legacy_vote_plumbing {
    solana_sdk::declare_id!("6Uf8S75PVh91MYgPQSHnjRAPQq6an5BDv9vomrCwDqLe");
 }

+pub mod dont_load_executable_accounts {


please don't introduce two feature gates in the same pr. this is indicative of a change set that should be split up (as is the line change total)

Lichtso · 2024-03-27T10:11:44Z

account_matches_owners becomes the largest offender in tx execution, causing a lot of pagecache faults (and sleeps)

That is because it is the first place to touch all the accounts of a TX.
If we were to remove the account_matches_owners() from filter_executable_program_accounts() then load_accounts() would in turn become the place that causes all of these, right?

But I agree that filter_executable_program_accounts(), replenish_program_cache() and load_accounts() should be refactored. IMO it is sufficient to only check the program cache in filter_executable_program_accounts() and swap the order of load_accounts() and replenish_program_cache(). I will try this out and PR it.

alessandrod · 2024-03-27T11:14:44Z

If we were to remove the account_matches_owners() from filter_executable_program_accounts() then load_accounts() would in turn become the place that causes all of these, right?

Nope because account_matches_owners explicitly skips parts of the accounts-db cache (but not all)

But I agree that filter_executable_program_accounts(), replenish_program_cache() and load_accounts() should be refactored. IMO it is sufficient to only check the program cache in filter_executable_program_accounts() and swap the order of load_accounts() and replenish_program_cache(). I will try this out and PR it.

It isn't because you still don't want executable accounts to occupy space in the cache if they're going to be cached elsewhere (the program cache). Since you never want load_accounts(executable) to cache, accounts-db needs to be directly involved, hence the load_accounts_with_filter() suggestion.

Lichtso · 2024-03-27T12:21:44Z

I think we have roughly the same thing in mind. What I suggest is three phases:

Call ProgramCache::extract() once with a list of all message accounts. The result is a list of keys which are not in the global program cache, either because they are not loaded or because they are not programs / owned by a loader.
Pass that list into load_accounts() and load_transaction_accounts() where we then check the owner to eliminate all entries from the list which are not owned by a loader (e.g. by account_matches_owners()). This depends on account_matches_owners() being fixed to not load the entire account data. Entries which were found in the global program cache are not touched here.
At that point we are left with a list that needs to be loaded. So replenish_program_cache() does its normal thing but uses an account read which does not insert entries in the accounts-db.

Still leaves us with the original issue of instruction accounts in CPI that this PR is trying to solve.

in the filter build the list of program accounts

Happens in phase 2.

and return a value to signal to the accounts-db to not load nor insert an account into its caches.

Happens in phase 3.

Another thing I noticed while tinkering on this is that account_shared_data_from_program() (which is used in load_transaction_accounts() and load_accounts()) does currently require program_accounts to resolve the account owner. That is for programs which are in the global program cache as opposed to the proposed phase 2, where we need the owner of accounts which are not in the global program cache. It can be solved by refactoring the LoadedProgram entries to split LoadedProgramType and the account owner (which is cleaner anyway) and use that to get the owner, instead of asking the accounts-db.

seanyoung · 2024-03-27T14:55:11Z

At that point we are left with a list that needs to be loaded. So replenish_program_cache() does its normal thing but uses a uncached account read.

It is not clear to me:

fn load_program_accounts() uses callbacks.get_account_shared_data(pubkey), which is just a regular account read. How can we make this a "do not keep this in the cache" type of read?
For program account which appear as instruction accounts, we do want the account to be cached, else we have to read them again
Calling account_matches_owners() when we are going to call get_account_shared_data() anyway seems a waste of cycles and iops.

Lichtso · 2024-03-27T18:01:13Z

fn load_program_accounts() uses callbacks.get_account_shared_data(pubkey), which is just a regular account read. How can we make this a "do not keep this in the cache" type of read?

Have to check the interfaces if such a method exists already, don't know.

For program account which appear as instruction accounts, we do want the account to be cached, else we have to read them again.

load_program_accounts() happens after load_accounts() so by that time the instruction accounts will already by cached anyway.

Calling account_matches_owners() when we are going to call get_account_shared_data() anyway seems a waste of cycles and iops.

Agreed, we can also check the owner of the loaded accounts.

alessandrod · 2024-03-27T23:47:10Z

I think we have roughly the same thing in mind. What I suggest is three phases:

I'm not sure that they are the same. Can you elaborate what's the advantage of doing this - which requires account_matches_owners which is inherently "look before you leap" - vs something like load_accounts_with_filter(), which would allow account_matches_owners to be removed altogether and do only one trip inside the accounts db, reducing locks, index lookups etc?

if you do if !account_matches_owners(key, owners) { load_account(key) } that's more work than if you do load_accounts(account_keys, filter) and filter() is invoked with the account. First case you have to find the account twice, second case once.

Lichtso · 2024-03-28T00:14:53Z

My proposal does not need to use account_matches_owners(), that part is orthogonal.
I was more concerned about not touching the accounts-db at all for programs we have in the global cache, while you are aiming to touch each account only once, if we have to touch it. And both things can be combined.

What exactly is the purpose of the filter callback you have in mind, and why wouldn't we just hard-code a check against PROGRAM_OWNERS instead?

mergify · 2024-12-30T04:31:10Z

The Firedancer team maintains a line-for-line reimplementation of the
native programs, and until native programs are moved to BPF, those
implementations must exactly match their Agave counterparts.
If this PR represents a change to a native program implementation (not
tests), please include a reviewer from the Firedancer team. And please
keep refactors to a minimum.

seanyoung force-pushed the dont-load-executables branch 6 times, most recently from fd06abf to e834852 Compare March 14, 2024 18:50

seanyoung force-pushed the dont-load-executables branch 2 times, most recently from b162f7b to 2016e3a Compare March 18, 2024 09:11

seanyoung requested a review from Lichtso March 18, 2024 09:19

seanyoung marked this pull request as ready for review March 18, 2024 09:19

seanyoung force-pushed the dont-load-executables branch 2 times, most recently from 7175ff1 to 45a967b Compare March 21, 2024 10:58

don't load executable accounts

5c34e3c

seanyoung force-pushed the dont-load-executables branch from 45a967b to 5c34e3c Compare March 22, 2024 09:05

t-nelson reviewed Mar 27, 2024

View reviewed changes

alessandrod mentioned this pull request Mar 29, 2024

disable read cache while populating stakes cache on load #482

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't load executable accounts for transactions #180

Don't load executable accounts for transactions #180

seanyoung commented Mar 11, 2024 •

edited

Loading

codecov-commenter commented Mar 14, 2024 •

edited

Loading

Lichtso commented Mar 20, 2024

seanyoung commented Mar 21, 2024

alessandrod commented Mar 26, 2024 •

edited

Loading

alessandrod commented Mar 26, 2024

t-nelson left a comment

t-nelson Mar 27, 2024

Lichtso commented Mar 27, 2024

alessandrod commented Mar 27, 2024

Lichtso commented Mar 27, 2024 •

edited

Loading

seanyoung commented Mar 27, 2024 •

edited

Loading

Lichtso commented Mar 27, 2024

alessandrod commented Mar 27, 2024

Lichtso commented Mar 28, 2024

mergify bot commented Dec 30, 2024

Don't load executable accounts for transactions #180

Are you sure you want to change the base?

Don't load executable accounts for transactions #180

Conversation

seanyoung commented Mar 11, 2024 • edited Loading

Problem

Summary of Changes

codecov-commenter commented Mar 14, 2024 • edited Loading

Codecov Report

Lichtso commented Mar 20, 2024

seanyoung commented Mar 21, 2024

alessandrod commented Mar 26, 2024 • edited Loading

alessandrod commented Mar 26, 2024

t-nelson left a comment

Choose a reason for hiding this comment

t-nelson Mar 27, 2024

Choose a reason for hiding this comment

Lichtso commented Mar 27, 2024

alessandrod commented Mar 27, 2024

Lichtso commented Mar 27, 2024 • edited Loading

seanyoung commented Mar 27, 2024 • edited Loading

Lichtso commented Mar 27, 2024

alessandrod commented Mar 27, 2024

Lichtso commented Mar 28, 2024

mergify bot commented Dec 30, 2024

seanyoung commented Mar 11, 2024 •

edited

Loading

codecov-commenter commented Mar 14, 2024 •

edited

Loading

alessandrod commented Mar 26, 2024 •

edited

Loading

Lichtso commented Mar 27, 2024 •

edited

Loading

seanyoung commented Mar 27, 2024 •

edited

Loading