Fix accounts index generation at startup when we don't use disk index #4018

HaoranYi · 2024-12-09T16:14:20Z

Problem

During accounts index generation at startup, one of the important steps is to find duplicated pubkeys and send them to clean.

However, the code path that finds duplicates during index generation is incorrect when the disk index is disabled, and we are not finding all the duplicates. This leaves clean unable to clean old storages, so the account maps keep growing.

Note that this only affects validators running with the disk index disabled.
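To make "finding duplicates" concrete, here is a minimal sketch of the property the startup scan is expected to satisfy: every (pubkey, slot) pair whose pubkey appears in more than one slot is reported. It is not the code from this PR. `find_duplicates` and the `Pubkey` and `Slot` aliases are hypothetical stand-ins, and a plain `HashMap` takes the place of the real index.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only; the real Pubkey and Slot come from the Solana crates.
type Pubkey = [u8; 32];
type Slot = u64;

/// Given every (pubkey, slot) pair seen while scanning storages, return all
/// pairs whose pubkey appears in more than one slot, sorted for determinism.
fn find_duplicates(entries: &[(Pubkey, Slot)]) -> Vec<(Pubkey, Slot)> {
    // Collect the slots in which each pubkey was seen.
    let mut slots_by_key: HashMap<Pubkey, Vec<Slot>> = HashMap::new();
    for (key, slot) in entries {
        slots_by_key.entry(*key).or_default().push(*slot);
    }
    // Keep only pubkeys with more than one slot, and report every one of their slots.
    let mut dups: Vec<(Pubkey, Slot)> = slots_by_key
        .into_iter()
        .filter(|(_, slots)| slots.len() > 1)
        .flat_map(|(key, slots)| slots.into_iter().map(move |slot| (key, slot)))
        .collect();
    dups.sort();
    dups
}
```

If any slot of a duplicated pubkey is missing from this set, the older storage holding it is never handed to clean, which matches the growth described above.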

Summary of Changes

Fix finding duplicated (pubkey, slot) pairs at startup when the disk index is disabled.

Fixes #

HaoranYi · 2024-12-10T19:13:21Z

Graph of the account maps before the fix.

HaoranYi · 2024-12-10T19:14:09Z

Graph of the account maps after the fix.

brooksprumo · 2024-12-10T21:52:59Z

Can you set the ACCOUNTS_INDEX_CONFIG_FOR_TESTING.index_limit_mb = IndexLimitMb::InMemOnly, and run the solana_runtime and solana_accounts_db tests?

HaoranYi · 2024-12-10T22:21:19Z

Can you set the ACCOUNTS_INDEX_CONFIG_FOR_TESTING.index_limit_mb = IndexLimitMb::InMemOnly, and run the solana_runtime and solana_accounts_db tests?

Both tests passed.

test result: ok. 538 passed; 0 failed; 4 ignored; 0 measured; 0 filtered out; finished in 116.30s

     Running tests/accounts.rs (/home/sol/src/agave/target/debug/deps/accounts-06e1f00a7377d075)

running 2 tests
test test_shrink_and_clean ... ok
test test_bad_bank_hash ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.62s

     Running tests/stake.rs (/home/sol/src/agave/target/debug/deps/stake-c3f800f796ce426c)

running 4 tests
test test_stake_create_and_split_single_signature ... ok
test test_stake_create_and_split_to_existing_system_account ... ok
test test_create_stake_account_from_seed ... ok
test test_stake_account_lifetime ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.92s

   Doc-tests solana_runtime

running 1 test
test runtime/src/bank/builtins/core_bpf_migration/mod.rs - bank::builtins::core_bpf_migration::Bank::upgrade_core_bpf_program (line 287) ... ignored

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s

sol@haoran-dev-ny:~/src/agave/accounts-db$ tail -n 20 accounts_db_test.log
test accounts_index::tests::test_new_entry_and_update_code_paths ... ok
test accounts_db::tests::test_shrink_ancient_overflow_with_min_size ... ok
test shared_buffer_reader::tests::test_shared_buffer_sweep ... ok
test ancient_append_vecs::tests::test_write_ancient_accounts ... ok
test ancient_append_vecs::tests::test_clear_should_shrink_after_cutoff_simple ... ok
test ancient_append_vecs::tests::test_calc_accounts_to_combine_many_refs ... ok
test ancient_append_vecs::tests::test_calc_accounts_to_combine_simple ... ok
test accounts_db::tests::test_shrink_collect_simple ... ok
test storable_accounts::tests::test_storable_accounts_by_slot ... ok
test accounts_db::scan_account_storage::tests::test_accountsdb_scan_snapshot_stores_binning::append_vec ... ok
test accounts_db::scan_account_storage::tests::test_accountsdb_scan_snapshot_stores_binning::hot_storage ... ok

test result: ok. 599 passed; 0 failed; 2 ignored; 0 measured; 0 filtered out; finished in 31.72s

brooksprumo

One real question, one double check, and some nits

accounts-db/src/accounts_db.rs

accounts-db/src/accounts_index/in_mem_accounts_index.rs

accounts-db/src/accounts_index.rs

accounts-db/src/accounts_index/in_mem_accounts_index.rs

accounts-db/src/accounts_index.rs

HaoranYi · 2024-12-12T15:55:11Z

Rebased on master to pick up #4069.

brooksprumo

Thanks for fixing this!

jeffwashington · 2024-12-12T20:10:29Z

is this change necessary since this just got merged?
#4044

jeffwashington · 2024-12-12T20:11:00Z

is this change necessary since this just got merged? #4044

it seems like we could simplify both disk and in-mem index startup code if we just simply identify the pubkeys that are dups. We no longer have to identify them by slot, I don't think.

HaoranYi · 2024-12-12T20:16:58Z

is this change necessary since this just got merged? #4044

it seems like we could simplify both disk and in-mem index startup code if we just simply identify the pubkeys that are dups. We no longer have to identify them by slot, I don't think.

We still need to find all duplicated (key, slot) pairs correctly. Duplicates are used for more than clean: account data length and the lattice hash both depend on finding the correct set of duplicated (key, slot) pairs at startup.
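As an illustration of why the account data length depends on the duplicate set, here is a rough sketch. It assumes the length is meant to count only the newest version of each pubkey, so the bytes of every older duplicate have to be subtracted from the naive total. `deduped_data_len` and the `Scanned` tuple are hypothetical names, not the accounts-db API.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only.
type Pubkey = [u8; 32];
type Slot = u64;

/// One scanned account version: (pubkey, slot, data length in bytes).
type Scanned = (Pubkey, Slot, u64);

/// Total data length counting only the newest version of each pubkey.
fn deduped_data_len(scanned: &[Scanned]) -> u64 {
    // Naive total: every stored version is counted once.
    let total: u64 = scanned.iter().map(|(_, _, len)| len).sum();

    // Track the newest (slot, len) per pubkey and sum the bytes of older versions.
    let mut newest: HashMap<Pubkey, (Slot, u64)> = HashMap::new();
    let mut superseded: u64 = 0;
    for (key, slot, len) in scanned {
        let previous = newest.get(key).copied();
        match previous {
            // The version seen earlier is older: it is the superseded one.
            Some((old_slot, old_len)) if old_slot < *slot => {
                superseded += old_len;
                newest.insert(*key, (*slot, *len));
            }
            // The version seen earlier is newer (or equal): this one is superseded.
            Some(_) => superseded += *len,
            None => {
                newest.insert(*key, (*slot, *len));
            }
        }
    }
    total - superseded
}
```

Under that assumption, any duplicate the startup scan misses stays in the total and inflates the reported length.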

HaoranYi · 2024-12-12T20:19:18Z

With #4044 backported, we probably don't need to backport this one.

Account data len may be wrong for no-disk, but that is only used in a metric, so it should be fine to let it be wrong.

But we still need this in master for the lattice hash.

jeffwashington · 2024-12-12T20:35:35Z

accounts-db/src/accounts_index/in_mem_accounts_index.rs

+                // this for slot_list.len() == 1. For slot_list.len() > 1, the
+                // items, previously inserted into the slot_list, have already
+                // been added. We don't need to add them again.
+                if slot_list.len() == 1 {


There is a race here. Two competing insert threads could both find slot_list.len() == 1. Then they serialize access with the write lock in lock_and_update_slot_list, so that one makes the len 2 and the next makes it 3. I don't think this is bad in this case, but it is a race condition. lock_and_update_slot_list returns the slot list len after insertion, taken under the write lock. We could set other_slot to None if lock_and_update_slot_list returns != 2, which means someone else won the race.

yeah. fixed in ef3e873.
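The suggestion above can be sketched in isolation. This is not the code from ef3e873. `lock_and_push` is a hypothetical stand-in for lock_and_update_slot_list, and a `Mutex<Vec<Slot>>` stands in for the real slot list and its write lock. The point is only that the decision is taken from the length returned under the lock rather than from a length read before locking.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in type for illustration only.
type Slot = u64;

/// Hypothetical stand-in for `lock_and_update_slot_list`: pushes `slot` while
/// holding the lock and returns the slot list length after insertion.
fn lock_and_push(slot_list: &Mutex<Vec<Slot>>, slot: Slot) -> usize {
    let mut list = slot_list.lock().unwrap();
    list.push(slot);
    list.len()
}

/// Insert `new_slots` concurrently into a slot list that already holds one
/// slot, and count how many inserters conclude that they must also report the
/// pre-existing slot as a duplicate.
fn count_reporters_of_first_slot(new_slots: Vec<Slot>) -> usize {
    let slot_list: Arc<Mutex<Vec<Slot>>> = Arc::new(Mutex::new(vec![0]));
    let handles: Vec<_> = new_slots
        .into_iter()
        .map(|slot| {
            let slot_list = Arc::clone(&slot_list);
            thread::spawn(move || {
                // Decide from the post-insert length. Only the thread that took
                // the list from len 1 to len 2 reports the pre-existing slot.
                lock_and_push(&slot_list, slot) == 2
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|reported| *reported)
        .count()
}
```

With any number of concurrent inserters, exactly one observes a returned length of 2, so the pre-existing slot is reported once no matter which thread wins.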

HaoranYi · 2024-12-12T21:59:52Z

it seems like we could simplify both disk and in-mem index startup code if we just simply identify the pubkeys that are dups. We no longer have to identify them by slot, I don't think.

We still need to populate (pubkey, slot) at startup.

This is because at startup, we use uncleaned_pubkeys to drive clean, and uncleaned_pubkeys requires a slot.

agave/accounts-db/src/accounts_db.rs

Line 1557 in d11072e

uncleaned_pubkeys: DashMap<Slot, Vec<Pubkey>>,

And the (pubkey, slot) is used here.

agave/accounts-db/src/accounts_db.rs

Line 8786 in d11072e

self.uncleaned_pubkeys.entry(slot).or_default().push(key);
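For readers without the source open, the shape of that bookkeeping can be sketched with the standard library alone. A plain `HashMap` stands in for `DashMap` here, and `group_by_slot` is a hypothetical helper, not a function in accounts_db.rs. It shows why each duplicate has to carry its slot: the map is keyed by slot.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only.
type Pubkey = [u8; 32];
type Slot = u64;

/// Group duplicated (pubkey, slot) pairs by slot, mirroring the shape of
/// `uncleaned_pubkeys` with a plain HashMap in place of DashMap.
fn group_by_slot(dups: &[(Pubkey, Slot)]) -> HashMap<Slot, Vec<Pubkey>> {
    let mut uncleaned: HashMap<Slot, Vec<Pubkey>> = HashMap::new();
    for (key, slot) in dups {
        // Same entry/or_default/push pattern as the line quoted above.
        uncleaned.entry(*slot).or_default().push(*key);
    }
    uncleaned
}
```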

Yes, with #4044 landed, we can:

- get rid of uncleaned_roots from clean.
- stop populating uncleaned_roots at startup.

But this work is out of scope for this PR.

I have two follow-up PRs for them: #4095 and #4092.

HaoranYi marked this pull request as draft December 9, 2024 16:17

HaoranYi force-pushed the fix_index_gen_no_disk branch from 067ce54 to 0a57a47 Compare December 10, 2024 18:58

HaoranYi changed the title from "fix index gen no disk" to "Fix accounts index generation at startup when we don't use disk index" Dec 10, 2024

HaoranYi marked this pull request as ready for review December 10, 2024 19:18

HaoranYi requested review from brooksprumo and jeffwashington December 10, 2024 21:17

brooksprumo reviewed Dec 10, 2024

HaoranYi requested a review from brooksprumo December 11, 2024 15:39

brooksprumo reviewed Dec 11, 2024

brooksprumo self-requested a review December 11, 2024 18:29

brooksprumo reviewed Dec 11, 2024

brooksprumo self-requested a review December 11, 2024 20:44

HaoranYi force-pushed the fix_index_gen_no_disk branch 2 times, most recently from 8ca8733 to 819f026 Compare December 12, 2024 15:53

brooksprumo previously approved these changes Dec 12, 2024

jeffwashington reviewed Dec 12, 2024

HaoranYi dismissed brooksprumo’s stale review via ef3e873 December 12, 2024 21:37

HaoranYi added 4 commits December 12, 2024 21:53

fix in-memory startup indexgen

8b61a8e

merge fix

95dceea

pr: rename

88d0ba3

pr: reanme

b70f6ab

HaoranYi added 7 commits December 12, 2024 21:53

pr: rename

f15e6c2

pr: rename

237bca4

pr

00163ed

pr

7934ef2

typo

15d6e13

pr: add comments

ce25543

pr: fix a race

482486f

HaoranYi marked this pull request as draft December 12, 2024 22:00

HaoranYi force-pushed the fix_index_gen_no_disk branch from ef3e873 to 482486f Compare December 13, 2024 00:01

mergify bot mentioned this pull request Dec 13, 2024

Don't populate uncleaned_roots at index generation in startup. #4095

Open

HaoranYi marked this pull request as ready for review December 13, 2024 00:50

HaoranYi requested review from brooksprumo and jeffwashington December 13, 2024 01:01

brooksprumo approved these changes Dec 13, 2024
