-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix accounts index generation at startup when we don't use disk index #4018
base: master
Are you sure you want to change the base?
Conversation
067ce54
to
0a57a47
Compare
Can you set the |
Both tests passed.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One real question, one double check, and some nits
8ca8733
to
819f026
Compare
Rebase to master to pickup #4069 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
is this change necessary since this just got merged? |
it seems like we could simplify both disk and in-mem index startup code if we just simply identify the pubkeys that are dups. We no longer have to identify them by slot, I don't think. |
We still need to find all duplicated (key, slot) correctly. Duplicates are used for other things than clean too. Account data length and lattice hash both depends on finding the correct set of duplicated (key, slot) at startup. |
With #4044 backported, we probably don't need to backport this one. Account data len may be wrong for no-disk. But that's just used in a metrics. So should be fine to let it be wrong. But we still need this in master for lattic hash in master. |
// this for slot_list.len() == 1. For slot_list.len() > 1, the | ||
// items, previously inserted into the slot_list, have already | ||
// been added. We don't need to add them again. | ||
if slot_list.len() == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a race here. 2 competing insert threads here could both find slot_list.len() = 1. Then, they serilialize access with the write lock on lock_and_update_slot_list
so that one makes len 2, the next 3. I don't think this is bad in this case, but it is a race condition. lock_and_update_slot_list
returns slot list len after insertion using write lock. We could set other_slot
to None
if lock_and_update_slot_list
returns != 2
, which means someone else won the race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. fixed in ef3e873.
We still need to populate (pubkey, slot) at startup. This is because at startup, we use agave/accounts-db/src/accounts_db.rs Line 1557 in d11072e
And the (pubkey, slot) is used here. agave/accounts-db/src/accounts_db.rs Line 8786 in d11072e
Yes, with #4044 landed. We can
But these works are out of scope of this PR. |
ef3e873
to
482486f
Compare
Problem
During account's index generation at startup, one of the important steps is to find duplicated
pubkeys
and send them toclean
.However, the code path that finds duplicates at index generation, when disk index is disabled, is incorrect. And we are are not finding all the duplicates. This makes clean unable to clean old storages, and results in continue growing of account maps.
Note this only affects when the validator running with disk index disabled.
Summary of Changes
Fix finding duplicated (pubkey, slots) at start up when we disable disk index.
Fixes #