add uncleaned roots when we shrink away non-zero lamport accounts #2824
base: master
Conversation
// If we are marking something dead, and the only remaining alive account is zero lamport, then make that zero lamport slot ready to be cleaned.
// If that slot happens to only contain zero lamport accounts, the whole slot will go away
if slot_list.len() == 1 // should we also check for ref counts here?
Is it possible that the slot_list contains more than one zero-lamport account?
Can we relax the requirement for slot_list.len() == 1?
I think this is correct. It appears to mark about 2k slots per minute as needing to be cleaned that otherwise wouldn't be visited by clean. That could become more and more expensive, but it is correct. We are leaking old storages, but ancient packing does clean those up correctly.
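The check under discussion could be sketched roughly like this. This is a minimal, self-contained illustration, not the real accounts-db code: `AccountInfo`, `maybe_queue_for_clean`, and `uncleaned_roots` are hypothetical stand-ins for the actual types.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy)]
struct AccountInfo {
    lamports: u64,
}

// After shrink removes dead entries, if the only remaining alive entry for a
// pubkey is a zero-lamport account, queue its slot so clean will revisit it.
fn maybe_queue_for_clean(
    slot_list: &[(u64, AccountInfo)],
    uncleaned_roots: &mut HashSet<u64>,
) {
    // Open question from the review: should this also check ref counts?
    if slot_list.len() == 1 {
        let (slot, info) = slot_list[0];
        if info.lamports == 0 {
            // If that slot only contains zero-lamport accounts, clean can
            // now mark the whole slot dead.
            uncleaned_roots.insert(slot);
        }
    }
}

fn main() {
    let mut uncleaned = HashSet::new();
    maybe_queue_for_clean(&[(42, AccountInfo { lamports: 0 })], &mut uncleaned);
    maybe_queue_for_clean(&[(43, AccountInfo { lamports: 1 })], &mut uncleaned);
    println!("{}", uncleaned.contains(&42)); // true
    println!("{}", uncleaned.contains(&43)); // false
}
```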
What about not checking the slot list len, but instead seeing if the latest slot list entry's account info is zero lamports? (Maybe we'd also need to bound the latest entry to be a storage and not a cached entry?)
To be clear, when we eliminate rewrites and start ancient packing (and shrinking), we will eliminate these orphan storages once they become ancient. They will be small, so they will be quickly chosen for packing. When we pack them, we will identify them as zero-lamport, single-ref accounts, so they will be deleted then.
This PR appears to work correctly. A validator DOING rewrites and running for more than an epoch has accumulated no ancient slots and has not triggered any ancient packs. We might want to tweak the slot-list or ref-count checks above, but this idea is successful.
This PR may couple well with a change @brooksprumo has on the radar: keeping track of how many zero-lamport single-ref accounts exist in a storage. When we know alive bytes plus zero-lamport single-ref accounts, we can know whether calling clean would be able to mark the whole slot as dead, and whether shrink would pass the threshold to get rid of the dead and zero-lamport single-ref slots. The idea is to add a HashSet of the zero-lamport single refs we've identified in this storage. Given the len of that hashset * 136, we can know how many bytes and how many accounts would be dead. This info would be used to determine whether to shrink, and whether the entire slot would be dead if we were to clean or shrink it.
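A minimal sketch of that bookkeeping idea, assuming a per-storage set of zero-lamport single-ref pubkeys and the 136-bytes-per-stored-account figure from the comment. All type and method names here (`StorageStats`, `whole_slot_dead`, `should_shrink`) and the 80% shrink threshold are assumptions for illustration, not the real implementation.

```rust
use std::collections::HashSet;

// Assumed fixed on-disk size of a stored zero-lamport account (from the
// comment above); not taken from the real code.
const ZERO_LAMPORT_ACCOUNT_BYTES: u64 = 136;

struct StorageStats {
    alive_bytes: u64,
    capacity: u64,
    // Pubkeys of zero-lamport single-ref accounts identified in this storage.
    zero_lamport_single_refs: HashSet<[u8; 32]>,
}

impl StorageStats {
    // Bytes that would become dead if clean removed the zero-lamport
    // single-ref accounts.
    fn dead_bytes_from_zero_refs(&self) -> u64 {
        self.zero_lamport_single_refs.len() as u64 * ZERO_LAMPORT_ACCOUNT_BYTES
    }

    // Would the entire slot be dead after clean/shrink?
    fn whole_slot_dead(&self) -> bool {
        self.alive_bytes <= self.dead_bytes_from_zero_refs()
    }

    // Would shrink pass a (hypothetical) 80% alive threshold?
    fn should_shrink(&self) -> bool {
        let alive_after = self.alive_bytes.saturating_sub(self.dead_bytes_from_zero_refs());
        (alive_after as f64) / (self.capacity as f64) < 0.8
    }
}

fn main() {
    let mut stats = StorageStats {
        alive_bytes: 136,
        capacity: 1024,
        zero_lamport_single_refs: HashSet::new(),
    };
    stats.zero_lamport_single_refs.insert([0u8; 32]);
    println!("{}", stats.whole_slot_dead()); // true: only zero-lamport bytes are alive
    println!("{}", stats.should_shrink());  // true: nothing alive would remain
}
```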
And with this info on zero-lamport single refs, we would greatly reduce the biggest downside of this PR, which is the ~2k storages per call that we're now having to clean. That is largely wasted work just to pick up a few stragglers.
I think we should check the ref count. However, we should check after we have unref'd this one. There can be 3 refs, with the 1 alive entry being zero lamport. Each time we unref a pubkey in shrink (because its entry in the slot being shrunk is dead), we should check whether the ref count has become 1. If so, we know this account can be marked dead and shrunk, or the entire slot cleaned.
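That suggestion (unref first, then check whether the ref count dropped to 1) could look roughly like this. `IndexEntry` and `unref_and_check` are illustrative names only, not the real accounts-index types.

```rust
struct IndexEntry {
    ref_count: u64,
    // The single remaining alive entry for this pubkey has 0 lamports.
    alive_zero_lamport: bool,
}

// Unref one dead occurrence of this pubkey during shrink; return true if
// the account can now be marked dead (only the zero-lamport alive ref is
// left), so the slot holding it can be cleaned or shrunk away.
fn unref_and_check(entry: &mut IndexEntry) -> bool {
    entry.ref_count = entry.ref_count.saturating_sub(1);
    entry.ref_count == 1 && entry.alive_zero_lamport
}

fn main() {
    // 3 refs total, the only alive one is zero lamport (the case above).
    let mut e = IndexEntry { ref_count: 3, alive_zero_lamport: true };
    println!("{}", unref_and_check(&mut e)); // false: 2 refs still remain
    println!("{}", unref_and_check(&mut e)); // true: only the alive zero-lamport ref remains
}
```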
Also, see:
4bb2d91
Problem
Some storages are living forever with only zero-lamport accounts in them.
Summary of Changes
When we shrink away dead accounts whose only alive account is zero lamport, we mark that alive slot as needing to be cleaned.
Fixes #