Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add uncleaned roots when we shrink away non-zero lamport accounts #2824

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jeffwashington
Copy link

Problem

Some storages are living forever with only zero-lamport accounts in them.

Summary of Changes

When we shrink away dead accounts whose only alive account is zero lamport, we mark the alive slot as needing to be clean.

Fixes #


// If we are marking something dead, and the only remaining alive account is zero lamport, then make that zero lamport slot ready to be cleaned.
// If that slot happens to only contain zero lamport accounts, the whole slot will go away
if slot_list.len() == 1 // should we also check for ref counts here?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the slot_list contains more than one zero lamport accounts?
Can we relax the requirement for slot_list.len() ==1?

@jeffwashington
Copy link
Author

I think this is correct. It appears to mark about 2k slots per minute as needing to be cleaned that otherwise didn't get visited by clean. That could be more and more expensive. But, it is correct. We are leaking old storages. But, ancient packing does clean those up correctly.


// If we are marking something dead, and the only remaining alive account is zero lamport, then make that zero lamport slot ready to be cleaned.
// If that slot happens to only contain zero lamport accounts, the whole slot will go away
if slot_list.len() == 1 // should we also check for ref counts here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about not checking the slot list len, but instead seeing if the latest slot list entry's account info is zero lamports? (Maybe we'd also need to bound the latest entry to be a storage and not a cached entry?)

@jeffwashington
Copy link
Author

to be clear, when we eliminate rewrites and we start ancient packing (and shrinking), we will eliminate these orphan storages once they become ancient. They will be small, so they will be quickly chosen for packing. When we pack them, we will identify them as being zero lamport, single ref accounts, so they will be deleted then.

@jeffwashington
Copy link
Author

This pr appears to work correctly. A validator DOING rewrites and running for more than an epoch has accumulated no ancient slots and has not triggered any ancient packs. We might want to tweak the slot list or ref count checks above, but this idea is successful.

@jeffwashington
Copy link
Author

this pr may couple well with a change @brooksprumo has on the radar of keeping track of how many zero lamport single ref accounts exist in a storage. When we know alive bytes + zero lamport single ref accounts, we can know whether calling clean will be able to mark the whole slot as dead or not and whether shrink would pass the threshold to get rid of the dead and zero lamport single ref slots. The idea is add a HashSet of zero lamport single refs we've identified in this storage. Given the len of that hashset * 136, we can know how many bytes would be dead and how many accounts would be dead. This info would be used to determine whether to shrink and whether the entire slot would be dead if we were to clean or shrink it.

@jeffwashington
Copy link
Author

and with this info on zero lamport single refs, we would greatly reduce the biggest downside to this pr, which is the 2k storages per call that we're now having to clean. This is largely wasted work to just pick up a few stragglers.


// If we are marking something dead, and the only remaining alive account is zero lamport, then make that zero lamport slot ready to be cleaned.
// If that slot happens to only contain zero lamport accounts, the whole slot will go away
if slot_list.len() == 1 // should we also check for ref counts here?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check for ref count. However, we should check after we have unref'd this one. There can be 3 refs, 1 alive that is zero lamport. Each time we unref a pubkey in shrink where the alive one is dead, we should check if refcount has become 1. If so, we know this account can be marked dead and shrunk or entire slot cleaned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, see:
4bb2d91

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants