
validator: Split PoH speed measurement from check #4185

Open · wants to merge 4 commits into base: master

Conversation


@steviez steviez commented Dec 19, 2024

Problem

Checking the PoH speed requires a Bank to derive the cluster hash rate as of #2447. By the time a Bank is available, many services have started up and are competing for CPU time.

Summary of Changes

So, split the PoH speed check: measure the speed very early in Validator::new(), and check the measured value later once a Bank is available.

Without the change:

[...] PoH speed check: computed hashes per second 12663376, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15573125, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15448481, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 11727495, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15355731, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15276260, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 12747872, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15319899, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15462806, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15666179, target hashes per second 10000000

With the change:

[...] PoH speed check: computed hashes per second 15842486, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15763624, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15841138, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15354805, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15850004, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15839109, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15842484, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15676336, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15790533, target hashes per second 10000000
[...] PoH speed check: computed hashes per second 15844216, target hashes per second 10000000

As you can see, there is a lot more variation without this change.
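The measure-early/check-late split described above can be sketched roughly as below. This is a stdlib-only illustration, not the validator's actual code: the real PoH loop hashes SHA-256 back to back, while this sketch substitutes `DefaultHasher` to stay dependency-free, and the target rate is a placeholder rather than a value derived from a Bank.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Instant;

// Stand-in for the real PoH hash loop; measures hashes per second.
fn measure_poh_speed(num_hashes: u64) -> u64 {
    let start = Instant::now();
    let mut hasher = DefaultHasher::new();
    for i in 0..num_hashes {
        i.hash(&mut hasher);
        let _ = hasher.finish();
    }
    let elapsed = start.elapsed().as_secs_f64();
    (num_hashes as f64 / elapsed) as u64
}

// Runs later, once a Bank is available to supply the cluster target rate.
fn check_poh_speed(measured_hashes_per_second: u64, target_hashes_per_second: u64) -> bool {
    measured_hashes_per_second >= target_hashes_per_second
}

fn main() {
    // Measure very early in startup, before other services spawn threads.
    let measured = measure_poh_speed(1_000_000);
    // ... services start up, Bank becomes available ...
    let target = 10_000; // placeholder target for this sketch
    println!("measured {measured} hashes/second, passes: {}", check_poh_speed(measured, target));
}
```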

@steviez steviez force-pushed the val_split_poh_speed_check branch from e851f08 to be4ac16 Compare December 19, 2024 20:10
@steviez steviez force-pushed the val_split_poh_speed_check branch from be4ac16 to 3d1b6b2 Compare February 3, 2025 22:24

steviez commented Feb 3, 2025

The change shows clear improvement for the production case. However, it is problematic for test situations such as LocalCluster. In a LocalCluster test where we spin up several validators, doing 10M hashes for each of them is non-trivial. Not to mention that the test might not be built in --release mode, in which case it will crawl through 10M hashes per validator.

A few alternatives I see:

  • Try to skip the speed check in all test scenarios
    • This seems error prone / we are unlikely to catch them all
  • On top of this PR, unpack genesis and choose how many calculations we'll run depending on genesis.cluster_type
    • Do 10M for MNB/testnet/devnet; use the value from genesis otherwise
    • This will keep the split to measure early on in startup, and compare against Bank later on
  • Open to ideas ..?

I'm leaning towards the option to use 10M for our clusters and fall back to genesis for Development. I don't love the cluster-specific behavior, but I don't see a great way around it right now. I'll think on this a little more, but I am certainly open to ideas too.
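The cluster-specific fallback being leaned towards could look something like this. Type and function names here are assumed for illustration; `ClusterType` is modeled on the variants in `solana_sdk::genesis_config`, and the genesis-derived hash count is passed in as a plain parameter rather than unpacked from genesis.

```rust
// Hypothetical mirror of solana_sdk::genesis_config::ClusterType.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum ClusterType {
    MainnetBeta,
    Testnet,
    Devnet,
    Development,
}

const POH_SPEED_CHECK_NUM_HASHES: u64 = 10_000_000;

// Fixed 10M hashes for clusters whose hash rate we know; for Development
// (tests and unknown clusters) fall back to a genesis-derived count.
fn num_hashes_to_measure(cluster_type: ClusterType, genesis_num_hashes: u64) -> u64 {
    match cluster_type {
        ClusterType::Development => genesis_num_hashes,
        _ => POH_SPEED_CHECK_NUM_HASHES,
    }
}

fn main() {
    assert_eq!(num_hashes_to_measure(ClusterType::Development, 500_000), 500_000);
    assert_eq!(num_hashes_to_measure(ClusterType::MainnetBeta, 500_000), 10_000_000);
    println!("ok");
}
```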

@steviez steviez requested review from t-nelson and bw-solana February 3, 2025 22:34
core/src/validator.rs (outdated; resolved)
// If the hash rate on these clusters changes, we might consider updating this
// constant. However, the PoH speed check compares hashes / second so this
// constant does not have to be updated
const POH_SPEED_CHECK_NUM_HASHES: u64 = 10_000_000;
Reviewer:

keeping this in sync seems a little fraught. if we don't really care about the cluster value(s) and we're out of the critical path, would it make sense to just choose a much larger number so that we get a more accurate measure?

might make sense to eventually stuff this into a separate snapshot field so we can peek it out instead of waiting for a bank

steviez (Author):

A much larger number would hypothetically get us a more accurate number at the cost of time spent performing the calculation. Ie, a node that just meets spec would spend 1s hashing; 10x'ing this number 10x's that time spent (10s, or 25 slots).

I think the main point of my comment was that since we normalize to hashes per second, keeping the number up to date isn't critical. Maybe the consideration would be that if we hit 100M hashes/second, then 10M hashes (0.1s) is too short of a period and we would see significant jitter.
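The trade-off above is just arithmetic on a fixed hash count: the measurement window shrinks as hardware speeds up. A tiny sketch (hypothetical helper name) makes the two numbers in the comment concrete:

```rust
// A fixed hash count implies a measurement window that shrinks as hardware
// gets faster: 10M hashes is 1.0s on a spec-minimum node (10M h/s), but
// only 0.1s at a hypothetical 100M h/s, where jitter starts to matter.
fn measurement_seconds(num_hashes: u64, hashes_per_second: u64) -> f64 {
    num_hashes as f64 / hashes_per_second as f64
}

fn main() {
    assert!((measurement_seconds(10_000_000, 10_000_000) - 1.0).abs() < 1e-9);
    assert!((measurement_seconds(10_000_000, 100_000_000) - 0.1).abs() < 1e-9);
    println!("ok");
}
```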

steviez (Author):

Any thoughts on the issue I called out in #4185 (comment)

Reviewer:

should we reference UPDATED_HASHES_PER_SECOND_6? https://github.com/anza-xyz/agave/blob/master/sdk/clock/src/lib.rs#L61

steviez (Author):

Ohh yeah, and that is a public constant; I'll use that directly

Reviewer:

UPDATED_HASHES_PER_SECOND_6 is a cluster configuration parameter and should by no means be public in the first place despite our rampant abuse of similar patterns. please do not add more cases that we need to fix

Any thoughts on the issue I called out in #4185 (comment)

yeah. i think local cluster is a scourge and needs to die as soon as possible. milady

A much larger number would hypothetically get us a more accurate number at the cost of time spent performing the calculation. Ie, a node that just meets spec would spend 1s hashing; 10x'ing this number 10x's that time spent (10s, or 25 slots).

I think the main point of my comment was that since we normalize to hashes per second, keeping the number up to date isn't critical. Maybe the consideration would be that if we hit 100M hashes/second, then 10M hashes (0.1s) is too short of a period and we would see significant jitter.

i guess what i was getting at is that the comment is kinda ambivalent and like a "oh yeah hey maybe keep an eye on this" whereas ideally it'd either be direct and enforced with a static assertion or we choose a value that's sufficiently large to get us a good hashes/second measurement, regardless the cluster config

steviez (Author):

ideally it'd either be direct and enforced with a static assertion

Agreed, although in our case, the "correct" value is pulled from a Bank at runtime. Nothing we can check at compile time so not much we can do here

or we choose a value that's sufficiently large to get us a good hashes/second measurement, regardless the cluster config

Anecdotally, I think the max value I've seen in Discord is something like 20M-25M hashes per second. Assuming 25M hashes/second became the new cluster hash rate, 10M hashes is still 1 slot's worth, which is what the test previously ran with. So, I think 10M hashes is a reasonable balance of not spending too much time and being somewhat future-proof (on the order of years given the recent trajectory).

@bw-solana

I'm fine just skipping the PoH check for Development cluster type. Half-assing it for these cluster types seems like we will either have to set the bar low enough that it doesn't catch anything or it will be flaky. Better to go full-ass or no-ass

I have mixed thoughts on the overall decoupling...

  • On the one hand, you get much more consistent results measuring PoH before the other services start up.
  • On the other hand, you'll have to actually complete the hashes while other services are running, so your performance in that environment is more important.
  • Isn't PoH supposed to pin a core? So why is there so much variability?


steviez commented Feb 4, 2025

  • On the other hand, you'll have to actually complete the hashes while other services are running, so your performance in that environment is more important.
  • Isn't PoH supposed to pin a core? So why is there so much variability?

PohService does run when everything is turned on AND does pin to a core; the speed check does not. Moving the speed check to happen inside PohService would butcher the API a little, since the pinning occurs in a separate thread. We'd need a channel or something to get the hash rate out.

We could potentially adjust the speed check to create a new thread, set affinity, run speed check, and immediately join on that thread. This would happen after the Bank has been unpacked, so we could use the "correct" value derived from a Bank again.

I don't believe that pinning this short-lived PoH speed check thread to the same core as PohService would have any persisting effects on PohService. That is, pinning a thread that exits early shouldn't inhibit the other thread that we pin in the same way
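The alternative floated above, spawning a dedicated thread, running the check, and joining on it immediately, could be sketched as follows. This is stdlib-only: the real validator would also set core affinity on that thread (e.g. via the core_affinity crate) before hashing, which is elided here to a comment, and `DefaultHasher` stands in for the SHA-256 loop.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;
use std::time::Instant;

// Run the speed check on a short-lived dedicated thread and join on it
// immediately, so the measurement can be pinned the same way PohService is.
fn speed_check_on_dedicated_thread(num_hashes: u64) -> u64 {
    thread::spawn(move || {
        // (assumed) pin this thread to the PohService core here, e.g.
        // core_affinity::set_for_current(...), before measuring
        let start = Instant::now();
        let mut hasher = DefaultHasher::new();
        for i in 0..num_hashes {
            i.hash(&mut hasher);
            let _ = hasher.finish();
        }
        (num_hashes as f64 / start.elapsed().as_secs_f64()) as u64
    })
    .join()
    .expect("speed check thread panicked")
}

fn main() {
    let rate = speed_check_on_dedicated_thread(100_000);
    assert!(rate > 0);
    println!("measured {rate} hashes/second");
}
```

Since the thread exits before PohService starts, the pinning would have no lasting effect on the core, matching the reasoning above.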

@bw-solana

  • On the other hand, you'll have to actually complete the hashes while other services are running, so your performance in that environment is more important.
  • Isn't PoH supposed to pin a core? So why is there so much variability?

PohService does run when everything is turned on AND does pin to a core; the speed check does not. Moving the speed check to happen inside PohService would butcher the API a little, since the pinning occurs in a separate thread. We'd need a channel or something to get the hash rate out.

We could potentially adjust the speed check to create a new thread, set affinity, run speed check, and immediately join on that thread. This would happen after the Bank has been unpacked, so we could use the "correct" value derived from a Bank again.

I don't believe that pinning this short-lived PoH speed check thread to the same core as PohService would have any persisting effects on PohService. That is, pinning a thread that exits early shouldn't inhibit the other thread that we pin in the same way

I see, makes sense. In this way, the change to run before other services start probably provides a hash rate more reflective of reality (when PoH core is pinned), right?

A comment by the PoH speed check summarizing this reasoning would probably be helpful.


steviez commented Feb 4, 2025

I see, makes sense. In this way, the change to run before other services start probably provides a hash rate more reflective of reality (when PoH core is pinned), right?

Correct; there should be little to no "competition" for CPU time since there shouldn't be a glob of threads created yet.

A comment by the PoH speed check summarizing this reasoning would probably be helpful.

I had a comment with the intent to say this, but I apparently didn't finish writing it 😅 :

// Run the speed check as early as possible to

Assuming I finish the comment and given what you know now, are you still in favor of the split approach with the noop for cluster_type == Development ?

@bw-solana

Assuming I finish the comment and given what you know now, are you still in favor of the split approach with the noop for cluster_type == Development ?

Yes 👍

@steviez steviez left a comment

There is a comment about why the measure/check is split in the comments for check_poh_speed(), but I decided to also include a short note where we actually call measure() in Validator::new(). The extra visibility will hopefully prevent others from shuffling it around.

core/src/validator.rs (outdated; resolved)
@bw-solana bw-solana left a comment

Called out a few minor things


//
// Skip the check for Development clusters as this will be the type for
// tests or for clusters that we don't know what the rate will be
// without a rebuilt Bank


isn't the real reason that we don't want to pay the time cost for local cluster tests and such?
Because we end up pulling the rate from the root bank later

steviez (Author):

Kind of; a couple of things:

  • Unit tests are run in debug mode ... doing 10M hashes in debug mode on local machine, potentially for multiple tests in parallel is not a good time
  • You're correct that we pull the value from the Bank later, and because of that, we don't necessarily want to spend the time to hash 10M hashes (in hopefully a second) if the genesis config dictates we only need to be able to hit 100k.

I can look at tweaking the comment a bit more

// tests or for clusters that we don't know what the rate will be
// without a rebuilt Bank
let my_poh_hashes_per_second =
match (genesis_config.cluster_type, !config.no_poh_speed_test) {

Reviewer:

matching !no with false arm breaks my brain

steviez (Author):

Ha, I initially did the following but cleared it

let do_speed_test = !config.no_speed_test;
let my_poh_hashes_per_second = match ...

Naming a separate variable makes the negation more visible, so I can switch it back if that's agreeable.

Reviewer:

ha, I'm cool with whatever
