Add benchmark for the number of minimum cpu cores #5127
Conversation
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
The CI pipeline was cancelled due to the failure of one of the required jobs.
A PRDoc is needed for operators. Otherwise LGTM!
I ran this 5 times on ref hardware and it is very consistent at 1022 MiBs for the new BLAKE2 parallel metric. Kind of expected on a homogeneous system 😆
You mean our current ref hardware is not the same as the one we generated the values on?
Yeah, I concur with you that we probably don't want to change the values for existing benchmarks, although I'm not sure where I should generate the new one I want to add.
Signed-off-by: Alexandru Gheorghe <[email protected]>
Yea. We migrated the server in the meantime, but the definition of the ref HW in the wiki is still the same, so I think we should keep the values.
Given that the multi-core score was pretty much identical to the single-thread score on the ref hardware, I think it's fine to use the same value in the JSON config file. Any concerns about that?
Yeah, that should be fine as well, although I will try to see if I can get my hands on this type of machine.
You can request access by opening an issue here: https://github.com/paritytech/devops/issues. This is the machine: https://github.com/paritytech/devops/pull/3210
Signed-off-by: Alexandru Gheorghe <[email protected]>
Did some measurements and spelunking on the following hardware types, and what I can tell is that the speed-up in the benchmark actually comes from the SIMD optimisations we introduced in the benchmarked functions:
Benchmark runs compared, per hardware type: Master vs. Master without SIMD optimisations.
Consequences

Every weight update we did after the Blake2 optimisation (March 2023) got merged benefits from the 20-30% CPU speed-up, but every validator using the benchmarks to determine whether their validator is within parameters gets a false positive.

What next?

From my perspective we have a few options here:
@ggwpez @koute @PierreBesson, thoughts?
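The "weights underestimated by about ~20%" figure follows from the ~30% speed-up with a quick calculation. A minimal sketch (the 30% speed-up is the number quoted in this thread; everything else here is illustrative):

```rust
fn main() {
    // Assumption from the discussion: the reference hardware got ~30% faster
    // (SIMD Blake2) while the published minimum score stayed unchanged.
    let speedup = 1.30;

    // A validator that barely passes the stale minimum therefore runs at
    // this fraction of the current reference hardware's throughput:
    let relative = 1.0 / speedup;

    // ... so weights generated on the reference hardware underestimate the
    // cost on that validator by roughly this percentage:
    let underestimate_pct = (1.0 - relative) * 100.0;
    println!("weights underestimated by ~{:.0}%", underestimate_pct);
}
```

With a 1.30x speed-up this gives roughly 23%, consistent with the "~20%" estimate above.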
Then I think we should bump the numbers. Otherwise we silently and accidentally reduced the single-core requirements by merging these dependency updates and not updating them. Good find, thanks for investigating!
Signed-off-by: Alexandru Gheorghe <[email protected]>
For transparency, CI is still using GCP machines and we are not planning to change that, at least until we finish the CI migration.
@alvicsam The #5196 updated numbers are pretty similar between GCP and Scaleway; overall the conclusion is that the speed-up did not come from changing cloud providers, but simply from the optimisations the code received since the reference numbers were last updated.
…5196) Since `May 2023`, after the paritytech/substrate#13548 optimization, `Blake2256` is faster by about 30%. That means there is a ~30% difference between the benchmark values we ask validators to run against and the machine we use for generating the weights. So if all validators just barely pass the benchmarks, our weights are potentially underestimated by about ~20%, so let's bring these two in sync. The same thing happened when we merged #2524 in `Nov 2023`: SR25519-Verify became faster by about 10-15%.

## Results

Generated on the machine from here: paritytech/devops#3210

```
+----------+----------------+--------------+-------------+-------------------+
| Category | Function       | Score        | Minimum     | Result            |
+============================================================================+
| CPU      | BLAKE2-256     | 1.00 GiBs    | 783.27 MiBs | ✅ Pass (130.7 %) |
|----------+----------------+--------------+-------------+-------------------|
| CPU      | SR25519-Verify | 637.62 KiBs  | 560.67 KiBs | ✅ Pass (113.7 %) |
|----------+----------------+--------------+-------------+-------------------|
| Memory   | Copy           | 12.19 GiBs   | 11.49 GiBs  | ✅ Pass (106.1 %) |
```

Discovered and discussed here: #5127 (comment)

## Downsides

Machines that barely passed the benchmark will suddenly find themselves below the minimum, but since that is just a warning and everything else continues as before, it shouldn't be too impactful, and it gives validators the information they need to become compliant, since they actually aren't when compared with the weights in use.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
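The "Result" column above is the score as a percentage of the minimum. A minimal sketch of that calculation (the helper name is hypothetical, not the actual benchmarking-cli code; scores are converted to a common MiB unit first, with 1 GiB = 1024 MiB):

```rust
// Hypothetical helper mirroring the "Result" column: a machine passes when
// its measured score meets the minimum, reported as score/minimum in %.
fn pass_percentage(score_mib: f64, minimum_mib: f64) -> f64 {
    score_mib / minimum_mib * 100.0
}

fn main() {
    // BLAKE2-256: 1.00 GiBs score vs 783.27 MiBs minimum -> ~130.7 %
    println!("BLAKE2-256:     {:.1} %", pass_percentage(1024.0, 783.27));
    // SR25519-Verify: 637.62 KiBs vs 560.67 KiBs -> ~113.7 %
    println!("SR25519-Verify: {:.1} %", pass_percentage(637.62, 560.67));
}
```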
Signed-off-by: Alexandru Gheorghe <[email protected]>
Going to approve now so feel free to merge.
Thanks, my plan is to merge this only after https://polkadot.subsquare.io/referenda/1051 passes and I make the necessary updates in: https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware.
Signed-off-by: Alexandru Gheorghe <[email protected]>
substrate/utils/frame/benchmarking-cli/src/machine/reference_hardware.json
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Referendum passed: https://polkadot.subsquare.io/referenda/1051; wiki page updated with w3f/polkadot-wiki#6202. Merging it...
(cherry picked from commit a947cb8)
…5613) This backports #5127 to the stable branch. Unfortunately, https://polkadot.subsquare.io/referenda/1051 passed after the cut-off deadline and I missed the window for getting this PR merged. The change itself is super low-risk: it just prints a new message to validators that, starting with January 2025, the required minimum number of hardware cores will be 8. I see value in getting this in front of validators as soon as possible. Since we have not released things yet and it does not invalidate any QA we already did, it should be painless to include it in the current release. (cherry picked from commit a947cb8)
Fixes: #5122.
This PR extends the existing single-core `benchmark_cpu` to also build a score for the entire processor by spawning `EXPECTED_NUM_CORES` (8) threads and averaging their throughput. This is better than simply checking the number of cores, because it also covers multi-tenant environments where the OS sees a high number of available CPUs but, because the machine is shared with its neighbours, the total throughput does not satisfy the minimum requirements.
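The spawn-and-average idea can be sketched as follows. This is a minimal illustration, not the actual `benchmark_cpu` implementation: `workload` is a hypothetical stand-in for the real BLAKE2-256 hashing loop, and the throughput unit is arbitrary.

```rust
use std::thread;
use std::time::Instant;

const EXPECTED_NUM_CORES: usize = 8;

// Hypothetical CPU-bound workload standing in for the real hashing benchmark.
// Returns a throughput figure (items processed per second).
fn workload() -> f64 {
    let n: u64 = 1_000_000;
    let start = Instant::now();
    let mut acc: u64 = 0;
    for x in 0..n {
        acc = acc.wrapping_add(x.wrapping_mul(x));
    }
    // Read `acc` so the compiler cannot optimise the loop away.
    std::hint::black_box(acc);
    let elapsed = start.elapsed().as_secs_f64().max(1e-9);
    n as f64 / elapsed
}

// Spawn EXPECTED_NUM_CORES threads running the workload concurrently and
// average their per-thread throughput. On an oversubscribed (multi-tenant)
// host this average drops even if the OS reports many CPUs.
fn benchmark_parallel() -> f64 {
    let handles: Vec<_> = (0..EXPECTED_NUM_CORES)
        .map(|_| thread::spawn(workload))
        .collect();
    let total: f64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    total / EXPECTED_NUM_CORES as f64
}

fn main() {
    println!("average per-thread throughput: {:.0}", benchmark_parallel());
}
```

Comparing this average against the single-threaded score is what reveals shared or throttled environments that a plain core count would miss.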
TODO