ref: Generalize propagation benchmark functionality #404

Merged

Conversation

niermann999
Contributor

@niermann999 niermann999 commented Mar 6, 2023

Generalize the benchmark functionality to different detectors and actor setups. Also split common benchmark functionality, such as the generation of track samples, into a detray benchmark library.

The CPU benchmarks have also been switched to use dynamic scheduling for load balancing.
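
For readers unfamiliar with the term, here is a minimal sketch of dynamic scheduling, assuming an OpenMP parallel loop over tracks (OpenMP is mentioned later in this thread). The names `toy_track`, `propagate` and `propagate_all` are hypothetical stand-ins, not detray's API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a track: n_steps mimics an uneven per-track cost.
struct toy_track {
    int n_steps;
};

// Hypothetical stand-in for propagating one track; the work grows with n_steps.
inline double propagate(const toy_track& trk) {
    assert(trk.n_steps >= 0);
    double path = 0.;
    for (int i = 0; i < trk.n_steps; ++i) {
        path += 0.5;
    }
    return path;
}

// Propagate all tracks. With schedule(dynamic), iterations are handed to
// threads as they become free, so a few expensive tracks do not stall a
// whole statically assigned chunk.
inline std::vector<double> propagate_all(const std::vector<toy_track>& tracks) {
    std::vector<double> path_lengths(tracks.size(), 0.);
    const long n = static_cast<long>(tracks.size());

    // Without OpenMP enabled at compile time the pragma is ignored and the
    // loop simply runs serially.
#pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        const auto idx = static_cast<std::size_t>(i);
        path_lengths[idx] = propagate(tracks[idx]);
    }
    return path_lengths;
}
```

Each iteration writes only to its own output slot, so the loop needs no further synchronisation. Dynamic scheduling adds some bookkeeping per iteration, which is usually worth it when the cost per track varies a lot.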

Also add the charge conjugation operation to the PDG particle, so that the charge hypothesis can be updated when test tracks are generated with randomized charge. Without this, the benchmarks triggered an assertion.
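
A rough sketch of the idea, assuming a simplified particle hypothesis that stores a PDG number, a mass and a charge. The type `pdg_particle` and the functions `charge_conjugation` and `update_hypothesis` below are illustrative only and may differ from the actual detray types:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified particle hypothesis (not detray's actual type).
struct pdg_particle {
    std::int32_t pdg_num;  // PDG Monte Carlo number, e.g. 13 for mu-
    float mass;            // unchanged by charge conjugation
    float charge;          // in units of the elementary charge
};

// Antiparticle hypothesis: flip the sign of the PDG number and the charge.
// Assumes a charged particle; self-conjugate neutral particles such as the
// photon have no negative PDG number and are not handled here.
inline pdg_particle charge_conjugation(const pdg_particle& p) {
    return {-p.pdg_num, p.mass, -p.charge};
}

// Make the hypothesis agree with the charge of a generated track:
// conjugate only if the two charges have opposite signs.
inline pdg_particle update_hypothesis(const pdg_particle& p,
                                      float track_charge) {
    return (p.charge * track_charge < 0.f) ? charge_conjugation(p) : p;
}
```

With a muon hypothesis `{13, 105.658f, -1.f}`, a track generated with charge `+1` would get the antimuon hypothesis (PDG number `-13`), while a track with charge `-1` keeps the original one.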

@niermann999 niermann999 added the enhancement and priority: high labels Mar 6, 2023
@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch from fe6d7da to 5ff2e0e Compare March 15, 2023 00:32
@niermann999 niermann999 added the priority: low label Apr 10, 2023
@beomki-yeo
Collaborator

Is this still in development? I ask because I saw an OpenMP macro in the benchmark code.

@niermann999
Contributor Author

> Is this still in development? I ask because I saw an OpenMP macro in the benchmark code.

True, the OpenMP part made it into main, but I would like to revisit the benchmark refactoring done here.

@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch 7 times, most recently from 441ce27 to 484644b Compare July 15, 2024 15:12
@niermann999 niermann999 changed the title from "feat: Add cpu propagation benchmarks that use openMP" to "ref: Generalize propagation benchmark functionality" Aug 13, 2024
@asalzburger
Contributor

Close this one?

@niermann999
Contributor Author

> Close this one?

Please leave this open. I have been working on it recently to benchmark different scenarios (e.g. with and without covariance transport or material maps). I just need to find a way to integrate it with the recent benchmark monitoring...

@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch from 484644b to 207bb0d Compare August 22, 2024 08:10
@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch 2 times, most recently from 5d9b666 to 18ac817 Compare December 6, 2024 18:40
@niermann999 niermann999 marked this pull request as ready for review December 6, 2024 18:40
@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch 5 times, most recently from 7758f81 to fe1b1c9 Compare December 9, 2024 12:29
@niermann999

This comment was marked as outdated.

@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch 3 times, most recently from 0206094 to 3eef7e4 Compare December 9, 2024 13:39
@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch from 3eef7e4 to 52ccd54 Compare December 9, 2024 14:01
@niermann999 niermann999 removed the priority: low label Dec 9, 2024
@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch 6 times, most recently from 5c6ac31 to 531da1f Compare December 10, 2024 19:12
@niermann999
Contributor Author

niermann999 commented Dec 10, 2024

The GPU benchmarks can now also be run the same way, in particular with and without covariance transport.

GPU (array plugin):

2024-12-10T20:20:52+01:00
Running ./bin/detray_benchmark_cuda_propagation_array
Run on (48 X 2540.18 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
Load Average: 0.01, 0.61, 3.24
--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_64_TRACKS        2037113 ns      2033445 ns          326 TracksPropagated=31.4737k/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_256_TRACKS       2094531 ns      2090671 ns          335 TracksPropagated=122.449k/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_1024_TRACKS      2124243 ns      2120429 ns          330 TracksPropagated=482.921k/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_4096_TRACKS      2402214 ns      2397999 ns          293 TracksPropagated=1.70809M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_16384_TRACKS     3218884 ns      3213157 ns          219 TracksPropagated=5.09903M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_65536_TRACKS    11053775 ns     11035153 ns           64 TracksPropagated=5.93884M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_262144_TRACKS   39791139 ns     39722842 ns           18 TracksPropagated=6.59933M/s
BM_PROPAGATION_TOY_DETECTOR_64_TRACKS                        1194310 ns      1192008 ns          584 TracksPropagated=53.6909k/s
BM_PROPAGATION_TOY_DETECTOR_256_TRACKS                       1202305 ns      1200128 ns          583 TracksPropagated=213.311k/s
BM_PROPAGATION_TOY_DETECTOR_1024_TRACKS                      1149717 ns      1147501 ns          610 TracksPropagated=892.374k/s
BM_PROPAGATION_TOY_DETECTOR_4096_TRACKS                      1171619 ns      1169484 ns          597 TracksPropagated=3.5024M/s
BM_PROPAGATION_TOY_DETECTOR_16384_TRACKS                     1297809 ns      1295279 ns          539 TracksPropagated=12.649M/s
BM_PROPAGATION_TOY_DETECTOR_65536_TRACKS                     2993472 ns      2988127 ns          237 TracksPropagated=21.9321M/s
BM_PROPAGATION_TOY_DETECTOR_262144_TRACKS                   10588788 ns     10570713 ns           67 TracksPropagated=24.7991M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_64_TRACKS        2052405 ns      2048621 ns          340 TracksPropagated=31.2405k/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_256_TRACKS       2444316 ns      2439919 ns          287 TracksPropagated=104.922k/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_1024_TRACKS      2755773 ns      2750885 ns          254 TracksPropagated=372.244k/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_4096_TRACKS      2998216 ns      2992741 ns          234 TracksPropagated=1.36864M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_16384_TRACKS     4029450 ns      4022418 ns          174 TracksPropagated=4.07317M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_65536_TRACKS    10430930 ns     10413473 ns           68 TracksPropagated=6.29339M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_262144_TRACKS   45734699 ns     45655685 ns           15 TracksPropagated=5.74176M/s
BM_PROPAGATION_WIRE_CHAMBER_64_TRACKS                        1098592 ns      1096597 ns          634 TracksPropagated=58.3624k/s
BM_PROPAGATION_WIRE_CHAMBER_256_TRACKS                       1241353 ns      1239026 ns          565 TracksPropagated=206.614k/s
BM_PROPAGATION_WIRE_CHAMBER_1024_TRACKS                      1414712 ns      1412121 ns          496 TracksPropagated=725.15k/s
BM_PROPAGATION_WIRE_CHAMBER_4096_TRACKS                      1501604 ns      1498823 ns          467 TracksPropagated=2.73281M/s
BM_PROPAGATION_WIRE_CHAMBER_16384_TRACKS                     1636545 ns      1633525 ns          428 TracksPropagated=10.0298M/s
BM_PROPAGATION_WIRE_CHAMBER_65536_TRACKS                     2917888 ns      2912812 ns          244 TracksPropagated=22.4992M/s
BM_PROPAGATION_WIRE_CHAMBER_262144_TRACKS                   10154730 ns     10136979 ns           70 TracksPropagated=25.8602M/s

CPU (array plugin):

2024-12-10T20:22:59+01:00
Running ./bin/detray_benchmark_cpu_propagation_array
Run on (48 X 1923.09 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
Load Average: 0.07, 0.46, 2.84
--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_64_TRACKS          72314 ns        72102 ns         9737 TracksPropagated=887.627k/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_256_TRACKS        213122 ns       212421 ns         3779 TracksPropagated=1.20515M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_1024_TRACKS       668295 ns       665859 ns         1055 TracksPropagated=1.53786M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_4096_TRACKS      2696460 ns      2686243 ns          272 TracksPropagated=1.52481M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_16384_TRACKS    10607854 ns     10568141 ns           68 TracksPropagated=1.55032M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_65536_TRACKS    40922337 ns     40770747 ns           13 TracksPropagated=1.60743M/s
BM_PROPAGATION_TOY_DETECTOR_W_COV_TRANSPORT_262144_TRACKS  163753024 ns    163147137 ns            4 TracksPropagated=1.60679M/s
BM_PROPAGATION_TOY_DETECTOR_64_TRACKS                          41115 ns        40999 ns        16897 TracksPropagated=1.56103M/s
BM_PROPAGATION_TOY_DETECTOR_256_TRACKS                        107042 ns       106714 ns         6797 TracksPropagated=2.39893M/s
BM_PROPAGATION_TOY_DETECTOR_1024_TRACKS                       378640 ns       377469 ns         1914 TracksPropagated=2.7128M/s
BM_PROPAGATION_TOY_DETECTOR_4096_TRACKS                      1420297 ns      1415247 ns          496 TracksPropagated=2.89419M/s
BM_PROPAGATION_TOY_DETECTOR_16384_TRACKS                     6342550 ns      6320105 ns          125 TracksPropagated=2.59236M/s
BM_PROPAGATION_TOY_DETECTOR_65536_TRACKS                    22383618 ns     22306686 ns           31 TracksPropagated=2.93795M/s
BM_PROPAGATION_TOY_DETECTOR_262144_TRACKS                   89681423 ns     89372023 ns            8 TracksPropagated=2.93318M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_64_TRACKS          55462 ns        55304 ns        12673 TracksPropagated=1.15724M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_256_TRACKS        171903 ns       171360 ns         4678 TracksPropagated=1.49393M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_1024_TRACKS       509352 ns       507743 ns         1381 TracksPropagated=2.01677M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_4096_TRACKS      2016280 ns      2009437 ns          349 TracksPropagated=2.03838M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_16384_TRACKS     8369566 ns      8340759 ns           73 TracksPropagated=1.96433M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_65536_TRACKS    32288152 ns     32173467 ns           22 TracksPropagated=2.03696M/s
BM_PROPAGATION_WIRE_CHAMBER_W_COV_TRANSPORT_262144_TRACKS  127973107 ns    127525634 ns            5 TracksPropagated=2.05562M/s
BM_PROPAGATION_WIRE_CHAMBER_64_TRACKS                          31645 ns        31558 ns        23134 TracksPropagated=2.02804M/s
BM_PROPAGATION_WIRE_CHAMBER_256_TRACKS                         73822 ns        73614 ns         9600 TracksPropagated=3.47761M/s
BM_PROPAGATION_WIRE_CHAMBER_1024_TRACKS                       252862 ns       252076 ns         2014 TracksPropagated=4.06226M/s
BM_PROPAGATION_WIRE_CHAMBER_4096_TRACKS                       988791 ns       985845 ns          711 TracksPropagated=4.15481M/s
BM_PROPAGATION_WIRE_CHAMBER_16384_TRACKS                     3931960 ns      3918705 ns          179 TracksPropagated=4.18097M/s
BM_PROPAGATION_WIRE_CHAMBER_65536_TRACKS                    16473055 ns     16413907 ns           45 TracksPropagated=3.99271M/s
BM_PROPAGATION_WIRE_CHAMBER_262144_TRACKS                   62515746 ns     62305748 ns           11 TracksPropagated=4.20738M/s
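
As a reading aid for the two logs above: the `TracksPropagated` counter is consistent with the sample size divided by the value in the CPU column (for example, 64 tracks in 2033445 ns gives about 31.47k tracks/s). A small sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>

// Tracks per second from the sample size and a time per iteration in ns.
inline double tracks_per_second(double n_tracks, double time_ns) {
    assert(time_ns > 0.);
    return n_tracks / (time_ns * 1e-9);
}

// Ratio of two times per iteration; values above 1 mean `time_ns` is faster
// than the reference.
inline double speedup(double time_ref_ns, double time_ns) {
    assert(time_ns > 0.);
    return time_ref_ns / time_ns;
}
```

For the toy detector with covariance transport, `speedup(163753024., 39791139.)` gives roughly 4.1 in favour of the GPU at 262144 tracks. At 64 tracks the picture is reversed: the CPU run reports about 887.6k tracks/s against about 31.5k tracks/s on the GPU, whose time per iteration stays near 2 ms up to 4096 tracks.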

@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch 5 times, most recently from cf3e004 to 2550f43 Compare December 19, 2024 18:28
@niermann999 niermann999 force-pushed the feat-propagation-benchmarks branch from 2550f43 to ddd3719 Compare December 20, 2024 13:53
@niermann999
Contributor Author

This is finished, but needs an MR in the detray-benchmark repository first.

@niermann999 niermann999 added and removed the blocked label Dec 20, 2024
@niermann999
Contributor Author

> This is finished, but needs an MR in the detray-benchmark repository first.

The GitLab benchmark MR was merged, so this PR needs to be next in line; otherwise the corresponding benchmark run will fail. Could someone please review it?

Collaborator

@beomki-yeo beomki-yeo left a comment

Looks good

@niermann999 niermann999 merged commit ac8e293 into acts-project:main Jan 8, 2025
19 checks passed
Labels: enhancement, priority: high
3 participants