Shared memory deposition #1158

Merged: MaxThevenet merged 19 commits into Hi-PACE:development on Oct 7, 2024

@AlexanderSinn AlexanderSinn commented Sep 12, 2024

To enable shared memory deposition on GPU, add:

hipace.do_shared_depos = true

The parameter plasmas.sort_bin_size was replaced with hipace.tile_size, as it now also affects beams:

hipace.tile_size = 32
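
For readers unfamiliar with the idea, here is a conceptual sketch of tile-based deposition in plain Python. It is not the code from this PR: the function names are hypothetical, nearest-grid-point weighting is used for brevity, and it assumes `tile_size` is the tile edge length in cells. Particles are binned by tile, each tile accumulates into a small local buffer (standing in for GPU shared memory), and the buffer is added to the global grid once per tile.

```python
from collections import defaultdict
import random


def deposit_direct(particles, nx, ny):
    """Reference: every particle adds its weight straight into the global grid."""
    grid = [[0.0] * ny for _ in range(nx)]
    for x, y, w in particles:
        grid[int(x)][int(y)] += w
    return grid


def deposit_tiled(particles, nx, ny, tile_size):
    """Tile-based variant (illustrative only, not the HiPACE++ kernel)."""
    # Step 1: bin particles by the tile that contains their cell.
    tiles = defaultdict(list)
    for x, y, w in particles:
        ix, iy = int(x), int(y)
        tiles[(ix // tile_size, iy // tile_size)].append((ix, iy, w))

    grid = [[0.0] * ny for _ in range(nx)]
    for (tx, ty), members in tiles.items():
        x0, y0 = tx * tile_size, ty * tile_size
        # Step 2: accumulate into a tile-local buffer
        # (the stand-in for shared memory on a GPU).
        local = [[0.0] * tile_size for _ in range(tile_size)]
        for ix, iy, w in members:
            local[ix - x0][iy - y0] += w
        # Step 3: flush the buffer to the global grid once,
        # clipping tiles that stick out past the domain edge.
        for i in range(min(tile_size, nx - x0)):
            for j in range(min(tile_size, ny - y0)):
                grid[x0 + i][y0 + j] += local[i][j]
    return grid


def make_particles(n, nx, ny, seed=0):
    """Uniformly distributed unit-weight test particles (positions in cell units)."""
    rng = random.Random(seed)
    return [(rng.random() * nx, rng.random() * ny, 1.0) for _ in range(n)]


if __name__ == "__main__":
    parts = make_particles(2000, 70, 50)
    same = deposit_tiled(parts, 70, 50, 32) == deposit_direct(parts, 70, 50)
    print("tiled result matches direct result:", same)
```

On a GPU the benefit would come from doing the many per-particle atomic adds in fast on-chip memory and touching global memory only during the flush. Higher-order particle shapes would additionally need guard cells around each tile, which this sketch leaves out.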

MR with shared memory:

Finished Evolve after 324.2 seconds using 1 rank
Total time per particle push: 1.544 nanoseconds (1.769 plasma, 12.1 beam)
Total time per cell update: 13.83 nanoseconds


TinyProfiler total time across processes [min...avg...max]: 324.3 ... 324.3 ... 324.3

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
------------------------------------------------------------------------------------------------------
ExplicitDeposition()                                    44800      69.86      69.86      69.86  21.54%
AdvancePlasmaParticles()                                44800      65.78      65.78      65.78  20.28%
DepositCurrent_PlasmaParticleContainer()                44804       44.4       44.4       44.4  13.69%
DepositCurrentSlice_BeamParticleContainer()             44800       36.1       36.1       36.1  11.13%
hpmg::MultiGrid::solve1()                               22400      30.59      30.59      30.59   9.43%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   5.26%
BeamParticleContainer::ReorderParticles()               11200      9.576      9.576      9.576   2.95%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.248      9.248      9.248   2.85%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.062      8.062      8.062   2.49%
ParticleContainer::SortParticlesForDeposition()          5600      8.012      8.012      8.012   2.47%
PermutationForDeposition()                              16800       7.55       7.55       7.55   2.33%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.11%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      2.479      2.479      2.479   0.76%
Fields::LevelUpBoundary()                              123200      1.986      1.986      1.986   0.61%
shiftSlippedParticles()                                 11200       1.47       1.47       1.47   0.45%
Fields::ShiftSlices()                                   22400      1.369      1.369      1.369   0.42%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.39%
Hipace::InitializeSxSyWithBeam()                        22400       1.14       1.14       1.14   0.35%
Hipace::SolveOneSlice()                                 11200     0.7064     0.7064     0.7064   0.22%
Hipace::Evolve()                                            1     0.6222     0.6222     0.6222   0.19%
Hipace::ExplicitMGSolveBxBy()                           22400     0.3252     0.3252     0.3252   0.10%
MultiBuffer::get_data()                                 11200    0.04189    0.04189    0.04189   0.01%
PlasmaParticleContainer::ReorderParticles()              5600    0.00954    0.00954    0.00954   0.00%
main()                                                      1   0.001976   0.001976   0.001976   0.00%
Other                                                  178374      3.083      3.083      3.083   0.95%
------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
------------------------------------------------------------------------------------------------------
main()                                                      1      324.3      324.3      324.3 100.00%
Hipace::Evolve()                                            1      324.2      324.2      324.2  99.97%
Hipace::SolveOneSlice()                                 11200      323.2      323.2      323.2  99.67%
ExplicitDeposition()                                    44800      69.86      69.86      69.86  21.54%
AdvancePlasmaParticles()                                44800      65.78      65.78      65.78  20.28%
DepositCurrent_PlasmaParticleContainer()                44804       44.4       44.4       44.4  13.69%
DepositCurrentSlice_BeamParticleContainer()             44800       36.1       36.1       36.1  11.13%
Hipace::ExplicitMGSolveBxBy()                           22400      32.08      32.08      32.08   9.89%
hpmg::MultiGrid::solve1()                               22400      30.59      30.59      30.59   9.43%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   5.26%
BeamParticleContainer::ReorderParticles()               11200      14.74      14.74      14.74   4.54%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      12.04      12.04      12.04   3.71%
PlasmaParticleContainer::ReorderParticles()              5600      10.41      10.41      10.41   3.21%
ParticleContainer::SortParticlesForDeposition()          5600       10.4       10.4       10.4   3.21%
MultiBuffer::get_data()                                 11200      9.332      9.332      9.332   2.88%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200       9.29       9.29       9.29   2.86%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.062      8.062      8.062   2.49%
PermutationForDeposition()                              16800       7.55       7.55       7.55   2.33%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.11%
Fields::LevelUpBoundary()                              123200      1.986      1.986      1.986   0.61%
shiftSlippedParticles()                                 11200      1.788      1.788      1.788   0.55%
Fields::ShiftSlices()                                   22400      1.369      1.369      1.369   0.42%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.39%
Hipace::InitializeSxSyWithBeam()                        22400       1.14       1.14       1.14   0.35%
Other                                                  178374      3.198      3.198      3.198   0.99%
------------------------------------------------------------------------------------------------------

Device Memory Usage:
-----------------------------------------------------------------------------------
Name                                             Nalloc   Nfree    AvgMem    MaxMem
-----------------------------------------------------------------------------------
The_Arena::Initialize()                               1       1   835 KiB    59 GiB
ParticleContainer::SortParticlesForDeposition()   16800   16800  1558 MiB  1622 MiB
PlasmaParticleContainer::InitParticles()            126     126  3120 KiB  1576 MiB
BeamParticleContainer::resize()                  110721  110721   389 MiB  1107 MiB
Fields::AllocData()                                   2       2   337 MiB   337 MiB
BeamParticleContainer::ReorderParticles()         33600   33600    38 MiB   145 MiB
Hipace::ExplicitMGSolveBxBy()                        72      72   117 MiB   117 MiB
FFTPoissonSolverDirichletFast::define()              14      14    79 MiB    79 MiB
PermutationForDeposition()                        67200   67200  2570 KiB    66 MiB
ResizeRandomSeed                                      1       1    40 MiB    40 MiB
DepositCurrent_PlasmaParticleContainer()         179216  179216  4859 KiB    36 MiB
ExplicitDeposition()                             134400  134400  7886 KiB    36 MiB
DepositCurrentSlice_BeamParticleContainer()      134398  134398  2903 KiB    34 MiB
hpmg::MultiGrid::solve1()                         88139   88139    21 KiB   432 KiB
shiftSlippedParticles()                           53444   53444   259   B   108 KiB
Hipace::InitData()                                   13      13   495   B   496   B
main()                                               11      11   431   B   432   B
Fields::Copy()                                        1       1    15   B    16   B
-----------------------------------------------------------------------------------

Managed Memory Usage:
----------------------------------------------------------------
Name                             Nalloc  Nfree  AvgMem    MaxMem
----------------------------------------------------------------
The_Managed_Arena::Initialize()       1      1   2   B  8192 KiB
----------------------------------------------------------------

Pinned Memory Usage:
---------------------------------------------------------------------------
Name                                      Nalloc  Nfree    AvgMem    MaxMem
---------------------------------------------------------------------------
Diagnostic::ResizeFDiagFAB()                   2      2   139 MiB   139 MiB
The_Pinned_Arena::Initialize()                 1      1   146   B  8192 KiB
Hipace::InitData()                            55     55   175 KiB   175 KiB
Hipace::ExplicitMGSolveBxBy()                  2      2  2046   B  2048   B
main()                                        98     98   431   B   464   B
Fields::Copy()                                 1      1    15   B    16   B
PlasmaParticleContainer::InitParticles()       2      2     0   B    16   B
hpmg::MultiGrid::solve1()                  88139  88139     1   B    16   B
shiftSlippedParticles()                    22400  22400     0   B    16   B
---------------------------------------------------------------------------

dev:

Finished Evolve after 361 seconds using 1 rank
Total time per particle push: 1.718 nanoseconds (1.97 plasma, 13.47 beam)
Total time per cell update: 15.4 nanoseconds


TinyProfiler total time across processes [min...avg...max]: 361 ... 361 ... 361

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
------------------------------------------------------------------------------------------------------
ExplicitDeposition()                                    44800      95.92      95.92      95.92  26.57%
DepositCurrent_PlasmaParticleContainer()                44804      72.11      72.11      72.11  19.97%
AdvancePlasmaParticles()                                44800      65.87      65.87      65.87  18.24%
hpmg::MultiGrid::solve1()                               22400      30.63      30.63      30.63   8.48%
DepositCurrentSlice_BeamParticleContainer()             44800      18.85      18.85      18.85   5.22%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   4.72%
BeamParticleContainer::ReorderParticles()               11200      9.584      9.584      9.584   2.65%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.276      9.276      9.276   2.57%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.115      8.115      8.115   2.25%
ParticleContainer::SortParticlesForDeposition()          5600       8.02       8.02       8.02   2.22%
PermutationForDeposition()                              16800      7.551      7.551      7.551   2.09%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.00%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      2.476      2.476      2.476   0.69%
Fields::LevelUpBoundary()                              123200      1.973      1.973      1.973   0.55%
shiftSlippedParticles()                                 11200      1.464      1.464      1.464   0.41%
Fields::ShiftSlices()                                   22400      1.373      1.373      1.373   0.38%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.35%
Hipace::InitializeSxSyWithBeam()                        22400      1.136      1.136      1.136   0.31%
Hipace::SolveOneSlice()                                 11200     0.7106     0.7106     0.7106   0.20%
Hipace::Evolve()                                            1     0.6849     0.6849     0.6849   0.19%
Hipace::ExplicitMGSolveBxBy()                           22400     0.3109     0.3109     0.3109   0.09%
MultiBuffer::get_data()                                 11200    0.03909    0.03909    0.03909   0.01%
PlasmaParticleContainer::ReorderParticles()              5600   0.009305   0.009305   0.009305   0.00%
main()                                                      1   0.002326   0.002326   0.002326   0.00%
Other                                                  178374       3.03       3.03       3.03   0.84%
------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
------------------------------------------------------------------------------------------------------
main()                                                      1        361        361        361 100.00%
Hipace::Evolve()                                            1        361        361        361  99.97%
Hipace::SolveOneSlice()                                 11200      359.9      359.9      359.9  99.69%
ExplicitDeposition()                                    44800      95.92      95.92      95.92  26.57%
DepositCurrent_PlasmaParticleContainer()                44804      72.11      72.11      72.11  19.97%
AdvancePlasmaParticles()                                44800      65.87      65.87      65.87  18.24%
Hipace::ExplicitMGSolveBxBy()                           22400       32.1       32.1       32.1   8.89%
hpmg::MultiGrid::solve1()                               22400      30.63      30.63      30.63   8.48%
DepositCurrentSlice_BeamParticleContainer()             44800      18.85      18.85      18.85   5.22%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   4.72%
BeamParticleContainer::ReorderParticles()               11200      14.74      14.74      14.74   4.08%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      12.08      12.08      12.08   3.35%
PlasmaParticleContainer::ReorderParticles()              5600      10.42      10.42      10.42   2.89%
ParticleContainer::SortParticlesForDeposition()          5600      10.41      10.41      10.41   2.88%
MultiBuffer::get_data()                                 11200      9.356      9.356      9.356   2.59%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.317      9.317      9.317   2.58%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.115      8.115      8.115   2.25%
PermutationForDeposition()                              16800      7.551      7.551      7.551   2.09%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.00%
Fields::LevelUpBoundary()                              123200      1.973      1.973      1.973   0.55%
shiftSlippedParticles()                                 11200      1.776      1.776      1.776   0.49%
Fields::ShiftSlices()                                   22400      1.373      1.373      1.373   0.38%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.35%
Hipace::InitializeSxSyWithBeam()                        22400      1.136      1.136      1.136   0.31%
Other                                                  178374      3.136      3.136      3.136   0.87%
------------------------------------------------------------------------------------------------------

Unused ParmParse Variables:
  [TOP]::hipace.do_shared_depos(nvals = 1)  :: [true]

Device Memory Usage:
-----------------------------------------------------------------------------------
Name                                             Nalloc   Nfree    AvgMem    MaxMem
-----------------------------------------------------------------------------------
The_Arena::Initialize()                               1       1   791 KiB    59 GiB
ParticleContainer::SortParticlesForDeposition()   16800   16800  1558 MiB  1622 MiB
PlasmaParticleContainer::InitParticles()            126     126  3194 KiB  1576 MiB
BeamParticleContainer::resize()                  110721  110721   360 MiB  1107 MiB
Fields::AllocData()                                   2       2   337 MiB   337 MiB
BeamParticleContainer::ReorderParticles()         33600   33600    35 MiB   145 MiB
Hipace::ExplicitMGSolveBxBy()                        72      72   117 MiB   117 MiB
FFTPoissonSolverDirichletFast::define()              14      14    79 MiB    79 MiB
PermutationForDeposition()                        67200   67200  2310 KiB    66 MiB
ResizeRandomSeed                                      1       1    40 MiB    40 MiB
hpmg::MultiGrid::solve1()                         88139   88139    19 KiB   432 KiB
shiftSlippedParticles()                           53444   53444   233   B   108 KiB
Hipace::InitData()                                   13      13   495   B   496   B
main()                                               11      11   431   B   432   B
DepositCurrent_PlasmaParticleContainer()          44804   44804     3   B    16   B
Fields::Copy()                                        1       1    15   B    16   B
-----------------------------------------------------------------------------------

Managed Memory Usage:
----------------------------------------------------------------
Name                             Nalloc  Nfree  AvgMem    MaxMem
----------------------------------------------------------------
The_Managed_Arena::Initialize()       1      1   1   B  8192 KiB
----------------------------------------------------------------

Pinned Memory Usage:
---------------------------------------------------------------------------
Name                                      Nalloc  Nfree    AvgMem    MaxMem
---------------------------------------------------------------------------
Diagnostic::ResizeFDiagFAB()                   2      2   139 MiB   139 MiB
The_Pinned_Arena::Initialize()                 1      1   130   B  8192 KiB
Hipace::InitData()                            55     55   175 KiB   175 KiB
Hipace::ExplicitMGSolveBxBy()                  2      2  2046   B  2048   B
main()                                        96     96   431   B   464   B
Fields::Copy()                                 1      1    15   B    16   B
PlasmaParticleContainer::InitParticles()       2      2     0   B    16   B
hpmg::MultiGrid::solve1()                  88139  88139     1   B    16   B
shiftSlippedParticles()                    22400  22400     0   B    16   B
---------------------------------------------------------------------------
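
To make the two profiles easier to compare, the short script below computes per-kernel speedups (dev time divided by MR time) from the exclusive times in the TinyProfiler tables above. All numbers are copied from those tables; the dictionary and variable names are only for illustration.

```python
# Exclusive times in seconds, copied from the TinyProfiler tables in this PR.
mr = {
    "Evolve (total)": 324.2,
    "ExplicitDeposition()": 69.86,
    "DepositCurrent_PlasmaParticleContainer()": 44.4,
    "DepositCurrentSlice_BeamParticleContainer()": 36.1,
    "AdvancePlasmaParticles()": 65.78,
}
dev = {
    "Evolve (total)": 361.0,
    "ExplicitDeposition()": 95.92,
    "DepositCurrent_PlasmaParticleContainer()": 72.11,
    "DepositCurrentSlice_BeamParticleContainer()": 18.85,
    "AdvancePlasmaParticles()": 65.87,
}

# Speedup > 1 means the MR is faster than dev for that entry.
speedups = {name: dev[name] / mr[name] for name in mr}

# Time saved by the three deposition kernels taken together.
deposition = [
    "ExplicitDeposition()",
    "DepositCurrent_PlasmaParticleContainer()",
    "DepositCurrentSlice_BeamParticleContainer()",
]
deposition_saved = sum(dev[k] - mr[k] for k in deposition)
total_saved = dev["Evolve (total)"] - mr["Evolve (total)"]

if __name__ == "__main__":
    for name, s in speedups.items():
        print(f"{name:<46} {s:5.2f}x")
    print(f"deposition kernels save {deposition_saved:.2f} s "
          f"of {total_saved:.2f} s total")
```

In this benchmark the two plasma deposition kernels get faster (about 1.37x and 1.62x), while the beam deposition takes roughly twice as long as on dev (about 0.52x). The net change in the three deposition kernels, about 36.5 s, accounts for almost all of the 36.8 s difference in total run time.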
  • Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
  • Tested (describe the tests in the PR description)
  • Runs on GPU (basic: the code compiles and runs well with the new module)
  • Contains an automated test (checksum and/or comparison with theory)
  • Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
  • Constified (All that can be const is const)
  • Code is clean (no unwanted comments)
  • Style and code conventions (listed at the bottom of https://github.com/Hi-PACE/hipace) are respected
  • Proper label and GitHub project, if applicable

@AlexanderSinn AlexanderSinn added labels on Sep 12, 2024: component: plasma (About the plasma species), component: beam (About the beam species), GPU (Related to GPU acceleration), performance (optimization, benchmark, profiling, etc.)
@AlexanderSinn AlexanderSinn changed the title [WIP] Shared memory deposition Shared memory deposition Sep 18, 2024

@MaxThevenet MaxThevenet left a comment

Thanks for this PR! See comments below.

src/particles/deposition/BeamDepositCurrent.cpp (outdated, resolved)
src/particles/deposition/DepositionUtil.H (resolved)
src/particles/plasma/MultiPlasma.cpp (resolved)
src/particles/deposition/ExplicitDeposition.cpp (outdated, resolved)

@MaxThevenet MaxThevenet left a comment

Looks great, thanks for this PR!

@MaxThevenet MaxThevenet merged commit 614ee07 into Hi-PACE:development Oct 7, 2024
10 checks passed