
Optimized stolarsky_mean #2274

Open · wants to merge 7 commits into base: main

Conversation

@MarcoArtiano (Contributor) commented Feb 10, 2025

The Stolarsky mean will come in handy in Trixi Atmo. Here is a faster version:

julia> @inline function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               return (gamma - 1) / gamma * (y^gamma - x^gamma) /
                      (y^(gamma - 1) - x^(gamma - 1))
           end
       end
stolarsky_mean (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expy = exp(gamma*log(y))
               expx = exp(gamma*log(x))
               return (gamma - 1) / gamma * (expy - expx) /
                      (expy/y - expx/x)
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  45.982 ns … 61.638 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.163 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.301 ns ±  0.665 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▅██▅             ▁▁                                        ▂
  █████▇▆▅▄▁▃▃▄▄▄▁▄▇█████▇▇▆▆▄▆▅▅▄▅▄▅▅▇▇▇▆▆▄▅▅▅▄▃▅▆▄▅▃▃▄▆▆█▇▇ █
  46 ns        Histogram: log(frequency) by time      49.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.342 ns … 630.474 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.566 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.007 ns ±   6.343 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█    ▂▂▄                                                    ▁
  ██▇▇▇█████▇▇▇█▇▄▅▄▁▃▁▄▃▁▄▄▄▁▄▁▃▃▁▁▃▄▃▃▄▄▄▃▄▅▅▅▅▅▅▆▆▆▆▅▅▆▆▅▅▅ █
  19.3 ns       Histogram: log(frequency) by time      30.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Simulation of EC Polytropic Euler with the previous stolarsky mean:

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              1.98s /  99.3%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    1.93s   98.1%  1.36ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    1.74s   88.4%  1.22ms     0.00B    0.0%    0.00B
  interface flux           1.42k    163ms    8.3%   115μs     0.00B    0.0%    0.00B
  surface integral         1.42k   15.3ms    0.8%  10.8μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   8.10ms    0.4%  5.70μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   3.72ms    0.2%  2.62μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k   1.00ms    0.1%   705ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   46.3μs    0.0%  32.6ns     0.00B    0.0%    0.00B
  source terms             1.42k   45.5μs    0.0%  32.0ns     0.00B    0.0%    0.00B
calculate dt                 285   26.2ms    1.3%  92.0μs     0.00B    0.0%    0.00B
analyze solution               4   7.33ms    0.4%  1.83ms    314KiB   14.5%  78.5KiB
I/O                            5   4.19ms    0.2%   838μs   1.81MiB   85.3%   370KiB
  save solution                4   3.82ms    0.2%   956μs   1.80MiB   84.9%   460KiB
  ~I/O~                        5    365μs    0.0%  73.1μs   8.83KiB    0.4%  1.77KiB
  get element variables        4    730ns    0.0%   182ns     0.00B    0.0%    0.00B
  save mesh                    4    608ns    0.0%   152ns     0.00B    0.0%    0.00B
  get node variables           4   95.0ns    0.0%  23.8ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────

Results for the optimized version:

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              1.62s /  99.1%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    1.57s   97.7%  1.11ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    1.41s   87.8%   994μs     0.00B    0.0%    0.00B
  interface flux           1.42k    131ms    8.2%  92.4μs     0.00B    0.0%    0.00B
  surface integral         1.42k   14.9ms    0.9%  10.5μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   7.29ms    0.5%  5.13μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   3.83ms    0.2%  2.70μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k    956μs    0.1%   673ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   61.6μs    0.0%  43.3ns     0.00B    0.0%    0.00B
  source terms             1.42k   24.5μs    0.0%  17.3ns     0.00B    0.0%    0.00B
calculate dt                 285   26.0ms    1.6%  91.2μs     0.00B    0.0%    0.00B
analyze solution               4   6.36ms    0.4%  1.59ms    314KiB   14.5%  78.6KiB
I/O                            5   5.35ms    0.3%  1.07ms   1.81MiB   85.3%   370KiB
  save solution                4   5.00ms    0.3%  1.25ms   1.80MiB   84.9%   461KiB
  ~I/O~                        5    352μs    0.0%  70.5μs   8.83KiB    0.4%  1.77KiB
  save mesh                    4    755ns    0.0%   189ns     0.00B    0.0%    0.00B
  get element variables        4    477ns    0.0%   119ns     0.00B    0.0%    0.00B
  get node variables           4   86.0ns    0.0%  21.5ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────

Thus on my machine, there's an 18% improvement.
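
For reference, timer tables like the ones above come from Trixi's SummaryCallback and can be reproduced roughly as follows (a minimal sketch; the exact elixir path is an assumption and may differ between Trixi.jl versions):

# Run the EC polytropic Euler elixir once with the old and once with the new
# stolarsky_mean; the SummaryCallback set up in the elixir prints the timer
# table at the end of each run.
using Trixi

trixi_include(joinpath(examples_dir(), "structured_2d_dgsem",
                       "elixir_eulerpolytropic_ec.jl"))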


Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@MarcoArtiano MarcoArtiano added the performance We are greedy label Feb 10, 2025
@MarcoArtiano MarcoArtiano requested a review from ranocha February 10, 2025 14:52

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.88%. Comparing base (ce47aea) to head (e317708).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2274   +/-   ##
=======================================
  Coverage   96.88%   96.88%           
=======================================
  Files         490      490           
  Lines       39491    39496    +5     
=======================================
+ Hits        38260    38265    +5     
  Misses       1231     1231           
Flag Coverage Δ
unittests 96.88% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@andrewwinters5000 (Member) left a comment

This is an interesting (and ingenious) way to rewrite the expression and possibly improve performance. Would it be worthwhile to also benchmark it on Roci (or some other machine) just to see its influence?

src/auxiliary/math.jl
Co-authored-by: Andrew Winters <[email protected]>
@ranocha (Member) left a comment

Thanks!

src/auxiliary/math.jl
@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

Hendrik made me realize that for integer exponents the exp(log(...)) version is slower. I tried the trick to avoid the division, but that doesn't actually change anything. I added a specialization for integers, and the results are the following:
Functions

julia> @inline function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               return (gamma - 1) / gamma * (y^gamma - x^gamma) /
                      (y^(gamma - 1) - x^(gamma - 1))
           end
       end
stolarsky_mean (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expx = x^(gamma-1)
               expy = y^(gamma-1)
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @inline function stolarsky_mean_3(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expy = exp((gamma-1)*log(y))
               expx = exp((gamma-1)*log(x))
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_3 (generic function with 1 method)

julia> @inline function stolarsky_mean_4(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   expy = y^(gamma - 1)
                   expx = x^(gamma - 1)
               else
                   expy = exp((gamma - 1) * log(y))
                   expx = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_4 (generic function with 1 method)

For real numbers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  45.981 ns … 56.275 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.167 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.241 ns ±  0.440 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▄▆██▇▄                      ▁▁▂▁                          ▂
  ▆███████▇▅▅▄▁▁▃▁▃▁▁▃▁▃▃▁▁▁▁▄▅▇████▇▇▆▆▆▄▅▅▅▅▄▅▄▃▄▄▁▄▃▄▃▁▃▃▄ █
  46 ns        Histogram: log(frequency) by time      48.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  23.530 ns … 42.782 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.656 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.853 ns ±  0.654 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅█▇▂                                 ▁▃▁                   ▁
  ▆████▆▅▅▅▄▄▄▅▄▄▅▄▅▄▅▄▅▄▃▅▇███▇▇▆▆▆▅▅▆▆████▇▆▅▅▄▃▅▅▅▄▄▄▄▅▃▅▅ █
  23.5 ns      Histogram: log(frequency) by time      26.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.433 ns … 25.629 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.512 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.601 ns ±  0.378 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▇█▅                                     ▁▁                 ▂
  █████▄▅▅▅▅▅▅▅▅▃▄▅▄▅▄▃▄▆▄▅▅▄▄▅▅▇▇███▆▇▆▆▆▇███▇▆▅▅▄▄▃▁▄▅▄▃▆▅▄ █
  19.4 ns      Histogram: log(frequency) by time      21.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.665 ns … 34.485 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.760 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.803 ns ±  0.360 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂█▇                                                        
  ▂▄████▃▂▂▂▁▂▁▁▁▁▁▁▂▂▁▂▂▁▁▁▁▁▁▂▁▁▁▂▁▁▁▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂ ▂
  19.7 ns         Histogram: frequency by time        21.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

For integers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  16.133 ns … 30.958 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.224 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.084 ns ±  0.718 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▁  ██       ▃▄   ▄▁  ██ ▁ ▂   ▂▂   ▂   ▅▆   ▃   ▂▃         ▃
  ██▁▁███▅▄▆▃▃▁██▄▄▁██▅█████▇██▇▆██▆▄▅█▇▅▆██▇▇███▆▆██▆▅▅▇▅▅▅█ █
  16.1 ns      Histogram: log(frequency) by time        19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  7.857 ns … 22.603 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.162 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.107 ns ±  0.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▁        ▁█▄▄  ▂   ▁█▄  ▄▃▂  ▁▆   ▂▃         ▁▁  ▂
  ▇▆▁▁▃▆▃▁▁▁█▄▄▆▅▇▄▃▅████▆▇█▆▄▅███▆▆███▆▇██▅▅▄███▇▆▆▇▆▆▅▇██▇ █
  7.86 ns      Histogram: log(frequency) by time     10.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.421 ns … 25.844 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.508 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.540 ns ±  0.250 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▅█▁                                                       
  ▂▃▇███▄▂▂▂▂▁▂▁▁▁▂▁▁▂▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▂▂▁▁▂▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂ ▂
  19.4 ns         Histogram: frequency by time        20.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  7.203 ns … 30.064 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.531 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.757 ns ±  0.461 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▂▁     ▁▁       █▇   ▂   █▆ ▁▂▁   ▅   ▂          ▂
  ▄▁▁▃▁▁▄▁▇████▄▃▁▄██▇▅▆█▅▄▅██▇▆▆█▅▆▅██▆███▆▅▅██▆▆█▇▆▅▆▆▇▆▇█ █
  7.2 ns       Histogram: log(frequency) by time       10 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So basically, just by computing each power only once and reusing it, a 50% speed up is gained. The exp(log(...)) trick is a small improvement compared to that. Hendrik made me notice that for floating-point exponents Julia's x^a actually calls exp(log(...)) under the hood anyway. For some reason, newer versions of Julia show less noticeable differences between x^a and exp(a*log(x)).
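
To illustrate the point about x^a versus exp(a * log(x)), here is a minimal stand-alone sketch (not part of the PR; exact timings depend on the Julia version and CPU):

using BenchmarkTools

x = 300.1
@btime ($x)^1.4             # non-integer exponent: generic floating-point power
@btime exp(1.4 * log($x))   # explicit exp/log formulation of the same power
@btime ($x)^2               # literal integer exponent: lowered to a multiplication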

@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

Results on university machine (Goldstein):

New version:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 2.0  Time steps: 284 (accepted), 284 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              3.07s /  99.5%           2.35MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    2.93s   95.8%  2.06ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    2.65s   86.7%  1.86ms     0.00B    0.0%    0.00B
  interface flux           1.42k    236ms    7.7%   166μs     0.00B    0.0%    0.00B
  surface integral         1.42k   24.7ms    0.8%  17.4μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   12.2ms    0.4%  8.56μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   5.15ms    0.2%  3.63μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k    863μs    0.0%   607ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   33.6μs    0.0%  23.7ns     0.00B    0.0%    0.00B
  source terms             1.42k   30.3μs    0.0%  21.3ns     0.00B    0.0%    0.00B
calculate dt                 285   76.0ms    2.5%   267μs     0.00B    0.0%    0.00B
I/O                            5   40.0ms    1.3%  8.01ms   1.81MiB   85.2%   370KiB
  save solution                4   33.5ms    1.1%  8.37ms   1.80MiB   84.8%   461KiB
  ~I/O~                        5   6.57ms    0.2%  1.31ms   8.83KiB    0.4%  1.77KiB
  get element variables        4    744ns    0.0%   186ns     0.00B    0.0%    0.00B
  save mesh                    4    605ns    0.0%   151ns     0.00B    0.0%    0.00B
  get node variables           4    371ns    0.0%  92.8ns     0.00B    0.0%    0.00B
analyze solution               4   11.3ms    0.4%  2.83ms    316KiB   14.5%  79.0KiB
────────────────────────────────────────────────────────────────────────────────────

Old version:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 2.0  Time steps: 284 (accepted), 284 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              3.69s /  99.5%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    3.54s   96.4%  2.49ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    3.20s   87.2%  2.25ms     0.00B    0.0%    0.00B
  interface flux           1.42k    298ms    8.1%   209μs     0.00B    0.0%    0.00B
  surface integral         1.42k   24.6ms    0.7%  17.3μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   12.3ms    0.3%  8.63μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   5.23ms    0.1%  3.68μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k   1.01ms    0.0%   714ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   35.8μs    0.0%  25.2ns     0.00B    0.0%    0.00B
  source terms             1.42k   30.8μs    0.0%  21.7ns     0.00B    0.0%    0.00B
calculate dt                 285   75.9ms    2.1%   266μs     0.00B    0.0%    0.00B
I/O                            5   42.0ms    1.1%  8.40ms   1.81MiB   85.3%   370KiB
  save solution                4   35.3ms    1.0%  8.82ms   1.80MiB   84.9%   461KiB
  ~I/O~                        5   6.69ms    0.2%  1.34ms   8.83KiB    0.4%  1.77KiB
  get element variables        4    559ns    0.0%   140ns     0.00B    0.0%    0.00B
  save mesh                    4    426ns    0.0%   106ns     0.00B    0.0%    0.00B
  get node variables           4    222ns    0.0%  55.5ns     0.00B    0.0%    0.00B
analyze solution               4   13.2ms    0.4%  3.30ms    314KiB   14.5%  78.6KiB
────────────────────────────────────────────────────────────────────────────────────

There's still a roughly 17% improvement.

Benchmarks for Goldstein:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 960 evaluations per sample.
 Range (min … max):  87.431 ns …  1.192 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     88.289 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   88.624 ns ± 13.750 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁▅▅▃▇█▇▅▆▅▅▆▅▄▁                                      
  ▁▂▃▅▆▇██████████████████▅▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  87.4 ns         Histogram: frequency by time        90.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 976 evaluations per sample.
 Range (min … max):  44.864 ns …  1.349 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     45.267 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.460 ns ± 13.072 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▅▄▅▅▆▇█▆▄▄▄▃▁                                            
  ▂▃▅██████████████▇▅▄▃▂▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▂▁▁▂▂▂▂ ▄
  44.9 ns         Histogram: frequency by time        47.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  37.400 ns …  1.062 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.678 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.034 ns ± 10.298 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂█▅                                                          
  ███▇▆▆▄▄▃▃▃▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁▂▂▂ ▃
  37.4 ns         Histogram: frequency by time        44.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  38.200 ns … 78.343 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     38.464 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.558 ns ±  0.867 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂▅██▆▄▃▂▁▂▁                                             
  ▁▂▄▇████████████▇█▇▆▅▅▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  38.2 ns         Histogram: frequency by time        39.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Integers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  24.793 ns … 69.045 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.805 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.874 ns ±  0.858 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▇▆▂                       ▂▃▁▁                            ▂
  ██████▇▇▅▄▃▃▁▄▄▃▁▃▁▁▁▁▃▁▁▁▁▁████▇▁▁▁▁▁▁▁▁▃▁▃▁▁▅▅▁▃▁▄▃▁▄▁▇█▇ █
  24.8 ns      Histogram: log(frequency) by time      25.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  11.524 ns … 45.145 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.833 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.931 ns ±  0.569 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █             ▂                            ▁                 
  █▇▄▂▂▂▂▂▂▁▁▁▂▁█▅▃▂▂▂▁▂▂▂▂▂▂▂██▅▂▂▂▂▂▂▁▂▂▂▂▂█▄▂▂▂▂▂▂▂▂▂▂▂▁▃▂ ▃
  11.5 ns         Histogram: frequency by time        12.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  37.260 ns …  1.088 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.533 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.786 ns ± 10.568 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂██▇████▆▅▃▂                                               
  ▂▅████████████▇▇▅▆▅▅▅▆▆▅▄▅▄▄▄▄▃▃▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁ ▄
  37.3 ns         Histogram: frequency by time        38.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  11.528 ns … 109.493 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.570 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.817 ns ±   1.132 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▃▂▂▁▁ ▁  ▁          ▆▆▄          ▁          ▁           ▃ ▂
  ███████████████▆▅▆▆▇▆▆▆████▆▅▃▆▅▅▅▁▇█▆▃▄▄▁▃▁▁▄▁██▅▃▃▁▃▁▁▄▁▄█ █
  11.5 ns       Histogram: log(frequency) by time        13 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@@ -412,8 +412,15 @@ Given ε = 1.0e-4, we use the following algorithm.
c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
else
return (gamma - 1) / gamma * (y^gamma - x^gamma) /
(y^(gamma - 1) - x^(gamma - 1))
if isinteger(gamma)
Member:

Suggested change
if isinteger(gamma)
if gamma isa Integer

isinteger also checks the values of floating-point numbers, so it is a runtime value check rather than a type check
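
For illustration (not part of the PR diff):

isinteger(2.0)   # true  -- checks the value of a floating-point number at runtime
2.0 isa Integer  # false -- checks the type, which the compiler can resolve statically
2 isa Integer    # true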

@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

That seems much faster, but I had to remove the specific parametrization. That would also imply changing the struct so that, for instance, 2 can be accepted instead of 2.0.

julia> @inline function stolarsky_mean_1(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_1 (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::Real, y::Real, gamma::Real)
           epsilon_f2 = convert(Real, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(Real, 1 / 3) * (gamma - 2)
               c2 = convert(Real, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(Real, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.647 ns … 27.185 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.982 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.086 ns ±  0.438 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃▅▆▇████▇▆▃                      ▁    ▁▁▁▁                ▂
  ▄▇███████████▇▆▄▄▅▆▅▅▅▆▆▅▄▄▅▆▆▇██▇█████████████▇▅▅▅▅▅▆▆▅▆▄▆ █
  19.6 ns      Histogram: log(frequency) by time      22.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.394 ns … 35.163 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.588 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.620 ns ±  0.326 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▃▅▇██▇▄                                                 
  ▂▂▃▅▇████████▅▃▂▂▂▁▁▁▁▂▂▁▁▂▁▂▁▁▂▂▁▁▂▁▁▂▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂ ▃
  19.4 ns         Histogram: frequency by time        20.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.078 ns … 20.871 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.662 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.584 ns ±  0.374 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▁  ▂▂▅▄▆  ▄▃ ▃▅▅▅█▇  ▂  ▂▁▃▃▅         ▂
  ▂▂▃▆▄▆▄▇▆▇█▇▅▅▅▆▇█▇▆█████████▇▅██▄██████▄▆█▅▇█████▅▄▇▅▃▆▄▄ █
  8.08 ns      Histogram: log(frequency) by time     10.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1))[])
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  2.500 ns … 7.277 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.536 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.543 ns ± 0.107 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                   ▄   █  ▁▁▆   ▆    ▆   ▇   ▁               
  ▁▁▁▁▁▁▁▁▃▂▂▃▃▇▅▅▅█▆▇▇██████████▇▇▇██▇▇▆█▆▆▆█▄▄▃▃▄▂▂▂▁▁▁▁▁ ▄
  2.5 ns         Histogram: frequency by time       2.57 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.267 ns … 17.994 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.538 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.502 ns ±  0.383 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▃▄▃ ▁▂▅▃▄  ▃  ▅▄▆▆█▂    ▂▂▂▂▅▂                  ▂
  ▃▁▆▇▄▁▃▁▆▇▅▆█████████▅▆█▇███████▆█▇███████▅▇▆▆▆█▇▇█▇▅▆▇▆▅▆▆ █
  9.27 ns      Histogram: log(frequency) by time      11.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2))[])
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  5.276 ns … 11.283 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.684 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.751 ns ±  0.224 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▇█▅                       ▄▅▂       ▂
  ▄▁▁▁▁▁▁▁▁▁▆█▄▁▁▁▁▁▁▁▁▁▁████▄▁▁▁▁▁▁▃▁▇▇▅▄▁▁▁▃▃▁▁▅▆███▇▄▃▃▄▃ █
  5.28 ns      Histogram: log(frequency) by time     6.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Member:

Can you please change the code

@inline stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y,
                                                                               gamma)...)

a few lines above to

@inline stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)

and then check the gamma isa Integer option?

Copy link
Contributor Author

@MarcoArtiano MarcoArtiano Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the following error when trying that. As far as I understand, this creates an infinite recursion: the "main" method requires x, y, and gamma to all have the same type, so when gamma has a different type the promoting fallback keeps calling itself and never switches to the "main" one.

julia> function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y)) # equivalent to y^gamma but faster for non-integers
                   xg = exp((gamma - 1) * log(x)) # equivalent to x^gamma but faster for non-integers
               end
               return (gamma - 1) / gamma * (yg * y - xg * x) /
                      (yg - xg)
           end
       end
stolarsky_mean (generic function with 2 methods)

julia> stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)
stolarsky_mean (generic function with 2 methods)

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2))[])
ERROR: StackOverflowError:
Stacktrace:
 [1] stolarsky_mean(x::Float64, y::Float64, gamma::Int64) (repeats 79984 times)
   @ Main .\REPL[31]:1

julia> stolarsky_mean(301.2, 421.2, 2)
ERROR: StackOverflowError:
Stacktrace:
 [1] stolarsky_mean(x::Float64, y::Float64, gamma::Int64) (repeats 79984 times)
   @ Main .\REPL[31]:1

A workaround would be something like:

stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)

function stolarsky_mean(x::RealT, y::RealT, gamma::Real) where {RealT <: Real}
    epsilon_f2 = convert(RealT, 1.0e-4)
    f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
    if f2 < epsilon_f2
        # convenience coefficients
        c1 = convert(RealT, 1 / 3) * (gamma - 2)
        c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
        c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
        return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
    else
        if gamma isa Integer
            yg = y^(gamma - 1)
            xg = x^(gamma - 1)
        else
            yg = exp((gamma - 1) * log(y)) # equivalent to y^gamma but faster for non-integers
            xg = exp((gamma - 1) * log(x)) # equivalent to x^gamma but faster for non-integers
        end
        return (gamma - 1) / gamma * (yg * y - xg * x) /
               (yg - xg)
    end
end

but again, shouldn't I allow the equation struct to accept different types instead of promoting gamma?
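
(A hypothetical, simplified sketch of what that could look like, not Trixi's actual struct definition: give gamma its own type parameter so that e.g. 2 is stored as an Int instead of being promoted to 2.0.)

struct MyPolytropicEquations{RealT <: Real, GammaT <: Real}
    kappa::RealT   # other real-valued parameters keep the common type
    gamma::GammaT  # may be an Integer or a floating-point number
end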

Member:

What I meant was basically the second option:

julia> stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)
stolarsky_mean (generic function with 1 method)

julia> function stolarsky_mean(x::RealT, y::RealT, gamma::Real) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y)) # equivalent to y^gamma but faster for non-integers
                   xg = exp((gamma - 1) * log(x)) # equivalent to x^gamma but faster for non-integers
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean (generic function with 2 methods)

julia> stolarsky_mean(300.1, 410.7, 2)
355.40000000000003

julia> stolarsky_mean(300.1, 410.7, 2.0)
355.39999999999986

julia> stolarsky_mean(300.1, 410.7f0, 2.0)
355.4000061035154

julia> stolarsky_mean(300.1, 410.7f0, 2)
355.40000610351564

Using this approach, gamma can be an Integer and will not be promoted.

@MarcoArtiano (Contributor, Author):

Perfect, thank you! Sorry for the misunderstanding, I will introduce that.

src/auxiliary/math.jl
Comment on lines +422 to +423
return (gamma - 1) / gamma * (yg * y - xg * x) /
(yg - xg)
Member:

Suggested change
return (gamma - 1) / gamma * (yg * y - xg * x) /
(yg - xg)
return (gamma - 1) / gamma * (yg * y - xg * x) / (yg - xg)

Does this formatting work now?

Member:

And does it make a difference if you avoid an additional division by rewriting this part as something like

Suggested change
return (gamma - 1) / gamma * (yg * y - xg * x) /
(yg - xg)
return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))

?

@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

It looks like there's not a big difference. It is slightly faster for reals, but slightly slower for integers.


julia> @inline function stolarsky_mean_1(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) / gamma * (yg * y - xg * x) / (yg - xg)
           end
       end
stolarsky_mean_1 (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.965 ns … 101.965 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.815 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.794 ns ±   1.531 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂   ▃█▄        ▆▇    ▃    ▆▁    ▂▁                     ▁
  ▃▂▃▃██▆▅▅███▇▇▇▆▄▄▅▅██▇▄▆▇██▆▅▇██▄▄▅▆██▅▆██▆▅▄▅▆▇▇▅▅▇▄▄▃▅▄▆ █
  8.96 ns      Histogram: log(frequency) by time      11.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.183 ns … 19.411 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.492 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.481 ns ±  0.363 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▁▂▁▁▁▅▄▄  ▄▂ ▁▁█▆▇▁      ▂▁▁   ▁▂ ▁           ▂
  ▄▁▁▁▃▅▄▄▃██▆▅▆█████████▇██▇██████▇██▇▇▆███▆▇█████▇▆▆▆▇▇▇▇▇▇ █
  9.18 ns      Histogram: log(frequency) by time      11.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.662 ns … 45.032 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.761 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.815 ns ±  0.562 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃█▂                                                        
  ▂▄███▄▂▂▂▁▂▂▂▁▁▁▂▁▁▂▂▂▂▁▂▁▁▁▂▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▂
  19.7 ns         Histogram: frequency by time        21.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.621 ns … 27.873 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.987 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.087 ns ±  0.432 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▄▆▇█████▇▅▂                    ▁     ▁▁▂▁▁               ▃
  ▃▆████████████▇▆▃▆▆▅▅▄▅▆▆▄▆▆▅▆▇▇█▇████████████▇▆▆▅▃▆▅▄▆▄▅▄▆ █
  19.6 ns      Histogram: log(frequency) by time      22.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.658 ns … 27.584 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.765 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.884 ns ±  0.437 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▆█▇▄                                  ▁▁                   ▁
  ██████▃▅▅▃▄▅▅▄▅▄▄▅▃▅▄▄▅▅▄▅▅▅▆▅▇██▆▇▆▇▆▇███▇█▆▅▅▂▃▅▅▄▅▅▆▃▄▃▄ █
  19.7 ns      Histogram: log(frequency) by time      21.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.642 ns … 42.111 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.983 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.010 ns ±  0.352 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁▂▅▅▇█▇▇▅▂                                           
  ▂▂▂▃▃▅▅▇███████████▅▃▂▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  19.6 ns         Histogram: frequency by time        21.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Member:

I am mainly looking at the median results. There, the first option seems to be best all the time?
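
(As an aside, not from the thread: a small sketch of comparing the medians programmatically with BenchmarkTools instead of reading them off the histograms.)

using BenchmarkTools

t1 = @benchmark stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
t2 = @benchmark stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
judge(median(t2), median(t1))  # classifies t2 relative to the baseline t1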

@MarcoArtiano (Contributor, Author):

I agree with you. I will also compare full runs to see whether there are any noticeable differences.

Co-authored-by: Hendrik Ranocha <[email protected]>
Labels: performance We are greedy
3 participants