
Optimized stolarsky_mean #2274

Open · wants to merge 7 commits into base: main

Conversation

@MarcoArtiano (Contributor) commented Feb 10, 2025

The Stolarsky mean will come in handy in Trixi Atmo. Here is a faster version:

julia> @inline function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               return (gamma - 1) / gamma * (y^gamma - x^gamma) /
                      (y^(gamma - 1) - x^(gamma - 1))
           end
       end
stolarsky_mean (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expy = exp(gamma*log(y))
               expx = exp(gamma*log(x))
               return (gamma - 1) / gamma * (expy - expx) /
                      (expy/y - expx/x)
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  45.982 ns … 61.638 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.163 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.301 ns ±  0.665 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▅██▅             ▁▁                                        ▂
  █████▇▆▅▄▁▃▃▄▄▄▁▄▇█████▇▇▆▆▄▆▅▅▄▅▄▅▅▇▇▇▆▆▄▅▅▅▄▃▅▆▄▅▃▃▄▆▆█▇▇ █
  46 ns        Histogram: log(frequency) by time      49.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.342 ns … 630.474 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.566 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.007 ns ±   6.343 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█    ▂▂▄                                                    ▁
  ██▇▇▇█████▇▇▇█▇▄▅▄▁▃▁▄▃▁▄▄▄▁▄▁▃▃▁▁▃▄▃▃▄▄▄▃▄▅▅▅▅▅▅▆▆▆▆▅▅▆▆▅▅▅ █
  19.3 ns       Histogram: log(frequency) by time      30.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Simulation of EC Polytropic Euler with the previous stolarsky mean:

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              1.98s /  99.3%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    1.93s   98.1%  1.36ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    1.74s   88.4%  1.22ms     0.00B    0.0%    0.00B
  interface flux           1.42k    163ms    8.3%   115μs     0.00B    0.0%    0.00B
  surface integral         1.42k   15.3ms    0.8%  10.8μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   8.10ms    0.4%  5.70μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   3.72ms    0.2%  2.62μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k   1.00ms    0.1%   705ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   46.3μs    0.0%  32.6ns     0.00B    0.0%    0.00B
  source terms             1.42k   45.5μs    0.0%  32.0ns     0.00B    0.0%    0.00B
calculate dt                 285   26.2ms    1.3%  92.0μs     0.00B    0.0%    0.00B
analyze solution               4   7.33ms    0.4%  1.83ms    314KiB   14.5%  78.5KiB
I/O                            5   4.19ms    0.2%   838μs   1.81MiB   85.3%   370KiB
  save solution                4   3.82ms    0.2%   956μs   1.80MiB   84.9%   460KiB
  ~I/O~                        5    365μs    0.0%  73.1μs   8.83KiB    0.4%  1.77KiB
  get element variables        4    730ns    0.0%   182ns     0.00B    0.0%    0.00B
  save mesh                    4    608ns    0.0%   152ns     0.00B    0.0%    0.00B
  get node variables           4   95.0ns    0.0%  23.8ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────

Results for the optimized version:

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              1.62s /  99.1%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    1.57s   97.7%  1.11ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    1.41s   87.8%   994μs     0.00B    0.0%    0.00B
  interface flux           1.42k    131ms    8.2%  92.4μs     0.00B    0.0%    0.00B
  surface integral         1.42k   14.9ms    0.9%  10.5μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   7.29ms    0.5%  5.13μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   3.83ms    0.2%  2.70μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k    956μs    0.1%   673ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   61.6μs    0.0%  43.3ns     0.00B    0.0%    0.00B
  source terms             1.42k   24.5μs    0.0%  17.3ns     0.00B    0.0%    0.00B
calculate dt                 285   26.0ms    1.6%  91.2μs     0.00B    0.0%    0.00B
analyze solution               4   6.36ms    0.4%  1.59ms    314KiB   14.5%  78.6KiB
I/O                            5   5.35ms    0.3%  1.07ms   1.81MiB   85.3%   370KiB
  save solution                4   5.00ms    0.3%  1.25ms   1.80MiB   84.9%   461KiB
  ~I/O~                        5    352μs    0.0%  70.5μs   8.83KiB    0.4%  1.77KiB
  save mesh                    4    755ns    0.0%   189ns     0.00B    0.0%    0.00B
  get element variables        4    477ns    0.0%   119ns     0.00B    0.0%    0.00B
  get node variables           4   86.0ns    0.0%  21.5ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────

Thus on my machine, there's an 18% improvement.
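
For reference, timer tables like the ones above come from Trixi's SummaryCallback and can be reproduced roughly as follows (a minimal sketch; the exact elixir path is an assumption and may differ between Trixi.jl versions):

# Run the EC polytropic Euler elixir once with the old and once with the new
# stolarsky_mean; the SummaryCallback set up in the elixir prints the timer
# table at the end of each run.
using Trixi

trixi_include(joinpath(examples_dir(), "structured_2d_dgsem",
                       "elixir_eulerpolytropic_ec.jl"))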


Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@MarcoArtiano MarcoArtiano added the performance We are greedy label Feb 10, 2025
@MarcoArtiano MarcoArtiano requested a review from ranocha February 10, 2025 14:52

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.88%. Comparing base (ce47aea) to head (e317708).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2274   +/-   ##
=======================================
  Coverage   96.88%   96.88%           
=======================================
  Files         490      490           
  Lines       39491    39496    +5     
=======================================
+ Hits        38260    38265    +5     
  Misses       1231     1231           
Flag Coverage Δ
unittests 96.88% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@andrewwinters5000 (Member) left a comment

This is an interesting (and ingenious) way to rewrite the expression and possibly improve performance. Would it be worthwhile to also benchmark it on Roci (or some other machine) just to see its influence?

src/auxiliary/math.jl
Co-authored-by: Andrew Winters <[email protected]>
@ranocha (Member) left a comment

Thanks!

src/auxiliary/math.jl
@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

Hendrik made me realize that for integer exponents the exp(log(...)) version is slower. I tried the trick to avoid the division, but that doesn't actually change anything. I added a specialization for integers, and the results are the following:
Functions

julia> @inline function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               return (gamma - 1) / gamma * (y^gamma - x^gamma) /
                      (y^(gamma - 1) - x^(gamma - 1))
           end
       end
stolarsky_mean (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expx = x^(gamma-1)
               expy = y^(gamma-1)
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @inline function stolarsky_mean_3(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expy = exp((gamma-1)*log(y))
               expx = exp((gamma-1)*log(x))
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_3 (generic function with 1 method)

julia> @inline function stolarsky_mean_4(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   expy = y^(gamma - 1)
                   expx = x^(gamma - 1)
               else
                   expy = exp((gamma - 1) * log(y))
                   expx = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_4 (generic function with 1 method)

For real numbers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  45.981 ns … 56.275 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.167 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.241 ns ±  0.440 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▄▆██▇▄                      ▁▁▂▁                          ▂
  ▆███████▇▅▅▄▁▁▃▁▃▁▁▃▁▃▃▁▁▁▁▄▅▇████▇▇▆▆▆▄▅▅▅▅▄▅▄▃▄▄▁▄▃▄▃▁▃▃▄ █
  46 ns        Histogram: log(frequency) by time      48.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  23.530 ns … 42.782 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.656 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.853 ns ±  0.654 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅█▇▂                                 ▁▃▁                   ▁
  ▆████▆▅▅▅▄▄▄▅▄▄▅▄▅▄▅▄▅▄▃▅▇███▇▇▆▆▆▅▅▆▆████▇▆▅▅▄▃▅▅▅▄▄▄▄▅▃▅▅ █
  23.5 ns      Histogram: log(frequency) by time      26.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.433 ns … 25.629 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.512 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.601 ns ±  0.378 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▇█▅                                     ▁▁                 ▂
  █████▄▅▅▅▅▅▅▅▅▃▄▅▄▅▄▃▄▆▄▅▅▄▄▅▅▇▇███▆▇▆▆▆▇███▇▆▅▅▄▄▃▁▄▅▄▃▆▅▄ █
  19.4 ns      Histogram: log(frequency) by time      21.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.665 ns … 34.485 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.760 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.803 ns ±  0.360 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂█▇                                                        
  ▂▄████▃▂▂▂▁▂▁▁▁▁▁▁▂▂▁▂▂▁▁▁▁▁▁▂▁▁▁▂▁▁▁▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂ ▂
  19.7 ns         Histogram: frequency by time        21.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

For integers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  16.133 ns … 30.958 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.224 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.084 ns ±  0.718 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▁  ██       ▃▄   ▄▁  ██ ▁ ▂   ▂▂   ▂   ▅▆   ▃   ▂▃         ▃
  ██▁▁███▅▄▆▃▃▁██▄▄▁██▅█████▇██▇▆██▆▄▅█▇▅▆██▇▇███▆▆██▆▅▅▇▅▅▅█ █
  16.1 ns      Histogram: log(frequency) by time        19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  7.857 ns … 22.603 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.162 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.107 ns ±  0.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▁        ▁█▄▄  ▂   ▁█▄  ▄▃▂  ▁▆   ▂▃         ▁▁  ▂
  ▇▆▁▁▃▆▃▁▁▁█▄▄▆▅▇▄▃▅████▆▇█▆▄▅███▆▆███▆▇██▅▅▄███▇▆▆▇▆▆▅▇██▇ █
  7.86 ns      Histogram: log(frequency) by time     10.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.421 ns … 25.844 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.508 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.540 ns ±  0.250 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▅█▁                                                       
  ▂▃▇███▄▂▂▂▂▁▂▁▁▁▂▁▁▂▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▂▂▁▁▂▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂ ▂
  19.4 ns         Histogram: frequency by time        20.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  7.203 ns … 30.064 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.531 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.757 ns ±  0.461 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▂▁     ▁▁       █▇   ▂   █▆ ▁▂▁   ▅   ▂          ▂
  ▄▁▁▃▁▁▄▁▇████▄▃▁▄██▇▅▆█▅▄▅██▇▆▆█▅▆▅██▆███▆▅▅██▆▆█▇▆▅▆▆▇▆▇█ █
  7.2 ns       Histogram: log(frequency) by time       10 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So basically, just by computing each power only once and reusing it, a 50% speed up is gained. The exp(log(...)) trick is a small improvement compared to that. Hendrik made me notice that for floating-point exponents Julia's x^a actually calls exp(log(...)) under the hood anyway. For some reason, newer versions of Julia show less noticeable differences between x^a and exp(a*log(x)).
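
To illustrate the point about x^a versus exp(a * log(x)), here is a minimal stand-alone sketch (not part of the PR; exact timings depend on the Julia version and CPU):

using BenchmarkTools

x = 300.1
@btime ($x)^1.4             # non-integer exponent: generic floating-point power
@btime exp(1.4 * log($x))   # explicit exp/log formulation of the same power
@btime ($x)^2               # literal integer exponent: lowered to a multiplication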

@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

Results on university machine (Goldstein):

New version:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 2.0  Time steps: 284 (accepted), 284 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              3.07s /  99.5%           2.35MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    2.93s   95.8%  2.06ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    2.65s   86.7%  1.86ms     0.00B    0.0%    0.00B
  interface flux           1.42k    236ms    7.7%   166μs     0.00B    0.0%    0.00B
  surface integral         1.42k   24.7ms    0.8%  17.4μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   12.2ms    0.4%  8.56μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   5.15ms    0.2%  3.63μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k    863μs    0.0%   607ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   33.6μs    0.0%  23.7ns     0.00B    0.0%    0.00B
  source terms             1.42k   30.3μs    0.0%  21.3ns     0.00B    0.0%    0.00B
calculate dt                 285   76.0ms    2.5%   267μs     0.00B    0.0%    0.00B
I/O                            5   40.0ms    1.3%  8.01ms   1.81MiB   85.2%   370KiB
  save solution                4   33.5ms    1.1%  8.37ms   1.80MiB   84.8%   461KiB
  ~I/O~                        5   6.57ms    0.2%  1.31ms   8.83KiB    0.4%  1.77KiB
  get element variables        4    744ns    0.0%   186ns     0.00B    0.0%    0.00B
  save mesh                    4    605ns    0.0%   151ns     0.00B    0.0%    0.00B
  get node variables           4    371ns    0.0%  92.8ns     0.00B    0.0%    0.00B
analyze solution               4   11.3ms    0.4%  2.83ms    316KiB   14.5%  79.0KiB
────────────────────────────────────────────────────────────────────────────────────

Old version:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 2.0  Time steps: 284 (accepted), 284 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              3.69s /  99.5%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    3.54s   96.4%  2.49ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    3.20s   87.2%  2.25ms     0.00B    0.0%    0.00B
  interface flux           1.42k    298ms    8.1%   209μs     0.00B    0.0%    0.00B
  surface integral         1.42k   24.6ms    0.7%  17.3μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   12.3ms    0.3%  8.63μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   5.23ms    0.1%  3.68μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k   1.01ms    0.0%   714ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   35.8μs    0.0%  25.2ns     0.00B    0.0%    0.00B
  source terms             1.42k   30.8μs    0.0%  21.7ns     0.00B    0.0%    0.00B
calculate dt                 285   75.9ms    2.1%   266μs     0.00B    0.0%    0.00B
I/O                            5   42.0ms    1.1%  8.40ms   1.81MiB   85.3%   370KiB
  save solution                4   35.3ms    1.0%  8.82ms   1.80MiB   84.9%   461KiB
  ~I/O~                        5   6.69ms    0.2%  1.34ms   8.83KiB    0.4%  1.77KiB
  get element variables        4    559ns    0.0%   140ns     0.00B    0.0%    0.00B
  save mesh                    4    426ns    0.0%   106ns     0.00B    0.0%    0.00B
  get node variables           4    222ns    0.0%  55.5ns     0.00B    0.0%    0.00B
analyze solution               4   13.2ms    0.4%  3.30ms    314KiB   14.5%  78.6KiB
────────────────────────────────────────────────────────────────────────────────────

There's still a roughly 17% improvement.

Benchmarks for Goldstein:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 960 evaluations per sample.
 Range (min … max):  87.431 ns …  1.192 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     88.289 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   88.624 ns ± 13.750 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁▅▅▃▇█▇▅▆▅▅▆▅▄▁                                      
  ▁▂▃▅▆▇██████████████████▅▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  87.4 ns         Histogram: frequency by time        90.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 976 evaluations per sample.
 Range (min … max):  44.864 ns …  1.349 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     45.267 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.460 ns ± 13.072 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▅▄▅▅▆▇█▆▄▄▄▃▁                                            
  ▂▃▅██████████████▇▅▄▃▂▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▂▁▁▂▂▂▂ ▄
  44.9 ns         Histogram: frequency by time        47.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  37.400 ns …  1.062 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.678 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.034 ns ± 10.298 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂█▅                                                          
  ███▇▆▆▄▄▃▃▃▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁▂▂▂ ▃
  37.4 ns         Histogram: frequency by time        44.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  38.200 ns … 78.343 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     38.464 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.558 ns ±  0.867 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂▅██▆▄▃▂▁▂▁                                             
  ▁▂▄▇████████████▇█▇▆▅▅▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  38.2 ns         Histogram: frequency by time        39.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Integers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  24.793 ns … 69.045 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.805 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.874 ns ±  0.858 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▇▆▂                       ▂▃▁▁                            ▂
  ██████▇▇▅▄▃▃▁▄▄▃▁▃▁▁▁▁▃▁▁▁▁▁████▇▁▁▁▁▁▁▁▁▃▁▃▁▁▅▅▁▃▁▄▃▁▄▁▇█▇ █
  24.8 ns      Histogram: log(frequency) by time      25.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  11.524 ns … 45.145 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.833 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.931 ns ±  0.569 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █             ▂                            ▁                 
  █▇▄▂▂▂▂▂▂▁▁▁▂▁█▅▃▂▂▂▁▂▂▂▂▂▂▂██▅▂▂▂▂▂▂▁▂▂▂▂▂█▄▂▂▂▂▂▂▂▂▂▂▂▁▃▂ ▃
  11.5 ns         Histogram: frequency by time        12.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  37.260 ns …  1.088 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.533 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.786 ns ± 10.568 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂██▇████▆▅▃▂                                               
  ▂▅████████████▇▇▅▆▅▅▅▆▆▅▄▅▄▄▄▄▃▃▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁ ▄
  37.3 ns         Histogram: frequency by time        38.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  11.528 ns … 109.493 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.570 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.817 ns ±   1.132 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▃▂▂▁▁ ▁  ▁          ▆▆▄          ▁          ▁           ▃ ▂
  ███████████████▆▅▆▆▇▆▆▆████▆▅▃▆▅▅▅▁▇█▆▃▄▄▁▃▁▁▄▁██▅▃▃▁▃▁▁▄▁▄█ █
  11.5 ns       Histogram: log(frequency) by time        13 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@@ -412,8 +412,15 @@ Given ε = 1.0e-4, we use the following algorithm.
c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
else
return (gamma - 1) / gamma * (y^gamma - x^gamma) /
(y^(gamma - 1) - x^(gamma - 1))
if isinteger(gamma)
Member:

Suggested change
if isinteger(gamma)
if gamma isa Integer

isinteger also checks the values of floating-point numbers, so it is a runtime value check rather than a type check
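
For illustration (not part of the PR diff):

isinteger(2.0)   # true  -- checks the value of a floating-point number at runtime
2.0 isa Integer  # false -- checks the type, which the compiler can resolve statically
2 isa Integer    # true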

@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

That seems much faster, but I had to remove the specific parametrization. That would also imply changing the struct so that, for instance, 2 can be accepted instead of 2.0.

julia> @inline function stolarsky_mean_1(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_1 (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::Real, y::Real, gamma::Real)
           epsilon_f2 = convert(Real, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(Real, 1 / 3) * (gamma - 2)
               c2 = convert(Real, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(Real, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.647 ns … 27.185 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.982 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.086 ns ±  0.438 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃▅▆▇████▇▆▃                      ▁    ▁▁▁▁                ▂
  ▄▇███████████▇▆▄▄▅▆▅▅▅▆▆▅▄▄▅▆▆▇██▇█████████████▇▅▅▅▅▅▆▆▅▆▄▆ █
  19.6 ns      Histogram: log(frequency) by time      22.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.394 ns … 35.163 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.588 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.620 ns ±  0.326 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▃▅▇██▇▄                                                 
  ▂▂▃▅▇████████▅▃▂▂▂▁▁▁▁▂▂▁▁▂▁▂▁▁▂▂▁▁▂▁▁▂▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂ ▃
  19.4 ns         Histogram: frequency by time        20.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.078 ns … 20.871 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.662 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.584 ns ±  0.374 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▁  ▂▂▅▄▆  ▄▃ ▃▅▅▅█▇  ▂  ▂▁▃▃▅         ▂
  ▂▂▃▆▄▆▄▇▆▇█▇▅▅▅▆▇█▇▆█████████▇▅██▄██████▄▆█▅▇█████▅▄▇▅▃▆▄▄ █
  8.08 ns      Histogram: log(frequency) by time     10.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1))[])
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  2.500 ns … 7.277 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.536 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.543 ns ± 0.107 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                   ▄   █  ▁▁▆   ▆    ▆   ▇   ▁               
  ▁▁▁▁▁▁▁▁▃▂▂▃▃▇▅▅▅█▆▇▇██████████▇▇▇██▇▇▆█▆▆▆█▄▄▃▃▄▂▂▂▁▁▁▁▁ ▄
  2.5 ns         Histogram: frequency by time       2.57 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.267 ns … 17.994 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.538 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.502 ns ±  0.383 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▃▄▃ ▁▂▅▃▄  ▃  ▅▄▆▆█▂    ▂▂▂▂▅▂                  ▂
  ▃▁▆▇▄▁▃▁▆▇▅▆█████████▅▆█▇███████▆█▇███████▅▇▆▆▆█▇▇█▇▅▆▇▆▅▆▆ █
  9.27 ns      Histogram: log(frequency) by time      11.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2))[])
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  5.276 ns … 11.283 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.684 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.751 ns ±  0.224 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▇█▅                       ▄▅▂       ▂
  ▄▁▁▁▁▁▁▁▁▁▆█▄▁▁▁▁▁▁▁▁▁▁████▄▁▁▁▁▁▁▃▁▇▇▅▄▁▁▁▃▃▁▁▅▆███▇▄▃▃▄▃ █
  5.28 ns      Histogram: log(frequency) by time     6.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Member:

Can you please change the code

@inline stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y,
                                                                               gamma)...)

a few lines above to

@inline stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)

and then check the gamma isa Integer option?

Copy link
Contributor Author

@MarcoArtiano MarcoArtiano Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the following error when trying that. As far as I understand, this creates an infinite recursion: the "main" method requires x, y, and gamma to all have the same type, so when gamma has a different type the promoting fallback keeps calling itself and never switches to the "main" one.

julia> function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y)) # equivalent to y^gamma but faster for non-integers
                   xg = exp((gamma - 1) * log(x)) # equivalent to x^gamma but faster for non-integers
               end
               return (gamma - 1) / gamma * (yg * y - xg * x) /
                      (yg - xg)
           end
       end
stolarsky_mean (generic function with 2 methods)

julia> stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)
stolarsky_mean (generic function with 2 methods)

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2))[])
ERROR: StackOverflowError:
Stacktrace:
 [1] stolarsky_mean(x::Float64, y::Float64, gamma::Int64) (repeats 79984 times)
   @ Main .\REPL[31]:1

julia> stolarsky_mean(301.2, 421.2, 2)
ERROR: StackOverflowError:
Stacktrace:
 [1] stolarsky_mean(x::Float64, y::Float64, gamma::Int64) (repeats 79984 times)
   @ Main .\REPL[31]:1

A workaround would be something like:

stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)

function stolarsky_mean(x::RealT, y::RealT, gamma::Real) where {RealT <: Real}
    epsilon_f2 = convert(RealT, 1.0e-4)
    f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
    if f2 < epsilon_f2
        # convenience coefficients
        c1 = convert(RealT, 1 / 3) * (gamma - 2)
        c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
        c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
        return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
    else
        if gamma isa Integer
            yg = y^(gamma - 1)
            xg = x^(gamma - 1)
        else
            yg = exp((gamma - 1) * log(y)) # equivalent to y^gamma but faster for non-integers
            xg = exp((gamma - 1) * log(x)) # equivalent to x^gamma but faster for non-integers
        end
        return (gamma - 1) / gamma * (yg * y - xg * x) /
               (yg - xg)
    end
end

but again, shouldn't I allow the equation struct to accept different types instead of promoting gamma?
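
(A hypothetical, simplified sketch of what that could look like, not Trixi's actual struct definition: give gamma its own type parameter so that e.g. 2 is stored as an Int instead of being promoted to 2.0.)

struct MyPolytropicEquations{RealT <: Real, GammaT <: Real}
    kappa::RealT   # other real-valued parameters keep the common type
    gamma::GammaT  # may be an Integer or a floating-point number
end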

Member:

What I meant was basically the second option:

julia> stolarsky_mean(x::Real, y::Real, gamma::Real) = stolarsky_mean(promote(x, y)..., gamma)
stolarsky_mean (generic function with 1 method)

julia> function stolarsky_mean(x::RealT, y::RealT, gamma::Real) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y)) # equivalent to y^gamma but faster for non-integers
                   xg = exp((gamma - 1) * log(x)) # equivalent to x^gamma but faster for non-integers
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean (generic function with 2 methods)

julia> stolarsky_mean(300.1, 410.7, 2)
355.40000000000003

julia> stolarsky_mean(300.1, 410.7, 2.0)
355.39999999999986

julia> stolarsky_mean(300.1, 410.7f0, 2.0)
355.4000061035154

julia> stolarsky_mean(300.1, 410.7f0, 2)
355.40000610351564

Using this approach, gamma can be an Integer and will not be promoted.

@MarcoArtiano (Contributor, Author):

Perfect, thank you! Sorry for the misunderstanding, I will introduce that.

src/auxiliary/math.jl
Comment on lines +422 to +423
return (gamma - 1) / gamma * (yg * y - xg * x) /
(yg - xg)
Member:

Suggested change
return (gamma - 1) / gamma * (yg * y - xg * x) /
(yg - xg)
return (gamma - 1) / gamma * (yg * y - xg * x) / (yg - xg)

Does this formatting work now?

Member:

And does it make a difference if you avoid an additional division by rewriting this part as something like

Suggested change
return (gamma - 1) / gamma * (yg * y - xg * x) /
(yg - xg)
return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))

?

@MarcoArtiano (Contributor, Author) commented Feb 11, 2025

It looks like there's not a big difference. It is slightly faster for reals, but slightly slower for integers.


julia> @inline function stolarsky_mean_1(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) / gamma * (yg * y - xg * x) / (yg - xg)
           end
       end
stolarsky_mean_1 (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y))
                   xg = exp((gamma - 1) * log(x))
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.965 ns … 101.965 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.815 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.794 ns ±   1.531 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂   ▃█▄        ▆▇    ▃    ▆▁    ▂▁                     ▁
  ▃▂▃▃██▆▅▅███▇▇▇▆▄▄▅▅██▇▄▆▇██▆▅▇██▄▄▅▆██▅▆██▆▅▄▅▆▇▇▅▅▇▄▄▃▅▄▆ █
  8.96 ns      Histogram: log(frequency) by time      11.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(2.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.183 ns … 19.411 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.492 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.481 ns ±  0.363 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▁▂▁▁▁▅▄▄  ▄▂ ▁▁█▆▇▁      ▂▁▁   ▁▂ ▁           ▂
  ▄▁▁▁▃▅▄▄▃██▆▅▆█████████▇██▇██████▇██▇▇▆███▆▇█████▇▆▆▆▇▇▇▇▇▇ █
  9.18 ns      Histogram: log(frequency) by time      11.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.662 ns … 45.032 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.761 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.815 ns ±  0.562 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃█▂                                                        
  ▂▄███▄▂▂▂▁▂▂▂▁▁▁▂▁▁▂▂▂▂▁▂▁▁▁▂▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▂
  19.7 ns         Histogram: frequency by time        21.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.621 ns … 27.873 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.987 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.087 ns ±  0.432 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▄▆▇█████▇▅▂                    ▁     ▁▁▂▁▁               ▃
  ▃▆████████████▇▆▃▆▆▅▅▄▅▆▆▄▆▆▅▆▇▇█▇████████████▇▆▆▅▃▆▅▄▆▄▅▄▆ █
  19.6 ns      Histogram: log(frequency) by time      22.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.658 ns … 27.584 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.765 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.884 ns ±  0.437 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▆█▇▄                                  ▁▁                   ▁
  ██████▃▅▅▃▄▅▅▄▅▄▄▅▃▅▄▄▅▅▄▅▅▅▆▅▇██▆▇▆▇▆▇███▇█▆▅▅▂▃▅▅▄▅▅▆▃▄▃▄ █
  19.7 ns      Histogram: log(frequency) by time      21.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.642 ns … 42.111 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.983 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.010 ns ±  0.352 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁▂▅▅▇█▇▇▅▂                                           
  ▂▂▂▃▃▅▅▇███████████▅▃▂▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  19.6 ns         Histogram: frequency by time        21.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Member:

I am mainly looking at the median results. There, the first option seems to be best all the time?
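
(As an aside, not from the thread: a small sketch of comparing the medians programmatically with BenchmarkTools instead of reading them off the histograms.)

using BenchmarkTools

t1 = @benchmark stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
t2 = @benchmark stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
judge(median(t2), median(t1))  # classifies t2 relative to the baseline t1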

@MarcoArtiano (Contributor, Author):

I agree with you. I will also compare full runs to see whether there are any noticeable differences.

Co-authored-by: Hendrik Ranocha <[email protected]>
Labels: performance We are greedy
3 participants