Optimise 1d Gauss-Legendre interpolation #216

Merged
johnomotani merged 3 commits into master from optimise-Lagrange-interpolation on Jun 9, 2024

Conversation

johnomotani
Collaborator

Precalculate some quantities to minimise the amount of work needed to evaluate the Lagrange polynomials, and hopefully to make the loops vectorise better. Speeds up interpolation by about 3x in one case that I profiled.

@johnomotani johnomotani force-pushed the optimise-Lagrange-interpolation branch from 1bd7e9a to d989adc on May 21, 2024 07:31
@johnomotani johnomotani force-pushed the optimise-Lagrange-interpolation branch from d989adc to d1459e5 on May 21, 2024 08:01
@mrhardman
Collaborator

mrhardman commented May 21, 2024

Wow, a 3x speed up would be impressive!

What I don't understand, and what I am wondering, is how you knew that the "for loop" version

function lagrange_poly(j,x_nodes,x)
    # get number of nodes
    n = size(x_nodes,1)
    # location where l(x0) = 1
    x0 = x_nodes[j]
    # evaluate polynomial
    poly = 1.0
    for i in 1:j-1
        poly *= (x - x_nodes[i])/(x0 - x_nodes[i])
    end
    for i in j+1:n
        poly *= (x - x_nodes[i])/(x0 - x_nodes[i])
    end
    return poly
end

would be slower than the version using Julia base functions

function lagrange_poly_optimised(other_nodes, one_over_denominator, x)
    return prod(x - n for n ∈ other_nodes) * one_over_denominator
end

In my previous life as a Fortran developer, I would have expected that there would be negligible difference between the two methods for a compiled language -- what am I missing?

Some timing checks would be helpful here to justify the changes (from the comments it looks like you are planning to do this anyway). If this method were generalised to an N-dimensional case, I would be concerned that the memory requirement for storing the "other nodes" for every node might become an issue.
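
A timing check of this kind could be done with BenchmarkTools.jl; the following is only a minimal sketch using the two functions quoted above, with arbitrary random nodes standing in for actual Gauss-Legendre points:

using BenchmarkTools

# for-loop version, as quoted above
function lagrange_poly(j, x_nodes, x)
    n = size(x_nodes, 1)
    x0 = x_nodes[j]
    poly = 1.0
    for i in 1:j-1
        poly *= (x - x_nodes[i]) / (x0 - x_nodes[i])
    end
    for i in j+1:n
        poly *= (x - x_nodes[i]) / (x0 - x_nodes[i])
    end
    return poly
end

# precomputed version, as quoted above
lagrange_poly_optimised(other_nodes, one_over_denominator, x) =
    prod(x - n for n ∈ other_nodes) * one_over_denominator

# placeholder test data (random nodes, not Gauss-Legendre points)
x_nodes = sort(rand(9))
j = 5
other_nodes = x_nodes[setdiff(1:9, j)]
one_over_denominator = 1.0 / prod(x_nodes[j] .- other_nodes)
x = 0.5

@btime lagrange_poly($j, $x_nodes, $x)
@btime lagrange_poly_optimised($other_nodes, $one_over_denominator, $x)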

@johnomotani
Collaborator Author

johnomotani commented May 27, 2024

@mrhardman I don't think it was the switch from the explicit loops to prod that made the speedup. I didn't test the changes separately, but the significant things are:

  • precompute the denominator, which gets rid of about half the arithmetic operations, and division is more expensive than multiplication/addition/subtraction.
  • combine the two loops (for less-than-j and greater-than-j) into a single loop over the "other nodes", which should vectorise better (a sketch of the precomputation follows below).
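
A minimal sketch of what that precomputation can look like (the helper name setup_lagrange_poly is illustrative only, not the actual function in the package):

# Illustrative only: for node j, store the other nodes and 1/denominator once,
# so each later evaluation is a multiply-only loop with no divisions and no
# branch on i == j.
function setup_lagrange_poly(j, x_nodes)
    x0 = x_nodes[j]
    other_nodes = [x_nodes[i] for i in eachindex(x_nodes) if i != j]
    one_over_denominator = 1.0 / prod(x0 - xn for xn in other_nodes)
    return other_nodes, one_over_denominator
end

The per-evaluation cost then matches lagrange_poly_optimised() above: n-1 subtractions and multiplications plus one final multiplication.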

I did also try a for-loop version of lagrange_poly_optimised(), which had essentially identical timing to the one using prod. prod was marginally faster (although the difference could have just been noise), and is more compact to write, so I chose to use that version.

I still think we'll want to do N 1d interpolations rather than one N-dimensional interpolation (although that has its own memory cost for at least one buffer array...), so I'm only anticipating 1d versions of "other nodes". It's true that if we wanted an N-dimensional version, the memory usage for "other nodes" could become a concern!
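
As a purely hypothetical illustration of the "N 1d interpolations" idea for a 2d pdf (this is not the moment_kinetics API: interp_1d! stands in for whichever 1d routine is used, and buffer is the extra pdf-sized array mentioned above):

# Hypothetical sketch: interpolate f[ivpa, ivperp] onto new vpa and vperp grids
# by doing 1d interpolations along each dimension in turn.
# buffer must have size (length(vpa_new), length(vperp_old)),
# result must have size (length(vpa_new), length(vperp_new)).
function interp_2d_via_1d!(result, buffer, f, vpa_old, vpa_new, vperp_old, vperp_new,
                           interp_1d!)
    # pass 1: interpolate every vperp column from the vpa_old grid to vpa_new
    for ivperp in axes(f, 2)
        interp_1d!(view(buffer, :, ivperp), vpa_new, view(f, :, ivperp), vpa_old)
    end
    # pass 2: interpolate every vpa row from the vperp_old grid to vperp_new
    for ivpa in axes(buffer, 1)
        interp_1d!(view(result, ivpa, :), vperp_new, view(buffer, ivpa, :), vperp_old)
    end
    return result
end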

@johnomotani
Collaborator Author

Some timing checks would be helpful here to justify the changes

I did the one test on a case I happened to be running anyway. I was using the profiler (StatProfilerHTML) to measure how much time was spent in different parts of the code, so it's not too convenient to post the results. The speedup (factor of a few) was in line with what I'd expect, so I wasn't planning to do any more timing for this PR.
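
For reference, a profiling run of that kind might look like the following sketch with StatProfilerHTML.jl; run_case() here is just a stand-in for the real simulation driver that was profiled:

using Profile, StatProfilerHTML

# stand-in for the real driver function (hypothetical)
run_case() = sum(sin(x) for x in 1:10^7)

run_case()            # compile first so the profile measures run time, not compilation
@profile run_case()   # collect samples with Julia's built-in statistical profiler
statprofilehtml()     # write an HTML flame-graph report from the collected samples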

I did suggest a replacement for interpolate_2D_vspace!() that I think should be faster, but left it commented out because it would need testing for both correctness and performance, and it needs an additional pdf-sized buffer passed as an argument - I wasn't planning to include that change in this PR unless someone else has time to do the testing.

@mrhardman
Collaborator

I did suggest a replacement for interpolate_2D_vspace!() that I think should be faster, but left it commented out because it would need testing for both correctness and performance, and it needs an additional pdf-sized buffer passed as an argument - I wasn't planning to include that change in this PR unless someone else has time to do the testing.

There is already an existing CI test for the `interpolate_2D_vspace!()` function in the Fokker-Planck tests that you should be able to quickly modify if you want to include this here. You could add a logical flag to the list of tests made at the line `@testset " - test Lagrange-polynomial 2D interpolation" begin`, and then switch between my original interpolation and your optimised version at the lines `interpolate_2D_vspace!(Fe_interp_ion_units,Fe,vpa,vperp,scalefac)` and `interpolate_2D_vspace!(Fi_interp_electron_units,Fi,vpa,vperp,1.0/scalefac)`.

Hope this helps.

Base automatically changed from gausslegendre-1d-interpolation to master June 4, 2024 15:25
@johnomotani johnomotani merged commit 6c09bb0 into master Jun 9, 2024
13 of 16 checks passed
@johnomotani johnomotani deleted the optimise-Lagrange-interpolation branch June 9, 2024 18:30