-
-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] reuse option in Newton Raphson method #206
Conversation
Codecov Report
@@ Coverage Diff @@
## master #206 +/- ##
==========================================
- Coverage 48.45% 48.10% -0.36%
==========================================
Files 19 19
Lines 1814 1846 +32
==========================================
+ Hits 879 888 +9
- Misses 935 958 +23
📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
I had a discussion with @FHoltorf regarding an effective heuristic for updating the Jacobian:
|
That is precisely the thinking of Hairer II's approach. However, with a lot of benchmarking we have found that the optimal HP1 tends to be ... zero a lot of the time. So basically, just take new Jacobians when you start diverging instead of converging. That is a pretty hard heuristic to beat. In theory you could take it earlier if convergence slows, but we haven't found great schema for that. So for the first version I'd keep it simple just based off of a convergence heuristic set to zero. |
You may want to base it off of the reuse heuristics of OrdinaryDiffEq.jl. Take a look at its nonlinear solvers which documents this. |
Yeah seems to be the case for the 23 test problems as well (red line below = speed without reuse). I am wondering though why they look so similar. using NonlinearSolve, LinearAlgebra, LinearSolve, NonlinearProblemLibrary, Test, BenchmarkTools, Plots
problems = NonlinearProblemLibrary.problems
dicts = NonlinearProblemLibrary.dicts
function test_on_library(problems, dicts, ϵ=1e-5)
samples = 100
secs = 5
HP = [1e-14, 1e-13, 1e-12, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
# store a list of all Plots
plts = []
alg = NewtonRaphson(; reuse=false)
broken_tests = (1,6) # test problems where method does not converge to small residual
for (idx, (problem, dict)) in enumerate(zip(problems, dicts))
title = dict["title"]
@testset "$title" begin
# Newton without reuse
x = dict["start"]
res = similar(x)
nlprob = NonlinearProblem(problem, x)
sol = solve(nlprob, alg, abstol = 1e-18, reltol = 1e-18)
problem(res, sol.u, nothing)
broken = idx in broken_tests ? true : false
@test norm(res)≤ϵ broken=broken
balg = @benchmarkable solve(nlprob, $alg, abstol=1e-18, reltol=1e-18)
talg = run(balg, samples=samples, seconds=secs)
# Newton with reuse
ts = []
for reusetol in HP
@show reusetol
sol = solve(nlprob, NewtonRaphson(; reuse=true, reusetol=reusetol), abstol=1e-18, reltol=1e-18)
problem(res, sol.u, nothing)
broken = idx in broken_tests ? true : false
@test norm(res) ≤ ϵ broken = broken
balg2 = @benchmarkable solve(nlprob, NewtonRaphson(; reuse=true, reusetol=$reusetol), abstol=1e-18, reltol=1e-18)
talg2 = run(balg2, samples=samples, seconds=secs)
push!(ts, mean(talg2.times) / mean(talg.times))
end
pl = scatter(HP, ts, xaxis=:log, xticks=HP, label=false, xlabel="HP", ylabel="mean(time w/ reuse)/mean(time wo/ reuse)", title=title)
hline!([1.0], color = "red", label = false)
display(pl)
savefig(pl, "Reuse-Newton-" * title * ".png")
push!(plts, pl)
end
end
return plts
end
# NewtonRaphson
plts = test_on_library(problems, dicts)
# Merge the plots into a single plot
merged_plot = plot(plts..., layout=(5, 5), size=(2500, 2500), margin=10*Plots.mm)
savefig(merged_plot, "Reuse-Newton.png")
So only recomputing when the residual increases?
I only scanned quickly through |
Yup.
https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/nlsolve.jl#L63-L96 |
One heuristic (or part of one) that might be interesting to also try and incorporate is to recompute Jacobians when the Newton direction associated with the old Jacobian stops being a descent direction for the residual norm. If for nothing else but the fact that it would leverage the AD system nicely. One would need to compute the gradient of the residual norm with reverse mode AD but, even though more expensive then just checking the residual, it would make things play nicely with linesearch. (If you don't make any of such checks, I am guessing you could run into trouble if you want to combine Jacobian reuse with linesearch?) |
Ahh yes that would be an interesting approach to try. |
For the gradient of the residual norm, I think we would need to change the default norm. https://github.com/SciML/DiffEqBase.jl/blob/master/src/common_defaults.jl#L26C1-L33C4 drops the gradient function res_norm1(u)
b = f(u, p)
sqrt(sum(abs2, b) / length(u))
end
function res_norm2(u)
b = f(u, p)
DiffEqBase.ODE_DEFAULT_NORM(b, nothing)
end
ForwardDiff.gradient(res_norm2, u0) # => zero(u0) (Why is it actually |
That's just matching Hairer, I don't think it really matters which one it is.
We just shouldn't be differentiating the solvers here at all |
how did the AD thing come up? |
Only for testing this idea, which might be necessary for Jacobian reuse if |
ahh yeah, the ODE one purposely changes away from differentiating the norm, which isn't what you want here. |
@ChrisRackauckas: @yonatanwesen, @FHoltorf, and I were talking again about a good example today. Was there any specific ODE system for which the reuse strategy in https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/nlsolve.jl#L63-L96 was particularly good? (To have another starting point.) |
Anything sufficiently large. Bruss with N=32 should do. |
SciML/SciMLBenchmarks.jl#796 has a benchmarking script. Drop the sundials and minpack versions and you should be able to test quite rapidly |
https://sundials.readthedocs.io/en/latest/kinsol/Mathematics_link.html#jacobian-information-update-strategy sundials also reuses jacobian |
though their scheme might not be ideal, given how long it takes for bruss |
what's the cost breakdown like? |
Do you mean the cost when I profile it? |
Yes share some flamegraphs https://github.com/tkluck/StatProfilerHTML.jl |
|
Check the stats and trace to see how many times the factorization is reused |
#345 supercedes this |
TODOs: