TrustRegion bugs #210

FHoltorf · 2023-09-14T16:05:36Z

I had a look at #142. Indeed there were some mistakes in how the Gauss-Newton step and Cauchy point were calculated. This is still WIP, however, since TrustRegion is still not up to par with nlsolve's implementation after these changes.

I suspect a large issue for the poor performance is the default TR adaptation and step acceptance/rejection routine; for instance the current default is rather conservative with respect to which steps are being accepted. Lowering the step_threshold default to 1e-4 from 0.1 (and after correcting the GN/Cauchy step computations) the performance on the test set is already quite a bit better.

Before

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve
1   Generalized Rosenbrock function                    10    3.9833e-311         4.9193              4.9193              0.0
2   Powell singular function                           4     7.0435e-16          7.941               0.021633            7.0217e-16
3   Powell badly scaled function                       2     0.0                 26897.0             2.1983e-14          0.0
4   Wood function                                      4     4.1832e-14          19081.0             8550.5              8.2157e-15
5   Helical valley function                            3     6.2775e-30          50.0                50.0                4.3667e-26
6   Watson function                                    2     18860.0             3004.4              129.15              114.89
7   Chebyquad function                                 2     5.5511e-17          0.075044            0.34127             9.4369e-16
8   Brown almost linear function                       10    0.0                 0.01533             16.53               4.4409e-16
9   Discrete boundary value function                   10    5.4548e-16          0.0047026           6.5869e-15          2.7756e-16
10  Discrete integral equation function                10    2.7807e-15          1.9166e-15          9.3334e-15          5.5511e-17
11  Trigonometric function                             10    1.4753e-15          2.2808e-15          0.084117            6.8695e-16
12  Variably dimensioned function                      10    0.0                 2.8636e105          1.376e-12           0.0
13  Broyden tridiagonal function                       10    1.2803e-15          2.7487e-15          1.1633e-13          6.6613e-16
14  Broyden banded function                            10    1.4002e-15          2.1729e-15          1.6865e-13          5.5511e-16
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          0.32355             0.91487             2.2204e-16
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          0.30203             1.0506              2.2204e-16
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 4.4408e-16          17.262              0.0
18  Sample problem 18                                  2     0.0                 6.2862e-9           2.1951              8.291e-17
19  Sample problem 19                                  2     9.5092e-16          3.4505e-7           76.367              7.7402e-16
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 1.8919e-16          2.5564e-18          0.0
21  Freudenstein-Roth function                         2     0.0                 20.012              20.012              4.949
22  Boggs function                                     2     0.0                 1.6031              2.0                 3.9431e-17
23  Chandrasekhar function                             10    6.6613e-16          9.9301e-16          1.0906e-13          2.2204e-16

After

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve             
1   Generalized Rosenbrock function                    10    4.0179e-314         0.0050578           4.9193              0.0                 
2   Powell singular function                           4     7.0435e-16          2.1234e-6           0.021633            7.0217e-16          
3   Powell badly scaled function                       2     0.0                 0.0                 9.9944e-15          0.0                 
4   Wood function                                      4     4.1138e-14          4287.4              8550.5              8.2157e-15          
5   Helical valley function                            3     6.2775000000000006e-301.5158e-18          50.0                4.3667e-26          
6   Watson function                                    2     18979.0             28248.0             129.15              114.89              
7   Chebyquad function                                 2     5.5511e-17          1.6653e-16          0.34127             9.4369e-16          
8   Brown almost linear function                       10    0.0                 1.1102e-16          16.53               4.4409e-16          
9   Discrete boundary value function                   10    5.4548e-16          6.9388e-17          7.2775e-15          2.7756e-16          
10  Discrete integral equation function                10    2.7807e-15          3.4694e-17          4.8191e-15          5.5511e-17          
11  Trigonometric function                             10    1.4753e-15          6.2384e-16          0.084117            8.3961e-16          
12  Variably dimensioned function                      10    0.0                 0.0                 1.3104e-12          0.0                 
13  Broyden tridiagonal function                       10    1.2803e-15          1.1376e-15          1.1957e-13          6.6613e-16          
14  Broyden banded function                            10    1.4002e-15          1.3824e-15          1.665e-13           5.5511e-16          
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          0.017726            0.91487             2.2204e-16          
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          0.0093482           1.0506              2.2204e-16          
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 0.0                 17.262              0.0                 
18  Sample problem 18                                  2     0.0                 1.413e-15           2.1951              7.658e-18           
19  Sample problem 19                                  2     9.5092e-16          0.00013496          76.367              7.7402e-16          
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 1.9702e-24          2.5564e-18          0.0                 
21  Freudenstein-Roth function                         2     0.0                 20.012              20.012              4.949               
22  Boggs function                                     2     0.0                 7.3505e-17          2.0                 3.9431e-17          
23  Chandrasekhar function                             10    6.6613e-16          7.6918e-16          1.2718e-13          2.2204e-16

Will investigate further.

codecov · 2023-09-14T16:15:51Z

Codecov Report

Merging #210 (f5c66c4) into master (d1e0762) will decrease coverage by 0.96%.
The diff coverage is 92.70%.

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
- Coverage   94.94%   93.98%   -0.96%     
==========================================
  Files           8        8              
  Lines         732      782      +50     
==========================================
+ Hits          695      735      +40     
- Misses         37       47      +10

Files	Coverage Δ
src/trustRegion.jl	`95.91% <92.70%> (-2.97%)`	⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

yash2798 · 2023-09-18T00:31:41Z

src/trustRegion.jl

        cache.stats.njacs += 1
    end

-    linres = dolinsolve(alg.precs, linsolve, A = cache.H, b = _vec(cache.g),
+    linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu), # cache.H, b = _vec(cache.g),


I think this is incorrect because here $J$ is the jacobian of the actual function and not the merit function $0.5 * ||f(x)||_{2}^2$, but the initial estimate of the step size u_tmp is actually as calculated $-H^{-1}g$, where H and g are the hessian and gradient of the merit func at u ?

That is a subtle point. You are right that on paper you do a Gauss-Newton step, i.e., $p_{GN} = -(J^\top J)^{-1}J^\top f$ (since we use $H = J^\top J$ as proxy for the Hessian of the merit function and $g= J^\top f$). But clearly, if $J$ is non-singular this is equivalent to just the Newton step, i.e., $p_{GN} = p_N = -J^{-1} f$. At the same time, computing $J^\top J$ squares the condition number of $J$ which makes using the Gauss-Newton step unfavorable. In NLsolve, the same is done actually. The only difference is that they default to computing the Moore-Penrose inverse of $J$ via the SVD when it is singular. We don't have that exception yet (unless it is burried in dolinsolve?). I meant to ask how we should go about this actually. I feel like this should be handled by LinearSolve via some option which is why I didn't put it in here yet.

@ChrisRackauckas Any thoughts on how to go about the case of singular Jacobians? Is this already handled in LinearSolve, for example triggered by an LAPACK singularity error or so? If not, I feel it really should go there as opposed to here. That said, putting it here would be a quick solution for now.

LinearSolve generally attempts to be non-blocking, so it will return a solution without erroring, though the return code may indicate that the solution did not solve correctly. It should just be a retcode check then, though specifically with singularity issues we probably need to add a few retcode improvements. In the ODE solver that is generally not checked because the adaptivity handles it appropriately, though this is a general improvement we should do anyways.

…to avoid growing ill-conditioning

…chemes

…t to Float64.

…nt with the rest of the package)

FHoltorf · 2023-09-26T20:04:08Z

rebased and now formally correct.

Last things to do:

Fix tests
Tune hyperparameters

… for now

…oids a bug due to mutation of Jacobian in dolinsolve.

FHoltorf · 2023-09-27T03:47:49Z

I think I have caught all bugs now.

When using the NLsolve trust region scheme, the performance is comparable to NLsolve.jl (see below). There are slight differences which are due to different convergence criteria, I believe. The other trust region schemes perform similarly now (usually a little worse on this test set but not to a meaningful enough point where I would suspect more bugs).

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve             
1   Generalized Rosenbrock function                    10    NaN                 0.0                 4.9193              0.0                 
2   Powell singular function                           4     7.0435e-16          7.0435e-16          0.021633            7.0217e-16          
3   Powell badly scaled function                       2     0.0                 0.0                 9.9944e-15          0.0                 
4   Wood function                                      4     4.1138e-14          1.2921e-14          8550.5              1.1546e-14          
5   Helical valley function                            3     6.2775000000000006e-301.0408e-16          50.0                1.0408e-16          
6   Watson function                                    2     NaN                 74.676              NaN                 114.89              
7   Chebyquad function                                 2     5.5511e-17          5.5511e-17          0.34127             5.5511e-17          
8   Brown almost linear function                       10    0.0                 8.8817e-16          16.53               8.8818e-16          
9   Discrete boundary value function                   10    5.4548e-16          5.4548e-16          7.2775e-15          2.7756e-16          
10  Discrete integral equation function                10    2.7807e-15          2.7807e-15          4.8191e-15          5.5511e-17          
11  Trigonometric function                             10    1.4753e-15          2.6733e-15          0.084117            8.4655e-16          
12  Variably dimensioned function                      10    0.0                 0.0                 1.3104e-12          0.0                 
13  Broyden tridiagonal function                       10    1.2803e-15          1.2803e-15          1.1957e-13          8.8818e-16          
14  Broyden banded function                            10    1.4002e-15          1.4002e-15          1.665e-13           8.8818e-16          
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          6.8564e-8           0.91487             6.853e-8            
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          3.7631e-29          1.0506              3.7632000000000004e-29
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 0.0                 17.262              0.0                 
18  Sample problem 18                                  2     0.0                 0.0                 2.1951              0.0                 
19  Sample problem 19                                  2     9.5092e-16          9.5092e-16          76.367              6.7241e-16          
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 0.0                 2.5564e-18          0.0                 
21  Freudenstein-Roth function                         2     0.0                 6.9988              20.012              4.949               
22  Boggs function                                     2     0.0                 1.077e-16           2.0                 1.077e-16           
23  Chandrasekhar function                             10    6.6613e-16          6.6613e-16          1.2718e-13          4.4409e-16

Just for reference, here is the before

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve             
1   Generalized Rosenbrock function                    10    NaN                 4.9193              4.9193              0.0                 
2   Powell singular function                           4     7.0435e-16          7.941               0.021633            7.0217e-16          
3   Powell badly scaled function                       2     0.0                 4111.8              9.9944e-15          0.0                 
4   Wood function                                      4     4.1138e-14          2985.5              8550.5              1.1546e-14          
5   Helical valley function                            3     6.2775000000000006e-3050.0                50.0                1.0408e-16          
6   Watson function                                    2     NaN                 6491.3              129.15              114.89              
7   Chebyquad function                                 2     5.5511e-17          0.075044            0.34127             5.5511e-17          
8   Brown almost linear function                       10    0.0                 0.0                 16.53               8.8818e-16          
9   Discrete boundary value function                   10    5.4548e-16          0.0046978           7.2775e-15          2.7756e-16          
10  Discrete integral equation function                10    2.7807e-15          8.1217e-17          4.8191e-15          5.5511e-17          
11  Trigonometric function                             10    1.4753e-15          3.1524e-15          0.084117            8.4655e-16          
12  Variably dimensioned function                      10    0.0                 0.0                 1.3104e-12          0.0                 
13  Broyden tridiagonal function                       10    1.2803e-15          1.143e-15           1.1957e-13          8.8818e-16          
14  Broyden banded function                            10    1.4002e-15          1.4002e-15          1.665e-13           8.8818e-16          
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          0.32355             0.91487             6.853e-8            
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          0.30203             1.0506              3.7632000000000004e-29
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 4.4408e-16          17.262              0.0                 
18  Sample problem 18                                  2     0.0                 5.5995e-18          2.1951              0.0                 
19  Sample problem 19                                  2     9.5092e-16          1.0542e-6           76.367              6.7241e-16          
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 1.9702e-24          2.5564e-18          0.0                 
21  Freudenstein-Roth function                         2     0.0                 20.012              20.012              4.949               
22  Boggs function                                     2     0.0                 1.6031              2.0                 1.077e-16           
23  Chandrasekhar function                             10    6.6613e-16          4.965e-16           1.2718e-13          4.4409e-16

FHoltorf · 2023-09-27T04:05:07Z

src/trustRegion.jl

-        linu = _vec(u_tmp),
+        # do not use A = cache.H, b = _vec(cache.g) since it is equivalent 
+        # to  A = cache.J, b = _vec(fu) as long as the Jacobian is non-singular
+        linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu), 


this seems to be mutating J (resp. A) by default. I assume that is intended but requires care when the trust region step is being rejected. For now I simply avoid recomputing the Gauss-Newton step in that case since that is just unnecessary work (although we used to do it for some reason). That said, this won't work for Jacobian reuse trust region methods which may be on the horizon. Just leaving this note here since this is quite subtle and can lead to hard-to-detect problems.

@frankschae @avik-pal make note.

I recently saw this as well while debugging a Gauss Newton bug, do we know why this is happening? Given that alias_A = true

ChrisRackauckas · 2023-09-27T10:07:02Z

src/trustRegion.jl

+
+    The same updating scheme as in NLsolve's (https://github.com/JuliaNLSolvers/NLsolve.jl) trust region dogleg implementation. 
+    """
+    NLsolve


Why not just call it dogleg?

They are all dogleg methods, i.e., they all use the dogleg step to compute the update. The difference between all options is just how the trust region is being adapted.

I see, it's a bit of a weird name though. Is there no reference to where it came from? If not then we should credit Spenser Lyon who first implemented what was there.

I didn't see anything in the NLsolve docs or code. I just opened an issue. Maybe someone knows more over there. If not, we can also just call it Lyon.

Let's update the name in a follow up. I want to get this into the benchmarks though.

ChrisRackauckas · 2023-09-27T10:07:21Z

src/trustRegion.jl

+    Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291]. 
+    """
+    NW


Suggested change

Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291].

"""

NW

Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291].

"""

NocedalWrigt

ChrisRackauckas · 2023-09-27T10:12:19Z

Is the full 23 function thing a test already? If not, can we make it a regression test please?

FHoltorf · 2023-09-27T13:42:25Z

I don't think they are. It would hit a lot (if not all) of the code though which would be nice. I am wondering though: How do we distinguish failure from success since we can't expect all methods to converge to a small residual on all problems?

avik-pal · 2023-09-27T14:29:22Z

What about we use the list you have above and mark our solvers that currently fail as test_broken (I know it is not exactly correct since some of them are expected to not converge and aren't exactly broken)? That way, we won't have a regression in the future of the currently working versions

FHoltorf marked this pull request as draft September 14, 2023 17:21

yash2798 reviewed Sep 18, 2023

View reviewed changes

FHoltorf and others added 9 commits September 26, 2023 00:44

fix incorrect gradient and Gauss-Newton Hessian proxy

081f3b1

fix Cauchy point calculation in dogleg step

f20e897

expose quadratic form structure explicitly

6f3556e

add NLsolve trust region updating scheme and change GN step to -J\fu …

146dec9

…to avoid growing ill-conditioning

add Nocedal and Wright trust region updating scheme

884aafc

add meaningful description for NLsolve and NW trust region updating s…

094eb34

…chemes

cache memory for cauchy step to enable non-allocating code

0e99655

parameter types should not be converted to eltype(u). For now, defaul…

439415b

…t to Float64.

finish rebase to master

0ba652b

FHoltorf force-pushed the tr branch from f24b0ca to 0ba652b Compare September 26, 2023 17:57

introduce better variable names (and also ones that are more consiste…

6f7ef29

…nt with the rest of the package)

FHoltorf added 9 commits September 26, 2023 16:50

fix consistency test

79ab992

fix oop perform step and Fan scheme initialization

b749501

improve comment

21d246d

use less conservative step acceptance policy

177def2

choose Float64 as default type for trust region adaptation parameters…

0190764

… for now

hardcode NLsolve parameters

5f298e3

add NLsolve-like trust region initialization

b2b5d89

avoid recomputation of GN step if TR step was rejected. Faster and av…

f67ced5

…oids a bug due to mutation of Jacobian in dolinsolve.

run SciML formatter

34821e5

FHoltorf commented Sep 27, 2023

View reviewed changes

FHoltorf marked this pull request as ready for review September 27, 2023 04:28

ChrisRackauckas reviewed Sep 27, 2023

View reviewed changes

ChrisRackauckas requested a review from avik-pal September 27, 2023 10:13

FHoltorf added 3 commits September 27, 2023 11:23

rename NW -> NocedalWright

609f67c

convergence check for NocedalWright

32cf735

test new trust region schemes

e77fa76

FHoltorf mentioned this pull request Sep 27, 2023

Test problem library for regression tests #222

Closed

run formatter

f5c66c4

avik-pal approved these changes Sep 29, 2023

View reviewed changes

ChrisRackauckas approved these changes Sep 29, 2023

View reviewed changes

ChrisRackauckas merged commit 3a750aa into SciML:master Sep 29, 2023
7 of 9 checks passed

This was referenced Oct 17, 2023

TrustRegion correctness #142

Closed

More trust region schemes #151

Closed

Turn off trust region when the iteration is stuck in local minima #146

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TrustRegion bugs #210

TrustRegion bugs #210

FHoltorf commented Sep 14, 2023 •

edited

Loading

codecov bot commented Sep 14, 2023 •

edited

Loading

yash2798 Sep 18, 2023

FHoltorf Sep 19, 2023

FHoltorf Sep 19, 2023

ChrisRackauckas Sep 19, 2023

FHoltorf commented Sep 26, 2023 •

edited

Loading

FHoltorf commented Sep 27, 2023

FHoltorf Sep 27, 2023

ChrisRackauckas Sep 27, 2023

avik-pal Sep 27, 2023

ChrisRackauckas Sep 27, 2023

FHoltorf Sep 27, 2023

ChrisRackauckas Sep 28, 2023

FHoltorf Sep 28, 2023

ChrisRackauckas Sep 29, 2023

ChrisRackauckas Sep 27, 2023

ChrisRackauckas commented Sep 27, 2023

FHoltorf commented Sep 27, 2023

avik-pal commented Sep 27, 2023

TrustRegion bugs #210

TrustRegion bugs #210

Conversation

FHoltorf commented Sep 14, 2023 • edited Loading

codecov bot commented Sep 14, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FHoltorf commented Sep 26, 2023 • edited Loading

FHoltorf commented Sep 27, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChrisRackauckas commented Sep 27, 2023

FHoltorf commented Sep 27, 2023

avik-pal commented Sep 27, 2023

FHoltorf commented Sep 14, 2023 •

edited

Loading

codecov bot commented Sep 14, 2023 •

edited

Loading

FHoltorf commented Sep 26, 2023 •

edited

Loading