Numerical instability with general X, D #8
Comments
Thank you @huisaddison for pointing this out ... scary and subtle! My first thought is that the initial linear system D D^T u = D y, where D is the modified analysis operator (defined by the pseudoinverse of your X = P and the original D), must be terribly conditioned. Otherwise I don't understand how the paths could be so divergent. I think basically the different ways of solving this system (SVD versus QR) interact with the different ways of forming the "original D" to produce very different solutions. So what I would do to debug this genlasso issue would be to get all those modified D's in memory, compute their condition numbers, and then try computing all the solutions (SVD versus QR). And then look at them (say, plot them) to see how divergent they are. (Not suggesting that you should do this ... if @statsmaths or I have free cycles we could do this too ... just recording my own thoughts so I don't forget them ...)
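For anyone who wants to try the check described above, here is a minimal sketch in base R. It is not the genlasso code path: the matrix D below is a made-up stand-in for a modified D (two nearly identical rows, to force bad conditioning), and solve_svd and solve_qr are hypothetical helper names. It uses the fact that D D^T u = D y are the normal equations of the least squares problem min_u ||y - D^T u||_2, so both solvers work on t(D) directly.

```r
# Hypothetical stand-in for a modified D: rows 4 and 5 are nearly
# identical, so D (and D %*% t(D)) is badly conditioned.
set.seed(1)
m <- 5; n <- 8
D <- matrix(rnorm(m * n), m, n)
D[5, ] <- D[4, ] + 1e-5 * rnorm(n)
y <- rnorm(n)

# Condition number: largest over smallest singular value.
sv <- svd(D)$d
cond_D <- max(sv) / min(sv)

# (a) SVD route: pseudoinverse of t(D), dropping singular values
#     below a relative tolerance (rtol is an assumed default here).
solve_svd <- function(D, y, rtol = 1e-7) {
  s <- svd(t(D))                       # t(D) = U diag(d) V^T
  keep <- s$d > rtol * max(s$d)
  coefs <- (t(s$u[, keep, drop = FALSE]) %*% y) / s$d[keep]
  as.numeric(s$v[, keep, drop = FALSE] %*% coefs)
}

# (b) QR route: least squares coefficients from a QR factorization.
solve_qr <- function(D, y) {
  as.numeric(qr.coef(qr(t(D)), y))
}

u_svd <- solve_svd(D, y)
u_qr  <- solve_qr(D, y)

# How far apart are the two solutions, and the fits they imply?
max_diff_u   <- max(abs(u_svd - u_qr))
max_diff_fit <- max(abs(t(D) %*% u_svd - t(D) %*% u_qr))
```

With the real modified D's in place of the toy matrix, cond_D and max_diff_u are the two numbers to look at. Raising rtol in solve_svd so that it starts dropping small singular values is one way to see how a truncation choice alone can move u a long way.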
A minimal set of code for constructing the modified D's: https://gist.github.com/huisaddison/fe089e9dce7d61dcaeca84e6553664fd The condition numbers are pretty large, but the matrices look relatively close to each other, especially compared to their absolute magnitudes:
I also looked at the modified y's -- negligible differences there.
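A rough sketch of this kind of comparison in base R, for reference. Everything here is an assumption for illustration: X and D are synthetic, the modified D is taken to be D times a pseudoinverse of X (as described earlier in the thread), and the two pseudoinverse routes are just examples of how the same matrix can be formed in numerically different ways. The actual construction is in the gist linked above.

```r
# Hypothetical inputs: a full-column-rank X and a first-difference D.
set.seed(2)
n <- 20; p <- 6
X <- matrix(rnorm(n * p), n, p)
D <- diff(diag(p))                   # (p - 1) x p first differences

# Pseudoinverse of X, route 1: via the SVD, V diag(1/d) U^T.
s <- svd(X)
Xi_svd <- s$v %*% (t(s$u) / s$d)

# Route 2: via the normal equations (only valid for full column rank).
Xi_ne <- solve(crossprod(X), t(X))

# Two versions of the modified D that should agree in exact arithmetic.
D1 <- D %*% Xi_svd
D2 <- D %*% Xi_ne

# Relative difference between the two, and their condition numbers.
rel_diff <- norm(D1 - D2, "F") / norm(D1, "F")
cond1 <- kappa(D1, exact = TRUE)
cond2 <- kappa(D2, exact = TRUE)
```

On a well-conditioned toy X like this one, rel_diff sits near machine precision. The interesting case is the X and D from the MRE, where a tiny rel_diff next to a large condition number would be consistent with what is reported here.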
Next steps would be to look at the solution paths themselves using |
There are numerical instability issues in the general X, D setting which can lead genlasso to terminate unexpectedly early.
See the MRE here: https://gist.github.com/huisaddison/31ea593ae945bb2494033abcf0ddbf1a
Running this MRE yields the output, when svd=TRUE:
but when svd=FALSE (the default):
Sorry for not providing much more information aside from this observation, I haven't dug deep into this myself to fully understand why this is occurring. Some notes:
genlasso/R/genlasso.R
Lines 69 to 72 in 5adbeb9
svd=TRUE, but runs for longer when svd=FALSE. (And the only difference between these problems is numerical error between the D matrices.) See the R Markdown notebook here: https://gist.github.com/huisaddison/da568096fbf01f780e7f06aa9616b8de So the svd=TRUE routine isn't necessarily a "gold standard" for bypassing the numerical issues -- there's something that's going on that I don't fully understand.