You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to discuss the best way to have an API for RegressionModels that are rank deficient. In particular, I would like to have a mechanism allowing packages interfacing through StatsAPI to recover the collinear columns of the model matrix.
Using the StatsAPI, there is no way to recover the three columns of the model matrix that are linearly independent. However, one can do this by looking at the DensePredChol (and eventually at the DensePredQR).
For instance, if I want to calculate X'X using only these columns, I could do
X = modelmatrix(lm_1)
XX = X'X
ch = lm_1.model.pp.chol
rnk = rank(ch)
XX[1:end .∉ [ch.p[rnk+1:end]], 1:end .∉ [ch[rnk+1:end]]]
and similarly, for the columns of the modelmatrix:
Z = X[:, 1:end .∉ [ch.p[rnk+1:end]]]
I don't know the best way to expose this at the API level. Maybe something like this?
function StatsAPI.collinearindexes(r::RegressionModel) end
The implementation in GLM could then look something like this:
function GLM.collinearindexes(m::LinPredModel) ## of course only for `CholeskyPivoted` (and `QRPivoted` when implemented)
ch = m.pp.chol
rnk = rank(ch)
return ch.p[1:rnk], ch.p[rnk+1:end]
end
Probably not great. Ideas?
P.S.: This is very useful to make packages like CovarianceMatrices play nice with rank-deficient RegressionModels.
The text was updated successfully, but these errors were encountered:
I want to discuss the best way to have an API for
RegressionModels
that are rank deficient. In particular, I would like to have a mechanism allowing packages interfacing throughStatsAPI
to recover the collinear columns of the model matrix.Consider this example (I am going to use
GLM.jl
):Using the
StatsAPI
, there is no way to recover the three columns of the model matrix that are linearly independent. However, one can do this by looking at theDensePredChol
(and eventually at the DensePredQR).For instance, if I want to calculate
X'X
using only these columns, I could doand similarly, for the columns of the
modelmatrix
:I don't know the best way to expose this at the API level. Maybe something like this?
The implementation in
GLM
could then look something like this:Probably not great. Ideas?
P.S.: This is very useful to make packages like CovarianceMatrices play nice with rank-deficient
RegressionModels
.The text was updated successfully, but these errors were encountered: