Add keyword arg to modelmatrix; define momentmatrix #16
base: main
test/regressionmodel.jl
Outdated
@@ -6,13 +6,32 @@ using StatsAPI: RegressionModel, crossmodelmatrix
struct MyRegressionModel <: RegressionModel
end

struct ItsRegressionModel <: RegressionModel
    wts
Suggested change:
-     wts
+     wts::AbstractVector
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##              main      #16      +/-   ##
===========================================
- Coverage   100.00%   97.43%   -2.57%
===========================================
  Files            3        2       -1
  Lines           37       39       +2
===========================================
+ Hits            37       38       +1
- Misses           0        1       +1
View full report in Codecov by Sentry.
src/regressionmodel.jl
Outdated
- Return the model matrix (a.k.a. the design matrix).
+ Return the model matrix (a.k.a. the design matrix) or, if `weighted=true` the weighted
+ model matrix, i.e. `X' * sqrt.(W)`, where `X` is the model matrix and
Why transpose `X`? It sounds weird to change the orientation of the result depending on whether it's weighted or not.
My bad, I will fix it.
src/regressionmodel.jl
Outdated
- Return the model matrix (a.k.a. the design matrix).
+ Return the model matrix (a.k.a. the design matrix) or, if `weighted=true` the weighted
Suggested change:
- Return the model matrix (a.k.a. the design matrix) or, if `weighted=true` the weighted
+ Return the model matrix (design matrix) or, if `weighted=true` the weighted
src/regressionmodel.jl
Outdated
- Return `X'X` where `X` is the model matrix of `model`.
+ Return `X'X` where `X` is the model matrix of `model` or, if `weighted=true`, `X'WX`,
+ where `W` is the diagonal matrix whose elements are the model weights.
Can we define weights?
We could add a link to the `weights(::StatisticalModel)` method. Indeed there can be confusion between prior weights and working weights (though these terms can also confuse casual users).
How exactly do I add a link to `weights(::StatisticalModel)`? Is there a way to link docs from different packages?
It's in the same package so I think something like `[model weights](@ref weights(::StatisticalModel))` should work. Better test it though by building the StatsBase docs (`julia docs/make.jl`) using the updated StatsAPI.
Co-authored-by: Moritz Schauer <[email protected]>
The only thing that we probably should do is to allow for [...] We could do it next (after I drop a bomb-PR against GLM; the GLM PR is waiting for this PR to get merged).
Let's tackle this separately. :-) I'd rather review the GLM PR before merging this one, usually having the implementation is a good way to check that the API is the right one.
I think it would be helpful to think about the API for dealing with rank-deficient models. For instance, [...]
    residuals(model::RegressionModel; weighted::Bool=false)

Return the residuals of the model or, if `weighted=true`, the residuals multiplied by
the square root of the [model weights](@ref weights(::StatisticalModel)).
Where does the square root come from exactly? Doesn't that assume a particular definition of residuals (i.e. using L2-norm rather than e.g. L1-norm)?
Well, this is tricky. `modelmatrix` also multiplies the entries of `X` by the square root of the weights. Why?
Think about the linear model. With weights, the cross model matrix is `modelmatrix(lm1; weighted = true)'modelmatrix(lm1; weighted = true)`.
Notice that this is consistent with R; see, e.g., the function `weighted.residuals`, which is in `stats`.
With weights, any weights is [...]
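The cross-product argument above can be checked numerically. The following is a minimal sketch using only the `LinearAlgebra` standard library; the matrix `X` and weights `w` are made-up values, not taken from the PR or from GLM. It shows that scaling each row of `X` by the square root of its weight makes the plain cross-product equal to `X'WX`.

```julia
using LinearAlgebra

# Hypothetical data: 3 observations, 2 columns, one weight per observation.
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
w = [0.3, 0.2, 0.5]

# Broadcasting a length-n vector against an n×p matrix scales row i
# by sqrt(w[i]), which is the "weighted model matrix" discussed here.
Xw = sqrt.(w) .* X

# Cross-product of the scaled matrix versus the explicit X'WX form.
weighted_cross = Xw' * Xw
direct_cross = X' * Diagonal(w) * X

println(weighted_cross ≈ direct_cross)
```

The square root appears because the weight enters the cross-product once from each factor, so each factor carries half of it.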
Yeah, another tricky point. My understanding is that for residuals, the square root comes from the fact that deviance residuals themselves are defined as the square root of quantities which are partitions of the deviance. Right?
Note the R docstring for `weighted.residuals` says "Weighted residuals are based on the deviance residuals", which are only one kind of residual. Actually in R `residuals` also returns weighted residuals, except for response residuals, which are always unweighted. Maybe to be completely accurate we could say "for deviance and Pearson residuals...", so that packages are free to use different definitions (or throw an error) if needed?
I think what @nalimilan says is that the assumption here (and in your change of `modelmatrix`) is that for all kinds of weights the weighted model matrix is `X * sqrt.(W)`. Is it always true for `FrequencyWeights`, `AnalyticWeights` and `ProbabilityWeights`? x-ref: JuliaStats/GLM.jl#487
@bkamins weighted residuals and weighted model matrices do not exist in statistics. They are only useful from a coding point of view: they make it easier to write neater code.
I have always defined these quantities as multiplied by [...]
@nalimilan what you propose makes sense. I will add more context to the doc.
Even if these don't exist in statistics, the question can be phrased as "are there situations where the returned value is useful, even when you don't know the kind of weights used". I think the answer is yes, but it's tricky, so... R base only supports analytic weights so it's not a great reference.
OK let's just adapt the docstring then. Feel free to add more context if you have ideas.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Something that would be very useful and that I would like to add to this PR is [...] With this method defined (whose implementation for GLM is part of JuliaStats/GLM.jl#487) [...]
Sure. Would it make sense to call it [...]
I don't think so. The R package dealing with robust variances uses [...] We already have the inverse of the log hessian of the likelihood in [...] Now, [...]
I don't have a strong preference, but at least for consistency I think we should spell "likelihood" in full if we use that term. Luckily autocompletion will almost always work. ;-)
Is there anything left to decide here?
I think we still need to decide. As of now, these two methods are part of JuliaStats/GLM.jl#487. They would be helpful in extending the methods defined in CovarianceMatrices.jl.
But decide what? :-)
Whether to merge?
According to our discussions a few things needed changing AFAICT.
For linear and generalized linear models, the parameters of interest are the coefficients
of the linear predictor. The moment matrix of a linear model is given by `u.*X`,
where `u` is the vector of residuals and `X` is the model matrix. The moment matrix of
a a generalized linear model with link function `g` is `X'e`, where `e`
Suggested change:
- a a generalized linear model with link function `g` is `X'e`, where `e`
+ a generalized linear model with link function `g` is `X'e`, where `e`
Return the residuals of the model or, if `weighted=true`, the residuals multiplied by
the square root of the [model weights](@ref weights(::StatisticalModel)).
Suggested change:
- Return the residuals of the model or, if `weighted=true`, the residuals multiplied by
- the square root of the [model weights](@ref weights(::StatisticalModel)).
+ Return the residuals of the model. The definition may vary depending
+ on the model type.
+ For deviance and Pearson residuals, if `weighted=true`, return
+ residuals multiplied by the square root of the [model weights](@ref weights(::StatisticalModel)).
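To illustrate what the `weighted=true` case computes, here is a minimal sketch with made-up residuals and weights (both hypothetical, and using plain vectors rather than any model type). Multiplying residuals by the square root of the weights means their sum of squares reproduces the weighted residual sum of squares.

```julia
# Hypothetical residuals and weights, one entry per observation.
r = [0.5, -1.0, 2.0]
w = [2.0, 1.0, 0.5]

# Weighted residuals: each residual scaled by the square root of its weight.
rw = sqrt.(w) .* r

# Squaring the scaled residuals recovers the full weight on each term.
weighted_rss = sum(abs2, rw)
direct_rss = sum(w .* r .^ 2)

println(weighted_rss ≈ direct_rss)
```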
struct MyWeightedRegressionModel <: RegressionModel
    wts::AbstractVector
end

StatsAPI.modelmatrix(::MyRegressionModel) = [1 2; 3 4]

function StatsAPI.modelmatrix(r::MyWeightedRegressionModel; weighted::Bool=false)
    X = [1 2; 3 4]
    weighted ? sqrt.(r.wts).*X : X
end

w = [0.3, 0.2]
Given that the methods hardcode the matrix, probably not worth having a separate type which doesn't hardcode weights:
Suggested change:
- struct MyWeightedRegressionModel <: RegressionModel
-     wts::AbstractVector
- end
- StatsAPI.modelmatrix(::MyRegressionModel) = [1 2; 3 4]
- function StatsAPI.modelmatrix(r::MyWeightedRegressionModel; weighted::Bool=false)
-     X = [1 2; 3 4]
-     weighted ? sqrt.(r.wts).*X : X
- end
- w = [0.3, 0.2]
+ function StatsAPI.modelmatrix(r::MyRegressionModel; weighted::Bool=false)
+     X = [1 2; 3 4]
+     w = [1.5, 2.0, 0.3, 3.5]
+     weighted ? sqrt.(w).*X : X
+ end
... and simplify tests below.
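A runnable sketch of the single-type approach, under some assumptions: `RegressionModel` and `modelmatrix` below are local stand-ins so the snippet runs without StatsAPI installed (in the test file they would come from StatsAPI), and the weight vector is a hypothetical two-element one, because broadcasting `sqrt.(w) .* X` needs `w` to have as many entries as `X` has rows.

```julia
# Local stand-ins for the StatsAPI names (assumption, for self-containment).
abstract type RegressionModel end
function modelmatrix end

struct MyRegressionModel <: RegressionModel end

function modelmatrix(::MyRegressionModel; weighted::Bool=false)
    X = [1 2; 3 4]
    w = [0.3, 0.2]  # hypothetical weights, one per row of X
    weighted ? sqrt.(w) .* X : X
end

m = MyRegressionModel()
X = modelmatrix(m)
Xw = modelmatrix(m; weighted=true)

# The weighted cross-product matches X'WX with W = diag(0.3, 0.2).
println(Xw' * Xw ≈ X' * [0.3 0.0; 0.0 0.2] * X)
```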
- `modelmatrix` now has a keyword `weighted=false`, which is useful for dealing with weighted models.
- `momentmatrix`: this function is intended to return the matrix of estimating equations; for instance, for a linear model it should return `u .* X`, where `u` is the vector of residuals and `X` is the model matrix.
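
As an illustration of the linear-model case, here is a minimal sketch with made-up data that builds `u .* X` directly from plain arrays. It does not call `momentmatrix` itself, and the variable names are hypothetical. At the least squares solution the columns of this matrix sum to (numerically) zero, which is what makes it a matrix of estimating equations.

```julia
using LinearAlgebra

# Hypothetical data: intercept plus one regressor, 4 observations.
X = [1.0 1.0; 1.0 2.0; 1.0 3.0; 1.0 4.0]
y = [1.1, 1.9, 3.2, 3.8]

# Ordinary least squares coefficients and residuals.
β = X \ y
u = y - X * β

# Moment matrix: row i is the residual u[i] times row i of X (n×p).
M = u .* X

# The column sums equal X'u, which vanishes at the least squares solution.
println(all(abs.(sum(M; dims=1)) .< 1e-10))
```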