Projection for discrete exponential dispersion (DED) families #361
Comments
numerical solution would be fine, too
I think this depends on the definition of GLM. If I remember correctly, at some point, GLM was defined to mean this.
Ok, I will have to think about a possible numerical solution when I get time for this. Currently, I would focus on the latent projection and the augmented-data projection first.
Yes, sorry, I forgot to mention that. I was following Piironen et al. (2020) and assumed that GLMs should be defined as in McCullagh and Nelder (1989, chapter 2). I don't know if Google Books lets you inspect page 28 of McCullagh and Nelder (1989) (for me, it does) or if you have access to an e-book or a physical copy of it (I took a physical copy). Then on top of page 28, you'll see what I mean by "mistakable wording":
I have McCullagh and Nelder (1989) and the definition is what I remembered. I'm now confused about what the problem is in Piironen et al. (2020).
In general, equation (14) of Piironen et al. (2020) (i.e., equation (2.4) of McCullagh and Nelder, 1989) is not an exponential family. Practically, this is (currently) not relevant because the R families supported by projpred right now ( |
Background of this: I wanted to formulate the projection mathematically and then was confused by the term "exponential family" used for equation (14) of Piironen et al. (2020).
The families currently supported by projpred (`gaussian()`, `binomial()`, `brms::bernoulli()`, `poisson()`) belong to the class of "exponential dispersion (ED) families" (Jørgensen, 1987; see also section "Remarks" below) used in GLMs (see McCullagh and Nelder, 1989, chapter 2, equation (2.4) at the beginning of section 2.2.2, p. 28) (and also used in GLMMs, GAMs, and GAMMs). As shown by Piironen et al. (2020, section 3.4), the projection onto an ED-family submodel can be performed quite easily by fitting to the fit of the reference model, at least when regarding only the projection onto the location-specific parameter vector and disregarding the projection onto the dispersion parameter.

Jørgensen (1987) also presents the class of "discrete exponential dispersion families" (hereafter abbreviated as "DED families"). For example, the negative binomial distribution is a DED family. DED families are closely related to ED families (Jørgensen, 1987), but strictly speaking, the class of DED families is not a subset of the class of ED families, as can be seen from comparing equations (1.1) and (2.12) of Jørgensen (1987). However, the projection of a reference model onto the location-specific parameters of a DED-family submodel can be performed just as easily as for an ED-family submodel because for DED families, equation (12) of Piironen et al. (2020) simplifies to
where I have adapted the notation a bit:
When calculating the gradient of the right-hand side expression above (which is to be maximized), one can see that the term $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ is unimportant for the maximization with respect to the location-specific parameter vector $\dot{\theta}$ and so the projection solution with respect to $\dot{\theta}$ is obtained by a maximum-likelihood fit to the fit of the reference model, analogously to what Piironen et al. (2020, section 3.4) derived for ED families. For example, in case of a negative binomial submodel without multilevel and without additive terms, `MASS::glm.nb()` could be used for calculating the projection solution with respect to $\dot{\theta}$.

For the dispersion parameter, things get more complicated: The projection solution for $\phi$ is given by
with $\dot{\theta}_c$ denoting the projection solution with respect to $\dot{\theta}$ calculated as described above (fitting to the fit of the reference model). For the negative binomial distribution, I haven't been able so far to simplify this $\phi_c$ expression to a more tractable one because $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ essentially consists of expectations of log-Gamma function values. But perhaps future work for the negative binomial family could continue here, possibly using an approximation for this more complicated $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ part.
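To make the structure of the $\phi$ problem concrete for the negative binomial case, here is a sketch under assumptions that are not stated in the issue: the mean/size parametrization (mean $\mu_i$, dispersion $\phi$, variance $\mu_i + \mu_i^2/\phi$), and $\mu_i^* = \mathbb{E}(\tilde{y}_i|I_c)$ as shorthand for the reference model's predictive mean. Under that parametrization, the expected log-likelihood contribution of observation $i$ is

$$
\mathbb{E}(\log p(\tilde{y}_i|\mu_i, \phi)|I_c) = \mu_i^* \log\frac{\mu_i}{\mu_i + \phi} + \phi \log\frac{\phi}{\mu_i + \phi} + \mathbb{E}(\log\Gamma(\tilde{y}_i + \phi)|I_c) - \log\Gamma(\phi) - \mathbb{E}(\log \tilde{y}_i!|I_c),
$$

so the expectation of $\log\Gamma(\tilde{y}_i + \phi)$ is the only $\phi$-dependent part that needs the whole predictive distribution rather than just its mean. One possible numerical route is to approximate that expectation by a weighted sum over the support of $\tilde{y}_i$ (or over $S$ predictive draws with weights $1/S$) and to maximize over $\phi$ with a one-dimensional search. The Python sketch below does this for a single observation with $\mu$ held fixed. The function names (`nb_logpmf`, `phi_objective`, `project_phi`) are hypothetical and not part of projpred, and the golden-section search assumes that the objective is unimodal in $\log\phi$ on the chosen bracket.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log pmf of the negative binomial with mean mu and dispersion (size) phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + y * math.log(mu / (mu + phi))
            + phi * math.log(phi / (mu + phi)))

def phi_objective(phi, ys, ws, mu):
    """Expected NB log-likelihood as a function of phi, without the phi-free
    term -E[log y!]. ys/ws are support points and weights (summing to 1) of
    the reference predictive distribution; for S draws use ws = [1/S] * S."""
    mean_y = sum(w * y for y, w in zip(ys, ws))
    e_lgamma = sum(w * math.lgamma(y + phi) for y, w in zip(ys, ws))
    return (mean_y * math.log(mu / (mu + phi))
            + phi * math.log(phi / (mu + phi))
            + e_lgamma - math.lgamma(phi))

def project_phi(ys, ws, mu, lo=1e-2, hi=1e2, tol=1e-6):
    """Golden-section search on log(phi); assumes a unimodal objective."""
    f = lambda t: phi_objective(math.exp(t), ys, ws, mu)
    a, b = math.log(lo), math.log(hi)
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]: shrink from the right, reuse c as new d.
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]: shrink from the left, reuse d as new c.
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2)

# Illustration: a reference predictive distribution that is itself
# NB(mu = 5, phi = 2), truncated at 300 (the tail mass beyond is negligible).
mu_true, phi_true = 5.0, 2.0
ys = list(range(301))
ws = [math.exp(nb_logpmf(y, mu_true, phi_true)) for y in ys]
phi_hat = project_phi(ys, ws, mu_true)
print(phi_hat)
```

In this illustration the predictive distribution is exactly negative binomial, so the search should recover a value close to the true dispersion of 2. With several observations, the per-observation objectives would be summed before maximizing, and if the predictive distribution is not overdispersed relative to its mean, the maximum may sit at the upper end of the bracket.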
Remarks
References
Jørgensen, B. (1987). Exponential Dispersion Models. Journal of the Royal Statistical Society. Series B (Methodological), 49(2), 127–162.
McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (D. R. Cox, D. V. Hinkley, N. Reid, D. B. Rubin, & B. W. Silverman, Eds.; 2nd ed., Vol. 37). Chapman & Hall.
Piironen, J., Paasiniemi, M., & Vehtari, A. (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711
EDIT: In remark 2, I emphasized that I'm referring to the GLM definition from McCullagh and Nelder (1989, chapter 2). In general, GLMs can have different definitions.