Projection for discrete exponential dispersion (DED) families #361

fweber144 · 2022-10-23T19:41:21Z

The families currently supported by projpred (gaussian(), binomial(), brms::bernoulli(), poisson()) belong to the class of "exponential dispersion (ED) families" (Jørgensen, 1987; see also section "Remarks" below) used in GLMs (see McCullagh and Nelder, 1989, chapter 2, equation (2.4) at the beginning of section 2.2.2, p. 28) (and also used in GLMMs, GAMs, and GAMMs). As shown by Piironen et al. (2020, section 3.4), the projection onto an ED-family submodel can be performed quite easily by fitting to the fit of the reference model, at least when regarding only the projection onto the location-specific parameter vector and disregarding the projection onto the dispersion parameter.

Jørgensen (1987) also presents the class of "discrete exponential dispersion families" (hereafter abbreviated by "DED families"). For example, the negative binomial distribution is a DED family. DED families are closely related to ED families (Jørgensen, 1987), but strictly speaking, the class of DED families is not a subset of the class of ED families, as can be seen from comparing equations (1.1) and (2.12) of Jørgensen (1987). However, the projection of a reference model onto the location-specific parameters of a DED-family submodel can be performed just as easily as for an ED-family submodel because for DED families, equation (12) of Piironen et al. (2020) simplifies to

$\theta_{c} = \mathop{\mathrm{argmax}}\limits_{\theta \in \Theta} \sum\limits_{i = 1}^{N} \left((\mu^{*}_{c, i} \, \mu_i(\dot{\theta}) - K(\phi) B(\mu_i(\dot{\theta}))) + \mathbb{E}(H(\tilde{y}_i, \phi)|I_c)\right)$

where I have adapted the notation a bit:

$\theta_{c}$ corresponds to $\theta_{\perp}$ from Piironen et al. (2020).
I have split up $\theta$ into the location-specific parameter vector $\dot{\theta} \in \dot{\Theta}$ and the dispersion parameter $\phi \in (0, \infty)$, i.e., $\theta = (\dot{\theta}^{\scriptscriptstyle \mathsf{T}}, \phi)^{\scriptscriptstyle \mathsf{T}}$ and accordingly $\Theta = \{\theta = (\dot{\theta}^{\scriptscriptstyle \mathsf{T}}, \phi)^{\scriptscriptstyle \mathsf{T}}: \dot{\theta} \in \dot{\Theta}, \phi \in (0, \infty)\}$.
$N$ corresponds to $n$ from Piironen et al. (2020). (I picked $N$ to agree with the notation from projpred's documentation.)
$\mu^{*}_{c, i}$ corresponds to $\mu^{*}_i$ from Piironen et al. (2020).
$\mu_i(\cdot)$ corresponds to $\eta_i(\cdot)$ from Piironen et al. (2020). (I picked $\mu$ instead of $\eta$ to avoid confusion with the linear or latent predictor.)
$\mathbb{E}(\cdot|I_c)$ corresponds to $\mathbb{E}_{\tilde{y}_i | I_c}(\cdot)$ from Piironen et al. (2020). (The GitHub Markdown math mode has some issues with the latter notation when used above.)
$K(\phi)$ corresponds to $\lambda$ from Jørgensen (1987). (I chose this functional notation in analogy to $A(\phi)$ from Piironen et al. (2020).)

When calculating the gradient of the right-hand side expression above (which is to be maximized), one can see that the term $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ is unimportant for the maximization with respect to the location-specific parameter vector $\dot{\theta}$ and so the projection solution with respect to $\dot{\theta}$ is obtained by a maximum-likelihood fit to the fit of the reference model, analogously to what Piironen et al. (2020, section 3.4) derived for ED families. For example, in case of a negative binomial submodel without multilevel and without additive terms, MASS::glm.nb() could be used for calculating the projection solution with respect to $\dot{\theta}$.

For the dispersion parameter, things get more complicated: The projection solution for $\phi$ is given by

$\phi_{c} = \mathop{\mathrm{argmax}}\limits_{\phi \in (0, \infty)} \sum\limits_{i = 1}^{N} \left((\mu^{*}_{c, i} \, \mu_i(\dot{\theta}_c) - K(\phi) B(\mu_i(\dot{\theta}_c))) + \mathbb{E}(H(\tilde{y}_i, \phi)|I_c)\right)$

with $\dot{\theta}_c$ denoting the projection solution with respect to $\dot{\theta}$ calculated as described above (fitting to the fit of the reference model). For the negative binomial distribution, I haven't been able so far to simplify this $\phi_c$ expression to a more tractable one because $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ essentially consists of expectations of log-Gamma function values. But perhaps future work for the negative binomial family could continue here, possibly using an approximation for this more complicated $\mathbb{E}(H(\tilde{y}_i, \phi)|I_c)$ part.

Remarks

Jørgensen (1987) himself only uses the term "exponential dispersion models", but the discussion for that article mentions the term "exponential dispersion (ED) families", so I think it should be fine to use the term "ED families" also in the context of projpred.
With regard to the GLM definition from McCullagh and Nelder (1989, chapter 2) that is also cited by Piironen et al. (2020), it is not really correct to say that "GLMs have their observation model in the exponential family" (Piironen et al., 2020, section 3.5) because the class of exponential families is different from the class of ED families: For a known dispersion parameter, ED families are indeed (special, i.e., linear, see Jørgensen, 1987, section 2) exponential families, but in general, this is not the case if the dispersion parameter is unknown (McCullagh and Nelder, 1989, chapter 2). I admit, however, that the wording of McCullagh and Nelder (1989, chapter 2, beginning of section 2.2.2, p. 28) is mistakable. Others seem to have been confused by this as well, see, e.g., here and here.

References

Jørgensen, B. (1987). Exponential Dispersion Models. Journal of the Royal Statistical Society. Series B (Methodological), 49(2), 127–162.

McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (D. R. Cox, D. V. Hinkley, N. Reid, D. B. Rubin, & B. W. Silverman, Eds.; 2nd ed., Vol. 37). Chapman & Hall.

Piironen, J., Paasiniemi, M., & Vehtari, A. (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711

EDIT: In remark 2, I emphasized that I'm referring to the GLM definition from McCullagh and Nelder (1989, chapter 2). In general, GLMs can have different definitions.

The text was updated successfully, but these errors were encountered:

avehtari · 2022-10-28T11:15:28Z

For the negative binomial distribution, I haven't been able so far to simplify this
expression to a more tractable

numerical solution would be fine, too

It is not really correct to say that "GLMs have their observation model in the exponential family" (Piironen et al., 2020, section 3.5) because the class of exponential families is different from the class of ED families:

I think this depends on the definition of GLM. If I remember correctly, at some point, GLM was defined to mean this.

fweber144 · 2022-10-28T13:12:10Z

numerical solution would be fine, too

Ok, I will have to think about a possible numerical solution when I get time for this. Currently, I would focus on the latent projection and the augmented-data projection first.

I think this depends on the definition of GLM.

Yes, sorry, I forgot to mention that. I was following Piironen et al. (2020) and assumed that GLMs should be defined as in McCullagh and Nelder (1989, chapter 2). I don't know if Google Books lets you inspect page 28 of McCullagh and Nelder (1989) (for me, it does) or if you have access to an e-book or a physical copy of it (I took a physical copy). Then on top of page 28, you'll see what I mean by "mistakable wording":

We assume that [...] has a distribution in the exponential family, taking the form [... (basically equation (14) of Piironen et al., 2020)] for some specific [...]. If $\phi$ is known, this is an exponential-family model [...]. It may or may not be a two-parameter exponential family if $\phi$ is unknown.

avehtari · 2022-10-28T13:28:41Z

I have McCullagh and Nelder (1989) and the definition is what I remembered. I'm now confused what is the problem in Piironen et al (2020).

fweber144 · 2022-10-28T13:39:30Z

In general, equation (14) of Piironen et al. (2020) (i.e., equation (2.4) of McCullagh and Nelder, 1989) is not an exponential family. Practically, this is (currently) not relevant because the R families supported by projpred right now (gaussian(), binomial(), poisson()) are all exponential dispersion families and exponential families.

fweber144 · 2022-10-28T13:42:47Z

Background of this: I wanted to formulate the projection mathematically and then was confused by the term "exponential family" used for equation (14) of Piironen et al. (2020).

fweber144 added the feature label Oct 23, 2022

fweber144 mentioned this issue Oct 26, 2022

Implement negative binomial #67

Closed

fweber144 added enhancement Enhancements of existing features, but also new feature requests. and removed feature labels Oct 26, 2022

lauterbur mentioned this issue May 24, 2023

Model fit with custom distribution family? #418

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Projection for discrete exponential dispersion (DED) families #361

Projection for discrete exponential dispersion (DED) families #361

fweber144 commented Oct 23, 2022 •

edited

Loading

avehtari commented Oct 28, 2022

fweber144 commented Oct 28, 2022

avehtari commented Oct 28, 2022

fweber144 commented Oct 28, 2022

fweber144 commented Oct 28, 2022

Projection for discrete exponential dispersion (DED) families #361

Projection for discrete exponential dispersion (DED) families #361

Comments

fweber144 commented Oct 23, 2022 • edited Loading

Remarks

References

avehtari commented Oct 28, 2022

fweber144 commented Oct 28, 2022

avehtari commented Oct 28, 2022

fweber144 commented Oct 28, 2022

fweber144 commented Oct 28, 2022

fweber144 commented Oct 23, 2022 •

edited

Loading