Penalization option for likelihood function? #590

dylan-munson opened this issue Dec 10, 2024 · 11 comments

@dylan-munson

Hello,

I am having some trouble fitting a model (see my discussion #589) and it has been suggested to me that I try penalizing the likelihood function. I know there is a penalized option for the MPLE, but it appears to be broken: I get the error
Error in rep(0, k - 1) : invalid 'times' argument
when I try to pass this argument to ergm(). In any case, it is not clear to me from the documentation whether this is the type of penalization I would want. Thus I wanted to suggest:

  1. Fixing the issue with the current penalization option, and
  2. Adding more flexibility or at least documentation as to what the penalization is actually doing.

Thank you.

@CarterButts

If you want something you can try immediately (with the existing code), you might use the penalization scheme from https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0273039. Whether that will help may depend (among other things) on where your original problem lies, and you may need to tweak your computational strategy to take advantage of it. But the basic idea is simple enough:

  1. Begin with a null model (e.g., a Bernoulli graph with density close to the expected final density).
  2. Simulate draws from the null model, and calculate your model statistics on these draws (easily done using simulate(), though more efficiently by a combination of rgraph() in the sna library followed by ergm's summary()).
  3. Take a weighted average of your observed model statistics (from your data) and the mean statistics from your null model.
  4. Fit to this weighted average (using the target.stats argument) instead of your observed statistics.

See the paper for details. When the initial model is chosen appropriately, the resulting fit can be interpreted as a MAP estimate under a particular choice of conjugate prior. More broadly, however, this approach can be used as a form of regularization, where regularization is performed with respect to the mean value parameters rather than the natural parameters of the model: you are "shrinking" the model towards the null model in the mean value space. In principle, this can, among other things, resolve the problem of a non-existent MLE due to the observed statistics lying on the face of the convex hull of possible statistics. (However, I should note that while even minimal regularization will absolutely resolve that particular problem in theory, one can still encounter numerical problems in practice. These can be resolved with sufficient computational effort, but in difficult cases that effort can be appreciable. So one should bear in mind that such techniques are not magical.) I have included an illustrative example of how to do this below.

Anyway, since this only requires setting the target.stats argument, it is an easy and principled option that you can try immediately. Hope that helps!

Here's an example using faux.mesa.high, showing how one can use this as a regularizer when the MLE does not exist. Bear in mind that if you want to interpret this as a MAP estimate, you should not base your prior density on the data (though one could, e.g., base it on related data sets). I put a fair amount of weight on the prior in this case, which you may not necessarily want to do in practice. See the above paper for some discussion of that.

#Example of conjugate MAP/regularization with ergm

#Load prereqs
library(ergm)

#Load the data and define the model formula and such
data(faux.mesa.high)
mod <- faux.mesa.high ~ edges + nodematch("Grade",diff=TRUE) + 
    nodematch("Race",diff=TRUE) + nodematch("Sex") + gwesp(0.25,fixed=TRUE)
prior.ss <- 1       #Weight on the prior (in "graph units")
prior.md <- 2       #Prior mean degree
prior.draws <- 1000 #Sample size to estimate prior mean stats
obs.ss <- summary(mod)

#Get the prior stats
nv <- network.size(faux.mesa.high)
prior.stats <- simulate(mod, coef=c(log(prior.md/(nv-prior.md-1)), 
    rep(0,length(obs.ss)-1)), nsim=prior.draws, output="stats",
    control=control.simulate.formula(MCMC.interval=10*nv^2))
prior.ms <- colMeans(prior.stats)

#Fit the model
target.ss <- (obs.ss + prior.ss*prior.ms)/(1+prior.ss)  #Weighted average of observed and prior mean stats
mple <- coef(ergmMPLE(mod, output="fit")) #Get the raw MPLE for use as a starting value
mple[!is.finite(mple)] <- 0               #Zero out infinite coefficients (terms whose MLE does not exist)
fit <- ergm(mod, target.stats=target.ss, control=control.ergm(init=mple,
    MCMC.interval=nv^2, main.method="Stochastic"), eval.loglik=FALSE)
fit$covar <- fit$covar/(1+prior.ss)  #Need to adjust for effective df
summary(fit)
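
As an aside, here is a minimal sketch of the rgraph()/summary() route mentioned above for estimating the prior mean statistics; it is only an illustration, reusing the objects defined in the example (nv, prior.md, prior.draws) and copying the vertex attributes from faux.mesa.high so that the attribute-based terms can be computed on each draw.

#Optional: estimate the prior mean stats via sna::rgraph() instead of simulate()
#(a sketch; assumes the objects defined in the example above are in scope)
library(sna)
dens <- prior.md/(nv-1)   #Bernoulli tie probability matching the prior mean degree (same null model as the coef above)
prior.stats.alt <- t(sapply(seq_len(prior.draws), function(i){
  a <- rgraph(nv, tprob=dens, mode="graph")     #One undirected Bernoulli draw
  g <- network(a, directed=FALSE)
  g %v% "Grade" <- faux.mesa.high %v% "Grade"   #Copy attributes used by the model terms
  g %v% "Race" <- faux.mesa.high %v% "Race"
  g %v% "Sex" <- faux.mesa.high %v% "Sex"
  summary(g ~ edges + nodematch("Grade",diff=TRUE) +
      nodematch("Race",diff=TRUE) + nodematch("Sex") + gwesp(0.25,fixed=TRUE))
}))
prior.ms.alt <- colMeans(prior.stats.alt)       #Should be close to prior.ms above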

@handcock
Contributor

Hi Dylan,

An alternative is to try fitting a tapered ERGM. This has a similar "penalized" aspect to it, although the tapered ERGM is a different model than the plain ERGM. That is, the tapered ERGM will fit because the model is less prone to degeneracy, rather than because it provides an alternative estimator of the plain ERGM. The tapered ERGM would be less compelling if you had a strong theoretical belief in the plain ERGM over the tapered version. However, that is rare.

For a description see https://github.com/statnet/ergm.tapered.

This may fix the original issue of credible simulation. To compare it with Carter's approach, you can just run fit.tapered <- ergm.tapered(mod). Comparing the two fits and their simulations would be instructive.
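
A minimal sketch of what that might look like, assuming the ergm.tapered package from the repository above is installed (e.g., via remotes::install_github("statnet/ergm.tapered")) and that mod is the model formula from Carter's example:

#Fit a tapered ERGM to the same model formula (a sketch)
library(ergm.tapered)
fit.tapered <- ergm.tapered(mod)   #Default tapering; see the package documentation for options
summary(fit.tapered)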

Best,

Mark

@dylan-munson
Author

Thank you both for these suggestions! I will look into them if the issue persists despite some tweaks to model specification and provide updates if necessary.

@dylan-munson
Author

@handcock I have been looking into the option of using a tapered ERGM and have a question about the returned parameter values. My ultimate goal is to use these parameter values to simulate complete networks from the partial network data I have collected; your paper on tapered ERGMs points out that the interpretation of the coefficients for a tapered ERGM is slightly different. Is it still possible to simulate networks from the coefficient values on the tapered ERGM terms directly, or does some kind of adjustment need to be made or the tapering taken into account when the simulations are generated? Thank you.

@handcock
Contributor

Hi Dylan,

The simulate command will work as usual, as the model is (ultimately) an ERGM. It will just simulate from the tapered ERGM.
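
For example, a minimal sketch (assuming fit.tapered is the ergm.tapered() fit suggested above and that the fitted object supports the standard simulate() interface for ergm fits):

#Simulate networks and statistics from the fitted tapered ERGM (a sketch)
sim.nets <- simulate(fit.tapered, nsim=100)                    #List of simulated networks
sim.stats <- simulate(fit.tapered, nsim=100, output="stats")   #Just the model statistics
colMeans(sim.stats)   #Compare against summary(mod) as a quick check of the fit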

Separately, if you have partial network data you will need to specify the tapering centers, and these will need to be estimated from the partial data. We have not extended the code to do this automatically, but it would be a useful thing to do.

@dylan-munson
Author

@handcock Thanks, I think I understand. So basically it is similar to what I would do with a regular ERGM in the sense that I would:

  1. Generate the starting full network with the correct sample size.
  2. Generate a tapered ERGM fit object from this network.
  3. Fill in the tapered ERGM object from (2) with the correct coefficients AND tapering centers, as well as (I assume) the tapering coefficients from the partial data (am I right on this part?).
  4. Simulate the full network using the object filled in at step (3).

Am I on track here?

@handcock
Contributor

If you are fitting complete/full networks, then you can skip step 3. It is there if you fit the tapered ERGM to an incomplete network (directly).
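
To make the workflow concrete, here is a minimal sketch of steps 1, 2, and 4 using the plain ergm simulate() interface; the network size, attribute, and coefficient values below are placeholders rather than anything estimated from the data, and the tapered version would substitute the ergm.tapered fit or coefficients in the same way.

#Simulate full networks of the target size from specified coefficients (a sketch)
library(ergm)
n.full <- 500                                               #Placeholder: size of the full network
base <- network.initialize(n.full, directed=FALSE)          #Step 1: empty starting network
base %v% "Sex" <- sample(c("F","M"), n.full, replace=TRUE)  #Placeholder vertex attribute
form <- base ~ edges + nodematch("Sex")                     #Step 2: model defined on that network
theta <- c(-4, 0.5)                                         #Placeholder coefficients to simulate from
full.sims <- simulate(form, coef=theta, nsim=10)            #Step 4: simulate full networks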

@CarterButts

> @handcock I have been looking into the option of using a tapered ERGM and have a question about the returned parameter values. My ultimate goal is to use these parameter values to simulate complete networks from the partial network data I have collected; your paper on tapered ERGMs points out that the interpretation of the coefficients for a tapered ERGM is slightly different. Is it still possible to simulate networks from the coefficient values on the tapered ERGM terms directly, or does some kind of adjustment need to be made or the tapering taken into account when the simulations are generated? Thank you.

One thing to be aware of is that a tapered model assumes the existence of a "restoring force" that tends to drive networks towards the tapering center (like a spring). If you are doing extrapolative simulation, and you think that this is a reasonable interpretation (say, e.g., you have an organizational population where highly deviant structures are likely to be non-functional, and it is plausible that the networks are being actively driven to keep certain properties), then simulating with that assumption may be entirely reasonable. But if not, then relying on that restoring force to patch up an otherwise bad specification may lead to misleading results. That's not a criticism of tapered models - it's just that any model you use is based on certain types of assumptions, and both interpretation of the model and use in extrapolation are sensitive to those assumptions. So be sure that whatever model you use (tapered, non-tapered, etc.) makes some sense in your target application before trusting the networks that come out of it.

@CarterButts

PS. I went and looked at the other thread, and left a comment there. It sounds to me like your underlying issue here may well be bad data. If so, patching that up with penalty functions, tapering, or whatnot is not going to help you (and could even produce misleading results). It seems to me that you probably need to revisit your data before proceeding further with analysis.

@dylan-munson
Author

@handcock Thanks very much for the clarification.

@CarterButts I responded to your post on the other thread, thanks as well. The TL;DR is that I don't think the isolates are actually a major issue, but it is possible there are other problems with the data I need to work out first. However, given my conversations with other people who have worked with this type of data before, I don't think these issues should significantly affect estimation. I do appreciate the suggestion, though, that I should look more into the assumptions behind the tapered ERGM and make sure they are clear before proceeding further.

@CarterButts

@dylan-munson See my response in the other thread. Your description suggests to me that you have missing data and/or data collection constraints that are not being passed to the model. If so, that is a known source of problems like the ones you are experiencing, so you'd want to take care of that first.
