Edge count consistently underrepresented in dense network #589
-
You may need to consider some of the component/connectivity statistics from |
-
@CarterButts Thanks very much for your suggestions. It is probably worthwhile for me to go back and check the data again in any case. I should clarify that I do believe the isolates are meaningful and theoretically important: during data collection, they were people whom respondents named as contacts but did not list as people from whom they received information. I therefore leave them in as "potential" connections in the network who happen not to be connected to the respondent. That said, I think it is possible that, in the main connected component of the graph, the number of edges is biased upward because of the way the data were collected. If anything, this would be a bigger concern for me than the isolates (especially because removing the isolates seems to have no effect on estimation). If this is the case, I am not sure how I could test for or deal with it, but if you have any suggestions I would much appreciate hearing them.
-
Hello,
I am trying to fit an ERGM to some network data that I have collected (I may be able to share a deidentified version of the data upon request). The raw network is fairly dense, with 344 nodes and 727 edges, but it also has a good number of isolates and a fairly skewed degree distribution (undirected, degree range 0-35). There is also a very high number of closed triangles in the data (the triad census gives 555 triads of type 300). The ERGM I am trying to fit has some basic demographic main and homophily effects as well as a GWESP term.
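For reference, the summary figures quoted above can be put on a common scale. The sketch below is only an illustration: it uses the node, edge, and triangle counts from this post and compares the triangle count with what independent edges at the same density would be expected to produce. The independent-edge baseline is an assumption chosen for comparison, not part of the model being fitted.

```python
from math import comb

# Counts taken from the post above.
n_nodes = 344
n_edges = 727
observed_triangles = 555  # complete (type 300) triads from the triad census

# Density: share of possible undirected dyads that are tied.
n_dyads = comb(n_nodes, 2)
density = n_edges / n_dyads

# Mean degree: each edge contributes to the degree of two nodes.
mean_degree = 2 * n_edges / n_nodes

# Baseline (assumption): if every dyad were tied independently with
# probability equal to the density, each of the comb(n, 3) triples
# would be a closed triangle with probability density ** 3.
expected_triangles = comb(n_nodes, 3) * density ** 3

print(f"density                 = {density:.4f}")
print(f"mean degree             = {mean_degree:.2f}")
print(f"triangles (independent) = {expected_triangles:.1f}")
print(f"triangles (observed)    = {observed_triangles}")
```

The gap between the observed and baseline triangle counts gives a rough sense of how much clustering the GWESP term is being asked to account for.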
However, the networks simulated during the fitting procedure consistently have lower edge counts than the observed network, even at their maximum. I also tried simulating a network myself, and its closed triangle count was also much lower than in the observed data. I have tried adjusting the decay parameter in the GWESP term to no avail (raising it too high leads to the opposite problem, model degeneracy). I have also tried adding terms for isolates and for the geometrically weighted degree distribution, but in these cases the model usually either fails to converge or shows no improvement. I am somewhat at a loss, as I have not been able to find much about people encountering this kind of problem before. Any advice would be greatly appreciated and, as I said, I may be able to provide some of the data if anyone would like to take a closer look.
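To make the role of the decay parameter concrete, here is a minimal sketch of the GWESP statistic written in plain Python rather than with the ergm package. It assumes the standard fixed-decay form, e^decay times the sum over k of [1 - (1 - e^(-decay))^k] times the number of edges with exactly k shared partners. The function names and the toy graphs are hypothetical and are not taken from the data discussed here.

```python
from itertools import combinations
from math import exp


def esp_distribution(edges):
    """Map k -> number of edges whose endpoints share exactly k neighbours.

    `edges` is assumed to be a list of unique undirected pairs.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    counts = {}
    for u, v in edges:
        k = len(neighbours[u] & neighbours[v])
        counts[k] = counts.get(k, 0) + 1
    return counts


def gwesp(edges, decay):
    """Geometrically weighted edgewise shared partner statistic (fixed decay)."""
    esp = esp_distribution(edges)
    return exp(decay) * sum(
        (1.0 - (1.0 - exp(-decay)) ** k) * count
        for k, count in esp.items()
        if k > 0
    )


# Hypothetical toy graphs: a single triangle and a complete graph on 4 nodes.
triangle = [(1, 2), (2, 3), (1, 3)]
k4 = list(combinations(range(4), 2))

for decay in (0.0, 0.5, 1.0, 2.0):
    print(decay, round(gwesp(triangle, decay), 3), round(gwesp(k4, decay), 3))
```

At a decay of 0 the statistic only counts edges that have at least one shared partner. As the decay grows, each additional shared partner on an edge contributes more, which is one way to see why large decay values push the model toward heavily clustered, degenerate configurations.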
EDIT: I forgot to mention that this network also has a fair number of isolates because of the way the network data were collected. One reason I think I may be running into this issue is that two forces are pulling the estimation in opposite directions and end up "meeting in the middle" in an unsatisfactory way: the large number of zero-degree nodes on the one hand, and the large number of closed triads among the higher-degree nodes on the other. Short of just removing the isolates, is there some way I can balance these forces more appropriately? I have also now added a term to the model to control for the degree of each node (a nodecov("degree") term). It may also be worth noting that the model seems to have convergence issues in general: the iterations begin very far from the tolerance region in absolute terms and often stray further away as they continue.
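One way to see how the zero-degree nodes and the higher-degree nodes split the network is to tabulate degrees directly from the node and edge lists. The sketch below is a minimal illustration in plain Python. The helper name `degree_summary` and the toy data are hypothetical, and the node list is passed separately because isolates never appear in an edge list.

```python
from collections import Counter


def degree_summary(nodes, edges):
    """Return (degree per node, sorted isolates, degree -> node count)."""
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    isolates = sorted(node for node, d in degree.items() if d == 0)
    distribution = Counter(degree.values())
    return degree, isolates, distribution


# Hypothetical toy data: one triangle with a pendant node, plus two isolates.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

degree, isolates, distribution = degree_summary(toy_nodes, toy_edges)
print("isolates:", isolates)
print("degree distribution:", dict(sorted(distribution.items())))
```

Running the same tabulation on observed and simulated networks would show whether the simulations miss the isolates, the high-degree tail, or both.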