Consider calling glm.fit() instead of glm() #569

mbojan · 2024-07-04T18:49:57Z

In

ergm/R/ergm.mple.R

Lines 100 to 101 in 1f4401e

    
    glm.result <- quietly(function() glm(pl$zy ~ .-1 + offset(pl$foffset),
                                         data=data.frame(pl$xmat), weights=pl$wend, family=family))()

consider calling glm.fit() directly rather than glm(). Experiments with biggish data show that it might cut the computing time by half.
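
A minimal sketch of the suggested swap, on simulated data rather than real ergm internals. The objects `xmat`, `zy`, `foffset`, and `wend` are hypothetical stand-ins for the `pl$...` components in the snippet above, and the binomial family is an assumption. It only illustrates that the two calls should produce the same coefficients and how one might time them; it says nothing about what the timings will be on a real network.

```r
# Simulated stand-ins for pl$xmat, pl$zy, pl$foffset and pl$wend (assumptions).
set.seed(1)
n <- 50000; p <- 8
xmat <- matrix(rnorm(n * p), n, p,
               dimnames = list(NULL, paste0("x", seq_len(p))))
foffset <- rep(0.1, n)
wend <- sample(1:3, n, replace = TRUE)
zy <- rbinom(n, 1, plogis(drop(xmat %*% rep(0.2, p)) + foffset))
family <- binomial()

# Formula interface: builds a model frame and model matrix before fitting.
t_glm <- system.time(
  fit_glm <- glm(zy ~ . - 1 + offset(foffset),
                 data = data.frame(xmat), weights = wend, family = family)
)

# Direct call: design matrix, response, weights and offset are passed as-is.
t_fit <- system.time(
  fit_direct <- glm.fit(x = xmat, y = zy, weights = wend,
                        offset = foffset, family = family)
)

# glm() hands the actual fitting to glm.fit(), so the coefficients should agree.
same_coef <- isTRUE(all.equal(coef(fit_glm), coef(fit_direct)))
print(same_coef)
print(rbind(glm = t_glm, glm.fit = t_fit)[, "elapsed"])
```

Note that `glm.fit()` returns a plain list without the class, formula, or model frame, so any downstream code relying on `glm` methods would need adjusting.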


krivit · 2024-08-18T04:28:31Z

@mbojan , can you test to see if this works?

krivit · 2024-09-23T07:44:04Z

@mbojan , I'll be submitting an update to ergm in the next few days. If you want this to go in, let me know ASAP.

mbojan · 2024-09-23T09:28:47Z

Oh dear, I have a week of workshops, including ergms. Can we release next week? In principle the answer is yes, but I haven't tested it yet.

krivit · 2024-09-28T23:44:28Z

> Oh dear, I have a week of workshops, including ergms. Can we release next week? In principle the answer is yes, but I haven't tested it yet.

OK, can you get it done in the next day or two?

krivit · 2024-10-02T08:48:19Z

@mbojan ?

krivit · 2024-10-03T00:59:01Z

@AdrienLeGuillou , you often fit MPLE to large networks, right? Can you by any chance test this?

AdrienLeGuillou · 2024-10-03T06:50:39Z

I just ran a quick test on a smaller 10k-node network using this branch. It worked fine. I can't tell whether it was faster, as I usually use "Stochastic-Approximation" for these smaller local tests.
I can try on the HPC with our 3 - 100k nodes networks and compare the time it takes.

AdrienLeGuillou · 2024-10-03T07:49:02Z

I just realized that ergm.mple is called regardless of which main.method we use.
Therefore I can confirm that it works perfectly on our 10k-node networks.
It takes a very similar amount of time to fit the networks with both versions, as the MPLE step is not the longest part anyway.

AdrienLeGuillou · 2024-10-03T09:57:48Z

I confirm it also works on the 100k-node network.
It actually took longer with glm.fit, but the difference was in the number of MCMLE iterations.

mbojan · 2024-10-03T11:35:07Z

Thanks @AdrienLeGuillou . @krivit don't merge, leave as is. I need to dig out the script where I think I noticed the difference.

krivit · 2025-02-16T06:30:01Z

> I confirm it also works on the 100k-node network. It actually took longer with glm.fit, but the difference was in the number of MCMLE iterations.

A few things to try:

- Run set.seed(0) (or some other number) before the ergm() call. I don't think the GLM code has any stochastic elements, so which variant is used shouldn't make a difference.
- Run Rprof() before running the test code, Rprof(NULL) after; then summaryRprof() should tell you how much time is being spent in ergm.mple().
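
The two suggestions above can be sketched as follows. This is an illustration only: a base-R `glm()` call on simulated data stands in for the `ergm()` call (an assumption, so the snippet runs without ergm installed), and the data sizes are arbitrary. With ergm loaded, one would wrap the `ergm()` call instead and look for the `ergm.mple` row in the summary.

```r
# Fix the RNG state so repeated runs start from the same point.
set.seed(0)

# Simulated data standing in for the real model (assumption).
n <- 300000; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(drop(x %*% rep(0.1, p))))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                            # start sampling the call stack
fit <- glm(y ~ x - 1, family = binomial())  # the code under test
Rprof(NULL)                                 # stop profiling

prof <- summaryRprof(prof_file)
# by.total has one row per function seen on the stack, with inclusive time
# (total.time) and time spent in the function itself (self.time).
print(head(prof$by.total))
```

`summaryRprof()` reports times in seconds, and very short runs may record too few samples to be informative, so the code under test should run for at least a second or so.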

AdrienLeGuillou · 2025-02-18T17:35:04Z

I just ran a few tests on the HPC with a 100k-node network, using set.seed(0) and Rprof.

I don't see any difference.

            total.time total.pct self.time self.pct
"ergm.mple"        0.4      0.08         0        0

That was the last run with the @i120-glm-fit branch.

Both branches give total times between 0.4 and 0.6.

I think that for big networks this is not very important, as the overhead of glm compared with glm.fit is quickly dwarfed by the actual computation time.

For smaller networks it probably makes a lot more sense.

For reference, this is the formula used for the network:

model_main <- ~ edges +
  nodematch("age.grp", diff = TRUE) +
  nodefactor("age.grp", levels = -1) +
  nodematch("race", diff = FALSE) +
  nodefactor("race", levels = -1) +
  nodefactor("deg.casl", levels = -1) +
  concurrent +
  degrange(from = 3) +
  nodematch("role.class", diff = TRUE, levels = c(1, 2))

mbojan · 2025-02-19T22:46:15Z

@krivit @AdrienLeGuillou Thanks for investigating. I can't find the use case in which I think I noticed that effect. Did you look at the possible effect on memory footprint too? I'd say we can declare this issue as "unconfirmed" and let it rest.

AdrienLeGuillou · 2025-03-07T09:26:46Z

@mbojan glm would use a bit more memory, but nothing relevant compared to the memory used by the data and fitting routines. Neither on my machine nor on the HPC could I detect a difference using just htop.
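
As a rough illustration of where the extra memory comes from, the sketch below compares the objects returned by `glm()` and `glm.fit()` on simulated data (the data and sizes are assumptions). Note that `object.size()` measures only the returned objects, not the peak memory used during fitting, which is what htop would show.

```r
set.seed(2)
n <- 20000; p <- 5
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
y <- rbinom(n, 1, plogis(drop(x %*% rep(0.3, p))))

fit_glm    <- glm(y ~ . - 1, data = data.frame(x), family = binomial())
fit_direct <- glm.fit(x = x, y = y, family = binomial())

# Components present only in the glm() result, e.g. the model frame and data.
extra <- setdiff(names(fit_glm), names(fit_direct))
print(extra)

# Sizes of the two returned objects.
print(format(object.size(fit_glm), units = "MB"))
print(format(object.size(fit_direct), units = "MB"))
```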

mbojan added Type: Enhancement Language: R Component: Estimation labels Jul 4, 2024

krivit added a commit that referenced this issue Aug 18, 2024

Use glm.fit() in place of glm() wherever possible (suggested by Micha…

02bfcb4

…l Bojanowski). fixes #569

mbojan self-assigned this Aug 29, 2024
