Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disconnected communities with ModularityVertexPartition on very large graphs #175

Open
wolfram77 opened this issue Mar 21, 2024 · 3 comments
Assignees

Comments

@wolfram77
Copy link

With the webbase-2001 graph, a graph with 118M vertices and and 1.02B directed edges, leidenalg is currently returning 30 disconnected communities (with seed 0).

Vincent notes that this problem is likely due to numerical precision. That is, the ModularityVertexPartition is using scaled improvements to modularity (i.e. 1/m, for consistency with the quality function), which for very large graphs may amount to not seeing any difference (i.e. the improvement of separating two disconnected parts may be positive, but due to the enormous weight, this may effectively be (near) 0). Instead, the RBConfigurationVertexPartition uses unscaled improvements to modularity (i.e. they do not scale with the total weight).

We do not observe any disconnected communities with both RBConfigurationVertexPartition (with libleidenalg) and igraph.

A possible fix may be to switch to double for all computation. Another solution might be for ModularityVertexPartition to extend RBConfigurationVertexPartition and simply overload the quality function (so that quality returns modularity, but diff_move is still not scaled).

@szhorvat
Copy link
Contributor

@vtraag, does this affect the implementation in igraph as well? I'm guessing yes?

@vtraag
Copy link
Owner

vtraag commented Mar 21, 2024

Thanks for the report here @wolfram77!

The problem here is that I want to ensure that the diff_move() function is consistent with the difference in the quality() function after the exact same move. Indeed, for the ModularityVertexPartition, the diff_move() is scaled (i.e. with $1/m$ with $m$ the number of edges) in order to be consistent with the quality() function.

What I'll probably do is separate the diff_move function into two separate parts, one diff_move_raw, which can then be used in the optimisation procedure, and one diff_move which is the scaled version of diff_move_raw used for consistency checks.

@vtraag, does this affect the implementation in igraph as well? I'm guessing yes?

No, this does not affect the igraph implementation, it always uses an unscaled version of the difference (i.e. it doesn't divide by $m$).

@vtraag vtraag self-assigned this Mar 21, 2024
@wolfram77
Copy link
Author

Hello @vtraag I was thinking if it would be easier to recommend users to use RBConfigurationVertexPartition instead of ModularityVertexPartition for very large graphs - until you see enough requests for it to directly work with ModularityVertexPartition. If so, we can update the documentation and close this issue immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants