The cyclesampler
R-package is an implementation of property-preserving Markov Chain Monte Carlo method for generating surrogate networks in which (i) edge weights are constrained to an interval and vertex weights are preserved exactly, and (ii) edge and vertex weights are both constrained to intervals.
Please see the following article for details and discussion of the algorithm:
Kai Puolamäki, Andreas Henelius, Antti Ukkonen. Randomization algorithms for large sparse networks. Physical Review E 99, 053311, 2019. https://doi.org/10.1103/PhysRevE.99.053311
To install this R-package, proceed as follows.
First install the devtools
-package and load it in R:
install.packages("devtools")
library(devtools)
Then install the cyclesampler
package
install_github("edahelsinki/cyclesampler")
After installation, start R and load the package using
library(cyclesampler)
Here is a brief example of how to use this R-package to create surrogate networks. Consider the following network, which could describe phone calls between different individuals. Each node is a person and the edge weight between each node represents the total phone call duration (in hours) between the two persons connected by the edge. In this network both edge and node weights are hence bounded by the constraint that no-one can spend more than 24 hours on the phone per day.
We now generate surrogate networks where the node weights are preserved exactly, but where the edge weights can vary on the interval [0, 24].
library(cyclesampler)
library(xtable)
## Define the network
data <- matrix(c(1, 2, 1.5,
1, 3, 5,
1, 6, 7,
2, 3, 4,
3, 4, 3,
4, 5, 8,
4, 6, 6),
byrow = TRUE, ncol = 3)
## Scale edge weights to the interval [0, 1] correponding to the interval [0, 24] hours.
data[, 3] <- data[, 3] / 24
## Set constraints on the edge weights: they must be on the interval [0, 24] hours
a <- rep(0, 7)
b <- rep(1, 7)
## Initialise the CycleSampler
X <- cyclesampler(data, a = a, b = b)
## Run the sampler for 5000 steps and pick a sample, repeat five times
set.seed(42)
res <- replicate(5, {X$samplecycles2(5000) ; X$getstate()})
## Table with edge weights
w_e <- cbind(data[,3], res) * 24
colnames(w_e) <- c("original", paste0("sample ", seq.int(5)))
rownames(w_e) <- paste0("edge ", paste0(data[, 1], "-", data[, 2]))
print(xtable(w_e), type = "html")
## Table with node weights
w_n <- apply(w_e, 2, function(i) get_node_weights(cbind(data[, 1:2], i))$W)
rownames(w_n) <- paste0("node ", seq.int(6))
print(xtable(w_n), type = "html")
This gives us the following results for the edge weights, where all edge weights are between 0 and 24 hours, as required:
original | sample 1 | sample 2 | sample 3 | sample 4 | sample 5 | |
---|---|---|---|---|---|---|
edge 1-2 | 1.50 | 1.50 | 1.50 | 1.50 | 1.50 | 1.50 |
edge 1-3 | 5.00 | 2.82 | 0.15 | 5.22 | 4.89 | 4.12 |
edge 1-6 | 7.00 | 9.18 | 11.85 | 6.78 | 7.11 | 7.88 |
edge 2-3 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 |
edge 3-4 | 3.00 | 5.18 | 7.85 | 2.78 | 3.11 | 3.88 |
edge 4-5 | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 |
edge 4-6 | 6.00 | 3.82 | 1.15 | 6.22 | 5.89 | 5.12 |
We also observe that the node weights are preserved exactly for all nodes:
original | sample 1 | sample 2 | sample 3 | sample 4 | sample 5 | |
---|---|---|---|---|---|---|
node 1 | 13.50 | 13.50 | 13.50 | 13.50 | 13.50 | 13.50 |
node 2 | 5.50 | 5.50 | 5.50 | 5.50 | 5.50 | 5.50 |
node 3 | 12.00 | 12.00 | 12.00 | 12.00 | 12.00 | 12.00 |
node 4 | 17.00 | 17.00 | 17.00 | 17.00 | 17.00 | 17.00 |
node 5 | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 |
node 6 | 13.00 | 13.00 | 13.00 | 13.00 | 13.00 | 13.00 |
The cyclesampler
R-package is licensed under the MIT-license.