01-ergm_migration.Rmd

---
title: "Exploring Brazil data"
author: "Juan Rocha"
date: "May 2022"
output:
    html_document:
      theme:
        bootswatch: cosmo
        code_font:
            google: Fira Code
      df_print: paged
      code_folding: hide
      toc: true
      toc_float:
        collapsed: true
        smooth_control: true
      toc_depth: 3
      fig_caption: true
      highlight: pygments
      self_contained: false
      lib_dir: libs
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, eval = FALSE)
library(tidyverse)
library(fs)
```

Load libraries:
```{r warning=FALSE, message=FALSE, eval = TRUE}
library(tidyverse)
library(fs)
library(network)
library(ergm)
library(ergm.count)
library(tictoc)
```


## Data cleaning

I have pre-cleaned and compiled the data into the network and attributes datasets. The following chunks are not evaluated, but showcase how the data was cleaned. 

```{r message = FALSE, warning = FALSE}
fls <- dir_ls("data/monthly")
fls
fls <- fls |> str_subset("2017") |> str_subset("idp")
fls
out <- map(fls, read_csv, skip = 0, col_names = 1:74)
class(out)
length(out)
```

Create adjacency matrix
```{r}
out <- map(out, function(x) as.matrix(x))

mat <- out[[1]] + out[[2]] + out[[3]] + out[[4]] + out[[5]] + out[[6]] + out[[7]] + out[[8]] + out[[9]] + out[[10]] + out[[11]] + out[[12]]
```

Process network attributes:
```{r message = FALSE, warning=FALSE}
fls <- dir_ls("data/monthly")

fls <- fls |> str_subset("2017") |> str_subset("var")

out <- map(fls, read_csv)
out <- map2(.x = out, .y = 1:12, function(x,y) {x$month <- y; return(x)})

out <-bind_rows(out)
out
```

```{r}
dat <- out |> group_by(id) |>
  summarize(
    fatality = sum(fatality),
    conflict_occ = sum(conflict_occ),
    spi_m = mean(spi),
    spi_max = max(spi),
    spi_min = min(spi),
    pop = mean(Pop2014UNFPA),
    urban = mean(Urban2014UNFPA),
    rural = mean(Rural2014UNFPA)
  )


dat
```


Create network:
```{r}
net <- network(mat, directed = TRUE)
net

```

Add attributes:

```{r}
net %e% "flow" <- mat
net %v% "fatality" <- dat$fatality
net %v% "conflict" <- dat$conflict_occ
net %v% "spi_m" <- dat$spi_m
net %v% "spi_max" <- dat$spi_max
net %v% "spi_min" <- dat$spi_min
net %v% "pop" <- dat$pop
net %v% "urban" <- dat$urban
net %v% "rural" <- dat$rural

plot(net)
```

Observed distribution of triangles

```{r}

summary(net~triadcensus)
## check
#??sna::triad.classify
```

## ERGMs

Same for the ergms, they were pre-computed to save computing time. The code is only to showcase what was done. 

```{r eval = TRUE}
load("data/simple_ergms.RData")
```


### Non-weighted

```{r }
tic()
f0 <- ergm(net ~ edges)
toc() # 0.15s
```

```{r}
tic()
f1 <- ergm(net ~ edges + diff("spi_m") +diff("spi_max")+diff("spi_min")+ 
               diff("conflict") + diff("fatality") + nodecov("pop") + 
               nodecov("conflict") + nodecov("fatality") + nodecov("spi_m") + 
               nodecov("spi_max") + nodecov("spi_min"))
toc() # 0.29s
```

```{r}
tic()
f2 <- ergm(net ~ edges + diff("spi_m") + diff("spi_max") +
    diff("spi_min") + diff("conflict") + diff("fatality") + nodecov("pop") +
    nodecov("conflict") + nodecov("fatality") + nodecov("spi_m") +
    nodecov("spi_max") + nodecov("spi_min") + transitiveties)
toc() # 38s
```

```{r eval = TRUE}
summary(f0)
summary(f1)
summary(f2)
```

## Structural ERGMs

```{r}
tic()
e1 <- ergm(net ~ edges + transitiveties)
toc() #10s

```


```{r}
tic()
e2 <- ergm(net ~ edges + intransitive) #gwnsp(cutoff = 10) + gwesp(cutoff = 15)
toc() # 6s
```


```{r}
tic()
e3 <- ergm(net ~ edges + gwesp(cutoff = 15),
           constraints = ~degreedist,
           control = snctrl(parallel = 3)
           ) #gwnsp(cutoff = 10) + gwesp(cutoff = 15)
toc() # 17s in parallel
```

```{r eval = TRUE}
summary(e1)
summary(e2)
summary(e3)
```


## Weighted ERGMs

```{r}
tic()
w0 <- ergm(net ~ nonzero + sum ,
           response = "flow",
           reference = ~Poisson,
           #constraints = ~degreedist,
           control = snctrl(parallel = 3))
toc() # 358s
```


```{r eval = TRUE}
summary(w0)
```


```{r}
tic()
w1 <- ergm(net ~ nonzero + sum + diff("spi_m") + diff("spi_max") +
    diff("spi_min") + diff("conflict") + diff("fatality") + nodecov("pop") +
    nodecov("conflict") + nodecov("fatality") + nodecov("spi_m") +
    nodecov("spi_max") + nodecov("spi_min"),
           response = "flow",
           reference = ~Poisson,
           #constraints = ~degreedist,
           control = snctrl(parallel = 3))
toc() # 628s
```

```{r eval = TRUE}
summary(w1)
```

Takes a while and the models are degenerated

```{r eval = TRUE}
mcmc.diagnostics(w1)
```


Save work space:

```{r}
# save(e1, e2, e3, f0, f1, f2, w0, w1, net, file = "data/simple_ergms.RData")
```


## Combined

```{r eval=TRUE, results='asis'}
stargazer::stargazer(
    f0,f1,f2,e1,e2,e3,w0,w1,
    type = "html", digits = 2
)
```


## Next steps

- Do we need weighted ergms?
- What other terms are worth exploring? 
    - triangles are computationally expensive, geometrically weighted terms are possible but time consuming and only converging when constrained.
    - What are sensible choices for constraining the sample space?