This repo is a toolbox that serves as an addon to the surveillance package and its hhh4 function for spatio-temporal modeling of disease spread.
In spatio-temporal models, we need to specify the transmission between districts in a weight matrix. Common approaches are based on the adjacency of districts / the neighbourhood structure, which depends on where district boundaries are. The methods provided here are based on different distance measures serve two purposes:
-
Transmission risk is modeled on the individual level instead of the district level to reduce the influence of boundary locations and to unify the definition of transmission risk for intra- and inter-district transmission
-
Detailed population density data is included in hope to improve forecast performance and to enable risk mapping on resolution finer than the observed district level
When modeling the risk of transmitting disease in a spatio-temporal setting, we typically express the risk as a function of a distance measure. For example if we use the order of neighbourhood as distance
This toolbox is designed for applications with spatio-temporal data, i.e. time series of infection counts for different districts. With such a data set the following steps are needed:
The function getJSON
downloads shapefiles with district boudnaries.
The function getPOP
downloads population density data from Worldpop and saves it in a raster file.
Needs to be done by hand.
4) Sampling individuals in each district according to population density #### $\rightarrow$ wspsample.R
The function wspsample
takes a polygon from the shapefile (step 1), a sample size and population density data (step 2) as input and returns a sample of locations where individuals live. The output is a matrix where each row contains coordinates and the population estimate in the corresponding 100m x 100m grid cell. The boolean argument weighted
specifies if the sampling should be weighted by population.
Different distance measures are implemented that each take two matrices of coordinates as input, one from an origin district and one from the destination district. The argument pairwise
specifies if distances shall be computed pairwise, and the argument log
if the distances shall be returned in logs.
Currently implemented are:
-
beeline
: Beeline distance -
travelTime
: The time to travel from one point to another (by car, in minutes) -
gravity
: Like the beeline distance but in addition the ratio of population densities at origin and destination is returned as well -
circle
: Needs additional input. From the population (step 2), create a matrix (as.matrix(pop)
) and an extent object (extent(pop)
). It then computes the population in a circle around the origin that touches the destination. It makes use of the midpoint circle algorithm and C++ for speed-up. -
radiation
: Similar to thecircle
measure$C$ , but also includes population dennsity at origin$P_O$ and destination$P_D$ . It is the inverse of the radiation formula for human migration such that larger$R$ indicates larger distance. $$ R = \frac{(P_o + C)\cdot(P_O + P_D + C)}{P_O\cdot P_D} $$
Using the sample of distances (step 5), a mixture of truncated LogNormal distributions is fitted. For the gravity model, a function fits a mixture of untruncated bivariate Normal distributions. It uses an EM algorithm for estimation and returns the estimates as well as the BIC to determine the number of mixture components.
To use the fitted distributions in a weight function for hhh4
, they must be saved in a list with elements
mu
: Mean parameterssigma
: Standard deviation parametersw
: Weightsl
: Lower bound of distance measure (in logs)u
: Upper bound of distance measure (in logs)pop
: diagonal matrix of district population
The first three elements should be of dimension
For the gravity model, the distance measure is two dimensional, thus we have list elements mu1
and mu2
instead of mu
for mean parameters and
sigma1
, sigma2
and sigma12
instead of sigma
for the elements of the variance covariance matrix.
Finally, we define weights for hhh4
with the function W_fgsim
. It is similar to the W_powerlaw
function in surveillance
and returns a list of functions that are used to compute weights and their derivatives as well as initial values for the decay parameter. Its arguments are
-
pars
: List of parameters (step 7) -
truncPL
: Shall the power law be truncated: Instead of translating distances$D$ to$D^{-d}$ , it would translate to$\min{D^{-d}, \delta^{-d}}$ , where$\delta$ is a threshold parameter that is estimated. -
gravity
: Set to TRUE if the distance measure isgravity
-
maxlag
: The maximum neighbourhood order with non-zero weights -
normalize
: Shall rows of the weight matrix be normalised? -
log
: Are parameters estimated in logs to ensure positivity? -
initial
: Initial values for estimation -
from0
: IfTRUE
, weights are also computed for intra-district transmission -
popScale
: IfTRUE
, the weights are multiplied by the population size of the destination district to account for heterogeneity
- Barbosa H. et al. (2018) Human mobility: Models and applications, Physics Reports, 734:1–74.
- Giraud, T. (2022) osrm: Interface Between R and the OpenStreetMap-Based Routing Service OSRM, Journal of Open Source Software, 7(78), 4574.
- Held L., Höhle M., Hofmann M. (2005) A statistical framework for the analysis of multivariate infectious disease surveillance counts, Statistical Modelling, 5(3):187–199.
- Meyer S., Held L. (2014) Power-law models for infectious disease spread, The Annals of Applied Statistics, 8(3), 1612–1639.
- Meyer S., Held L., Höhle M. (2017) Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance, Journal of Statistical Software, 77(11), 1–55.
- Runfola D. et al. (2020) geoboundaries: A global database of political administrative boundaries, PLoS ONE, 15(4).
- WorldPop (2018) Global high resolution population denominators project, opp1134076.