-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmodel_1.rnw
149 lines (111 loc) · 6.75 KB
/
model_1.rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
The overall idea with using rjags is to define a model structure, give this, our datasets, and prior distributions to the MCMC engine, and get back posterior distributions of specified parameters.
First, the package needs to be loaded, and we may as well set a seed for the random number generator so that our results are reproducible:
<<>>=
library(rjags)
set.seed(1)
@
Now let's make a basic model to test rjags. The simplest realistic example is a linear relationship with some y-scatter. Let's make an artificial dataset for testing:
<<>>=
## Create an artificial dataset:
true_slope = 1
true_intercept = -5
true_sigma = 2
N = 50
x = runif(N,0,10)
y = rnorm(N,true_slope * x + true_intercept, true_sigma)
@
At least for now, we should take a look at our data before analyzing it, just to make sure it is how we expect it to be:
<<fig.pos="H", fig.height=5, fig.width=5>>=
plot(x,y)
@
Now let's talk about the model we will use. We are modeling the data as linear with gaussian scatter:
\[
y_i \sim \mathcal{N}(a x_i + b,\sigma^2)
\]
To actually implement this model, we will need to express it in the JAGS langauge. The JAGS language is a little different from normal programming languages like R. It is a declarative language, meaning that the code you write is not followed as a sequence of steps, but rather as a description of logic. For instance, the line \code{y.hat[i] = a * x[i] + b} does not evaluate the right side and set the left side equal to it, as R would. Instead, this code only specifies the model structure that JAGS should use.
For now, the model code will be separated into two sections. The first is for specifying priors on our parameters of interest. Here, we are interested in the slope \code{a}, intercept \code{b}, and standard deviation \code{sigma} of our dataset. The second section is for describing how our different variables are related to each other. Then, the whole chunk of JAGS code is wrapped in \code{model\{...\}} and passed into the JAGS engine.
For this dataset, the priors might be as follows. The slope \code{a} we expect to be non-negative and no greater than 5, so we could use a uniform density between 0 and 5. The intercept \code{b} could be anywhere between $\pm 10$, and the scatter's standard deviation \code{sigma} is expected to be between 0 and 3. Note that the choice of priors is domain specific and not the emphasis here, but we have chosen priors that include the ``true" values.
Let's write the JAGS code for this model and capture it into a string:
<<>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
sigma ~ dunif(0,3)
## structure:
for (i in 1:N) {
y[i] ~ dnorm(a * x[i] + b, pow(sigma, -2))
}
}
"
@
Density functions in JAGS always start with ``d": \code{dunif} for uniform density, \code{dnorm} for gaussian density, \code{dlnorm} for lognormal density, etc. The tildes are read as ``is distributed as". For instance, \code{a ~ dunif(0,5)} is read as ``the variable \code{a} is distributed uniformly between 0 and 5".
In the structure block, we have a traditional for loop. This allows us to relate each \code{y} value to each \code{x} value such that each \code{y} value is normally distributed around the line $a*x + b$ with standard deviation \code{sigma}. For historical reasons, JAGS distribution functions use a ``precision" that is defined as one over the variance. Usually it is more convenient to speak in terms of standard deviation and variance, so most models will have a small conversion between standard deviation and precision included.
Note that the order of lines is somewhat unimportant -- the priors could have been included after the structure and it would make no difference.
Now that we have a model structure and some data, we can give this to JAGS and see what comes back. Normally the JAGS engine wants the model structure in a separate file, but to keep it easy we'll use a \code{textConnection} instead. This allows us to create a fake file whose contents are the model string we created above, and we'll give this to JAGS instead.
Now, we want to give our model structure and data to JAGS and get back the output of the MCMC process to analyze.
<<>>=
model = jags.model(file = textConnection(model_string),
data = list('x' = x,
'y' = y,
'N' = N),
n.chains = 1,
n.adapt = 1000,
inits = list('.RNG.name' = 'base::Mersenne-Twister',
'.RNG.seed' = 1))
@
This has created a model, and printed some information that is not particularly interesting. We can then actually generate a Markov chain with
<<>>=
output = coda.samples(model = model,
variable.names = c("a", "b", "sigma"),
n.iter=1000,
thin=1)
@
We get back an object \code{output} that contains the Markov chain. First, as always in R, have a look at the summary of it:
<<>>=
print(summary(output))
@
This gives us information about the distributions of the parameters we asked JAGS to calculate (\code{a}, \code{b}, and \code{sigma}). Notice that the means for each parameter are quite close to the true values. We can also see this visually:
<<fig.pos="H", fig.height=6, fig.width=6>>=
plot(output)
@
We see mixing diagrams on the left, and posterior distributions on the right. The mixing diagrams appear qualitatively acceptable, and the posterior distributions are centered on or quite near the true values of 1, -5, and 2, respectively. The model has successfully recovered the values we used to create the data. The total, self-contained code for this is as follows:
<<eval=F>>=
library(rjags)
set.seed(1)
true_slope = 1
true_intercept = -5
true_sigma = 2
N = 50
x = runif(N,0,10)
y = rnorm(N,true_slope * x + true_intercept, true_sigma)
plot(x,y)
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
sigma ~ dunif(0,3)
## structure:
for (i in 1:N) {
y[i] ~ dnorm(a * x[i] + b, pow(sigma, -2))
}
}
"
model = jags.model(file = textConnection(model_string),
data = list('x' = x,
'y' = y,
'N' = N),
n.chains = 1,
n.adapt = 1000,
inits = list('.RNG.name' = 'base::Mersenne-Twister',
'.RNG.seed' = 1))
output = coda.samples(model = model,
variable.names = c("a", "b", "sigma"),
n.iter=1000,
thin=1)
print(summary(output))
plot(output)
@
The rest of the examples in this paper will follow more or less the same format: create or load in some data, define the model, pass it to JAGS, and have a look at the output.