-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path6-poisson.Rmd
138 lines (101 loc) · 2.92 KB
/
6-poisson.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "R Notebook"
output: html_notebook
---
```{r}
install.packages("countreg", repos="http://R-Forge.R-project.org")
install.packages("AER")
library(haven) ## for reading sas files
library(MASS) ## for glm.nb -- negative binomial
library(vcd) ## for testing goodnes of fit for poisson and negative binomial
library(countreg) ## for modelling count data
library(AER) ## bunch of useful functions
df <- read_sas("../data/baza2005.sas7bdat")
head(df)
```
1. start with visualisation
```{r}
barplot(table(df$LOS_LT14), xlab = "Number of children under 14",
ylab = "Number of households")
```
2. Is it a Poisson distribution?
```{r}
mean(df$LOS_LT14) ## expected value E(Y) = lambda
var(df$LOS_LT14) ## variance of Y Var(Y) > E(Y) -> overdispersion
var(df$LOS_LT14)/mean(df$LOS_LT14)
```
3. We may do the tests before modeling
```{r}
plot(goodfit(df$LOS_LT14, type = "poisson"))
plot(goodfit(df$LOS_LT14, type = "nbinomial", par = list(size = 10)))
```
4. Now, we will create three model:
+ poisson (use glm function)
+ quasi-poisson (use glm function)
+ negative binomial (using MASS::glm.nb function)
$$
\text{LOS_LT14} = \exp(\beta_1 + \beta_2 \text{DOCH} + ...)
$$
$$
\log(\text{LOS_LT14}) = \beta_1 + \beta_2 \text{DOCH} + ...
$$
```{r}
m1 <- glm(formula = LOS_LT14 ~ I(DOCH / 1000) + factor(KLM) + factor(WOJ),
data = df,
family = poisson())
summary(m1)
```
Używamy transformacji odwrotnej do logarytmu naturalnego czyli funkcję e
```{r}
exp(coef(m1))
```
```{r}
m2 <- glm(LOS_LT14 ~ I(DOCH/1000) + factor(KLM) + factor(WOJ),
data = df,
family = quasipoisson())
summary(m2)
```
To interpret the parameters from the glm model we just need to use the exp transformation (because by default in Poisson and quasi-Poisson we use log link)
```{r}
exp(coef(m1))
```
We use dispersion test to verify if dispersion is actually the problem
$$
\begin{cases}
trafo(\mu) = \mu, & \text{if quasi-poisson,} \\
trafo(\mu) = \mu^2, & \text{if negative binomial,}
\end{cases}
$$
$$
\begin{cases}
Var[y] = \mu + \alpha \cdot \mu, & \text{if quasi-poisson,} \\
Var[y] = \mu + \alpha \cdot \mu^2, & \text{if negative binomial,}
\end{cases}
$$
```{r}
dispersiontest(m1, trafo = 1) ## quasipoisson / negative binomial type 1
dispersiontest(m1, trafo = 2) ## negative binomial type 2
```
Negative-binomial model
```{r}
m3 <- glm.nb(formula = LOS_LT14 ~ I(DOCH/1000) + factor(KLM) + factor(WOJ),
data = df)
summary(m3)
```
Jak porównać te dwa modele -- poisson i negative binomial
```{r}
data.frame(poisson = exp(coef(m1)), negbin = exp(coef(m3)))
```
Compare which model is better -- Poisson or Negative-Binomial
```{r}
AIC(m1,m3)
BIC(m1,m3)
```
```{r}
countreg::rootogram(m1, main = "Poisson")
countreg::rootogram(m3, main = "Negative-Binomial")
```
1. visualize
2. calculate the models (poisson, quasipo, negbin, ...)
3. assess the models
4. select the "right" model (according to some measures)