---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# sandwich2stage
<!-- badges: start -->
<!-- badges: end -->
`sandwich2stage` introduces a function for computing an estimate of the sandwich variance for the two-stage regression model setting of regression calibration. The sandwich is one approach for obtaining standard errors in two-stage regression settings that account for the extra uncertainty added by the calibration model step. Our function computes an estimate of the sandwich variance obtained by stacking the stage 1 and stage 2 estimating equation contributions.
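As a rough guide to what "stacking" means here, the generic M-estimation form of a stacked sandwich is sketched below. The notation is ours and is not taken from the package documentation: $\alpha$ denotes the stage 1 (calibration) parameters, $\beta$ the stage 2 (outcome) parameters, and $\psi_{1i}$, $\psi_{2i}$ the per-observation estimating functions of each stage. Survey weights and design features are omitted for simplicity.

```latex
\psi_i(\theta) =
\begin{pmatrix} \psi_{1i}(\alpha) \\ \psi_{2i}(\alpha, \beta) \end{pmatrix},
\qquad
\theta = (\alpha^\top, \beta^\top)^\top,
\qquad
\sum_{i=1}^{n} \psi_i(\hat\theta) = 0

\widehat{V}(\hat\theta) = \frac{1}{n}\, \hat{A}^{-1} \hat{B}\, \hat{A}^{-\top},
\qquad
\hat{A} = -\frac{1}{n} \sum_{i=1}^{n}
  \frac{\partial \psi_i(\hat\theta)}{\partial \theta^\top},
\qquad
\hat{B} = \frac{1}{n} \sum_{i=1}^{n} \psi_i(\hat\theta)\, \psi_i(\hat\theta)^\top
```

Because $\psi_{2i}$ depends on $\alpha$ through the predicted exposure, $\hat{A}$ has a nonzero off-diagonal block, and that block is what carries the stage 1 uncertainty into the stage 2 standard errors.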
## Installation
`sandwich2stage` may be installed from GitHub as follows:
```{r, eval = FALSE}
# install.packages("devtools")  # run once if devtools is not yet installed
devtools::install_github("lboe23/sandwich2stage", subdir = "pkg")
```
## Example
After installing the package, we load it together with the sample data it contains, `sandwichdata_SRS`, which is assumed to come from a simple random sample.
```{r load data}
library(sandwich2stage)
data("sandwichdata_SRS")
```
Next, we load the survey package and create a survey design object. Because `sandwichdata_SRS` is assumed to come from a simple random sample, we specify a simple random sampling design.
```{r create design}
library(survey)
sampdesign <- svydesign(id=~1, data=sandwichdata_SRS)
```
We may then fit the stage 1 model, saving its estimated nuisance parameters and using them to obtain an estimate of the unknown exposure (`xhat`). This estimated exposure, `xhat`, will be used as a covariate in the stage 2 model. Note that the stage 1 model is fit only to the subset that contains validation data (i.e., where `v == 1`).
```{r stage 1}
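# Stage 1: calibration model, fit only where validation data are available (v == 1)
stage1.model <- survey::svyglm(xstarstar ~ xstar + z, design = sampdesign, family = gaussian(), subset = v == 1)
alphas.stage1 <- coef(stage1.model)
# Predict the exposure for every observation and store it in the design as xhat
sampdesign <- update(sampdesign, xhat = predict(stage1.model, newdata = sampdesign$variables))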
```
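The fit-on-a-subset, predict-for-everyone pattern used in stage 1 can be illustrated with a small base R sketch. This is only an illustration under stated assumptions: the data frame `toy` is simulated and hypothetical, and plain `lm()` stands in for `svyglm()` so that the example runs without the survey package.

```r
# Toy illustration with hypothetical, simulated data (not the package data)
set.seed(42)
toy <- data.frame(
  xstar = rnorm(20),
  z     = rnorm(20),
  v     = rep(c(1, 0), each = 10)   # first 10 rows have validation data
)
# The better measurement is only observed in the validation subset
toy$xstarstar <- ifelse(toy$v == 1, toy$xstar + rnorm(20, sd = 0.2), NA)

# Stage 1 uses only the rows with validation data (v == 1)
fit <- lm(xstarstar ~ xstar + z, data = toy, subset = v == 1)

# predict() with newdata returns one value per row of newdata,
# including rows that were not used to fit the model
toy$xhat <- predict(fit, newdata = toy)
```

Every row ends up with a value of `xhat`, even though only the validation rows contributed to the fitted coefficients.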
We will then fit the stage 2 model with the estimated exposure (xhat) as a covariate.
```{r stage 2}
stage2.model <- survey::svyglm(y ~ xhat + z, design = sampdesign, family = binomial())
```
Finally, we can obtain an estimate of the sandwich variance using our function `sandwich2stage()`. The estimated sandwich variance matrix is saved below in the object `sandwichvar`.
```{r sandwich}
sandwich.object <- sandwich2stage(stage1.model, stage2.model, xstar = "xstar", xhat = "xhat", Stage1ID = "ID", Stage2ID = "ID")
sandwichvar <- vcov(sandwich.object)
```
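To make the stacking idea concrete, here is a hand-rolled sketch of a stacked sandwich for a linear stage 1 model and a logistic stage 2 model, written in base R. It is not the implementation used by `sandwich2stage()`: it ignores survey weights and design features, uses `lm()` and `glm()` in place of `svyglm()`, and all data and object names are simulated and hypothetical.

```r
# Hypothetical simulated data: x is the true exposure (unobserved in practice)
set.seed(1)
n     <- 500
z     <- rnorm(n)
x     <- 0.5 * z + rnorm(n)
xstar <- x + rnorm(n, sd = 0.7)   # error-prone measure, available for everyone
xss   <- x + rnorm(n, sd = 0.3)   # better measure, used only where v == 1
v     <- rbinom(n, 1, 0.3)        # validation-subset indicator
y     <- rbinom(n, 1, plogis(-0.5 + 0.8 * x + 0.3 * z))

# Stage 1: calibration model on the validation subset, then predict for all
fit1  <- lm(xss ~ xstar + z, subset = v == 1)
W     <- cbind(1, xstar, z)
xhat  <- drop(W %*% coef(fit1))

# Stage 2: outcome model with the predicted exposure as a covariate
fit2  <- glm(y ~ xhat + z, family = binomial())
beta  <- coef(fit2)
U     <- cbind(1, xhat, z)
p     <- fitted(fit2)

# Per-observation estimating functions, stacked side by side (n x 6)
psi1 <- W * (v * (xss - xhat))    # stage 1 scores, zero outside the subset
psi2 <- U * (y - p)               # stage 2 logistic scores
Psi  <- cbind(psi1, psi2)

# "Bread": minus the average derivative of the stacked estimating functions
w2  <- p * (1 - p)
A11 <- crossprod(W * v, W) / n
A22 <- crossprod(U * w2, U) / n
# Row i holds d psi2_i / d xhat_i; the chain rule then multiplies by W
D   <- outer(y - p, c(0, 1, 0)) - U * (w2 * beta[2])
A21 <- -crossprod(D, W) / n
A   <- rbind(cbind(A11, matrix(0, 3, 3)), cbind(A21, A22))

# "Meat" and the resulting sandwich variance for (alpha, beta)
B    <- crossprod(Psi) / n
Ainv <- solve(A)
V    <- Ainv %*% B %*% t(Ainv) / n

# Standard errors: elements 1-3 are stage 1, elements 4-6 are stage 2
se <- sqrt(diag(V))
```

The same `sqrt(diag(...))` step applies to a variance matrix such as `sandwichvar`. In this sketch, comparing `se[4:6]` with `sqrt(diag(vcov(fit2)))` shows how the stage 2 standard errors change once the stage 1 estimating equations are stacked in.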