forked from donaldRwilliams/BGGM
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
885 lines (683 loc) · 27.8 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
---
output: github_document
bibliography: inst/REFERENCES.bib
---
```{r, echo = FALSE, message=F}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
dev = "png",
dpi = 200,
fig.align = "center",
knitr::opts_chunk$set(comment = NA)
)
library(ggplot2)
library(BGGM)
```
<img src="readme_models/hex.jpg" width = 250 />
# BGGM: Bayesian Gaussian Graphical Models
[![CRAN Version](http://www.r-pkg.org/badges/version/BGGM)](https://cran.r-project.org/package=BGGM)
[![Downloads](https://cranlogs.r-pkg.org/badges/BGGM)](https://cran.r-project.org/package=BGGM)
[![Build Status](https://travis-ci.org/donaldRwilliams/BGGM.svg?branch=master)](https://travis-ci.org/donaldRwilliams/BGGM)
[![status](https://joss.theoj.org/papers/4ecb84c5b3b2a2b5da46be4e0700502f/status.svg)](https://joss.theoj.org/papers/4ecb84c5b3b2a2b5da46be4e0700502f)
The `R` package **BGGM** provides tools for making Bayesian inference in
Gaussian graphical models (GGM). The methods are organized around two general
approaches for Bayesian inference: (1) estimation and (2) hypothesis
testing. The key distinction is that the former focuses on either
the posterior or posterior predictive distribution [@Gelman1996a; see section 5 in @rubin1984bayesianly], whereas the latter focuses on model comparison with the Bayes factor [@Jeffreys1961; @Kass1995].
## What is a Gaussian Graphical Model ?
A Gaussian graphical model captures conditional (in)dependencies among a set
of variables. These are pairwise relations (partial correlations) controlling for
the effects of all other variables in the model.
### Applications
The Gaussian graphical model is used across the sciences, including
(but not limited to) economics [@millington2020partial], climate science
[@zerenner2014gaussian], genetics [@chu2009graphical], and psychology [@rodriguez2020formalizing].
## Installation
To install the latest release version (`2.0.0`) from CRAN use
```{r gh-installation, eval = FALSE}
install.packages("BGGM")
```
The current developmental version can be installed with
```{r, eval = FALSE}
if (!requireNamespace("remotes")) {
install.packages("remotes")
}
remotes::install_github("donaldRwilliams/BGGM")
```
## Overview
The methods in **BGGM** build upon existing algorithms that are well-known in the literature.
The central contribution of **BGGM** is to extend those approaches:
1. Bayesian estimation with the novel matrix-F prior distribution [@Mulder2018]
+ [Estimation](#bayesian-estimation) [@Williams2019]
2. Bayesian hypothesis testing with the matrix-F prior distribution [@Williams2019_bf]
+ [Exploratory hypothesis testing](#Exploratory)
+ [Confirmatory hypothesis testing](#Confirmatory)
3. Comparing Gaussian graphical models [@Williams2019; @williams2020comparing]
+ [Partial correlation differences](#partial-correlation-differences)
+ [Posterior predictive check](#posterior-predictive-check)
+ [Exploratory hypothesis testing](#exploratory-groups)
+ [Confirmatory hypothesis testing](#confirmatory-groups)
4. Extending inference beyond the conditional (in)dependence structure [@Williams2019]
+ [Predictability](#Predictability)
+ [Posterior uncertainty intervals](#partial-correlation-differences) for the
partial correlations
+ [Custom Network Statistics](#custom-network-statistics)
The computationally intensive tasks are written in `c++` via the `R` package **Rcpp** [@eddelbuettel2011rcpp] and the `c++` library **Armadillo** [@sanderson2016armadillo]. The Bayes factors are computed with the `R` package **BFpack** [@mulder2019bfpack]. Furthermore, there are [plotting](#example-network-plot) functions
for each method, control variables can be included in the model (e.g., `~ gender`),
and there is support for missing values (see `bggm_missing`).
## Supported Data Types
* **Continuous**: The continuous method was described in @Williams2019. Note that
this is based on the customary [Wishart distribution](https://en.wikipedia.org/wiki/Wishart_distribution).
* **Binary**: The binary method builds directly upon @talhouk2012efficient
that, in turn, built upon the approaches of @lawrence2008bayesian and
@webb2008bayesian (to name a few).
* **Ordinal**: The ordinal methods require sampling thresholds. There are two approach
included in **BGGM**. The customary approach described in @albert1993bayesian
(the default) and the 'Cowles' algorithm described in @cowles1996accelerating.
* **Mixed**: The mixed data (a combination of discrete and continuous) method was introduced
in @hoff2007extending. This is a semi-parametric copula model
(i.e., a copula GGM) based on the ranked likelihood. Note that this can be used for
*only* ordinal data (not restricted to "mixed" data).
## Illustrative Examples
The following includes brief examples for *some* of the methods in **BGGM**.
### Bayesian Estimation
#### Posterior Sampling
An ordinal GGM is estimated with
```{r, eval = FALSE}
library(BGGM)
library(ggplot2)
# data
Y <- ptsd[,1:5] + 1
# ordinal
fit <- estimate(Y, type = "ordinal",
analytic = FALSE)
```
Notice the `+ 1`. This is required, because the first category must be `1` when `type = "ordinal"`. The partial correlations can the be summarized with
```{r, eval = FALSE}
summary(fit)
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Type: ordinal
#> Analytic: FALSE
#> Formula:
#> Posterior Samples: 250
#> Observations (n):
#> Nodes (p): 5
#> Relations: 10
#> ---
#> Call:
#> estimate(Y = Y, type = "ordinal", analytic = FALSE, iter = 250)
#> ---
#> Estimates:
#> Relation Post.mean Post.sd Cred.lb Cred.ub
#> B1--B2 0.258 0.079 0.105 0.418
#> B1--B3 0.028 0.086 -0.127 0.189
#> B2--B3 0.517 0.058 0.406 0.616
#> B1--B4 0.356 0.070 0.210 0.486
#> B2--B4 -0.076 0.078 -0.219 0.063
#> B3--B4 0.246 0.077 0.107 0.385
#> B1--B5 0.131 0.080 -0.020 0.279
#> B2--B5 0.127 0.083 -0.040 0.284
#> B3--B5 0.202 0.079 0.063 0.366
#> B4--B5 0.349 0.070 0.209 0.474
#> ---
```
The returned object can also be plotted, which allows for visualizing the posterior uncertainty interval for each relation. An example is provided below in [Posterior uncertainty intervals](#posterior-uncertatiny).
The partial correlation matrix is accessed with
```{r, eval=FALSE}
pcor_mat(fit)
```
```{r, echo = FALSE, results='asis'}
load(file = "readme_models/pcor_ex.rda")
knitr::kable(pcor_ex, row.names = TRUE)
```
The graph is selected with
```{r, eval=FALSE}
E <- select(fit)
```
and then plotted
```{r, eval = FALSE}
# "communities"
comm <- substring(colnames(Y), 1, 1)
plot(select(fit),
groups = comm,
edge_magnify = 5,
palette = "Pastel1",
node_size = 12)
```
![](readme_models/plt_est_net.png)
This basic "workflow" can be used with all methods and data types. A more involved network plot
is provided below.
#### Analytic
There is also an analytic solution that is based on the Wishart distribution. This
simple solution provides competitive performance with "state-of-the-art" methods,
assuming that *n* (observations) > *p* (variables). The one
caveat is that it works only for `type = "continuous"` (the default).
```{r, eval=FALSE}
# analytic
fit <- estimate(Y, analytic = TRUE)
# network plot
plot(select(fit))
```
This is quite handy when (1) only the conditional dependence structure is of interest and (2) an immediate solution is desirable. An example of (2) is provided in [Posterior Predictive Check](#posterior-predictive-check).
<br>
### Bayesian Hypothesis Testing
The Bayes factor based methods allow for determining the conditional
**in**dependence structure (evidence for the null hypothesis).
#### Exploratory
```{r, eval=FALSE}
# now 10 nodes
Y <- ptsd[,1:10]
# exploratory hypothesis testing
fit<- explore(Y)
# select
E <- select(fit, alternative = "exhaustive")
```
The option `alternative = "exhaustive"` compares three hypotheses: (1) a null relation; (2) a positive
relation; and (3) a negative relation.
```{r, eval = FALSE}
summary(E)
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Type: ordinal
#> Alternative: exhaustive
#> ---
#> Call:
#> select.explore(object = fit, alternative = "exhaustive")
#> ---
#> Hypotheses:
#> H0: rho = 0
#> H1: rho > 0
#> H2: rho < 0
#> ---
#>
#> Relation Post.mean Post.sd Pr.H0 Pr.H1 Pr.H2
#> B1--B2 0.263 0.080 0.000 0.999 0.001
#> B1--B3 0.020 0.081 0.710 0.173 0.116
#> B2--B3 0.523 0.073 0.000 1.000 0.000
#> B1--B4 0.362 0.070 0.000 1.000 0.000
#> B2--B4 -0.082 0.068 0.459 0.061 0.480
#> B3--B4 0.252 0.073 0.000 1.000 0.000
#> B1--B5 0.129 0.072 0.120 0.847 0.033
#> B2--B5 0.118 0.078 0.223 0.726 0.051
#> B3--B5 0.213 0.077 0.001 0.996 0.003
#> B4--B5 0.348 0.072 0.000 1.000 0.000
```
The posterior hypothesis probabilities are provided in the last three columns.
When using `plot(E)`, there is a network plot for each hypothesis.
#### Confirmatory
A central contribution of **BGGM** is confirmatory hypothesis testing of (in)equality constraints [@Hoijtink2011]. By this we are referring to testing expectations, as opposed to feeding the data to, say, `estimate`, and seeing what happens to emerge.
In this example, the focus is on suicidal thoughts (`PHQ9`) in a comorbidity network. Here is an example set of hypotheses
```{r}
# data (+ 1)
Y <- depression_anxiety_t1 + 1
# example hypotheses
hyp <- c("PHQ2--PHQ9 > PHQ1--PHQ9 > 0;
PHQ2--PHQ9 = PHQ1--PHQ9 = 0")
```
There are two hypotheses separated by (`;`). The first expresses that the relation `PHQ2--PHQ9` ("feeling down, depressed, or hopeless" and "suicidal thoughts") is larger than `PHQ1--PHQ9` ("little interest or pleasure in doing things" and "suicidal thoughts"). In other words, that the partial correlation is larger for `PHQ2--PHQ9`. There is an additional constraint to positive values (`> 0`) for both relations. The second hypothesis is then a "null" model.
```{r, echo=FALSE, warning = FALSE}
load(file = "readme_models/fit_hyp1.rda")
fit <- fit_hyp1
```
```{r, eval=FALSE}
# (try to) confirm
fit <- confirm(Y = Y, hypothesis = hyp,
type = "ordinal")
```
The object `fit` is then printed
```{r, eval=FALSE}
fit
#> BGGM: Bayesian Gaussian Graphical Models
#> Type: ordinal
#> ---
#> Posterior Samples: 250
#> Observations (n): 403
#> Variables (p): 16
#> Delta: 15
#> ---
#> Call:
#> confirm(Y = Y + 1, hypothesis = hyp, type = "ordinal",
#> iter = 250)
#> ---
#> Hypotheses:
#>
#> H1: PHQ2--PHQ9>PHQ1--PHQ9>0
#> H2: PHQ2--PHQ9=PHQ1--PHQ9=0
#> H3: complement
#> ---
#> Posterior prob:
#>
#> p(H1|data) = 0.895
#> p(H2|data) = 0.002
#> p(H3|data) = 0.103
#> ---
#> Bayes factor matrix:
#> H1 H2 H3
#> H1 1.000 529.910 8.666
#> H2 0.002 1.000 0.016
#> H3 0.115 61.147 1.000
#> ---
#> note: equal hypothesis prior probabilities
```
The posterior hypothesis probability is `0.895` which provides some evidence for the order
constraint. The Bayes factor matrix then divides the posterior probabilities. This provide a measure
of *relative* support for which hypothesis the data were more likely under.
Finally, the results can be plotted
```{r, eval=FALSE}
plot(fit) +
scale_fill_brewer(palette = "Set2",
name = "Posterior Prob") +
ggtitle("Confirmatory: Comorbidity Network")
```
![](readme_models/confirm_hyp.png)
This demonstrates that all the `plot()` functions in **BGGM** return `ggplot` objects that can be further customized. Note that **BGGM** is not focused on making publication ready plots. Typically the bare minimum is provided that can then be honed in.
<br>
### Comparing Gaussian Graphical Models
#### Partial Correlation Differences
This method compares groups by computing the difference for each relation in the model. In other
words, there are pairwise contrasts for each partial correlation, resulting in a posterior
distribution for each difference.
In all examples in this section, personality networks are compared for males and females.
```{r}
# data
Y <- bfi
# males
Ymales <- subset(Y, gender == 1,
select = -c(gender, education))
# females
Yfemales <- subset(Y, gender == 2,
select = -c(gender, education))
```
Fit the model
```{r, eval=FALSE}
fit <- ggm_compare_estimate(Ymales, Yfemales)
```
Then plot the results, in this case the posterior distribution for each difference
```{r, eval=FALSE}
# plot summary
plot(summary(fit))
```
![](readme_models/ggm_compare_estimate.png)
Note that it is also possible to use `select` for the object `fit` and then plot the results. This
produces a network plot including the selected differences.
Furthermore, it is also possible to plot the partial correlations (not the differences). This
is accomplished by using `plot` with the summary computed from an `estimate` object
([see above](#bayesian-estimation)).
#### Posterior Predictive Check
The predictive check method uses Jensen-Shannon divergence (i.e., symmetric Kullback-Leibler divergence [Wikipedia](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)) and the sum of squared error (for the partial correlation matrices) to compare groups [@williams2020comparing].
The following compares the groups
```{r, eval=FALSE}
fit <- ggm_compare_ppc(Ymales, Yfemales)
```
Then print the summary output with
```{r, eval=FALSE}
fit
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Global Predictive Check
#> Posterior Samples: 500
#> Group 1: 805
#> Group 2: 1631
#> Nodes: 25
#> Relations: 300
#> ---
#> Call:
#> ggm_compare_ppc(Ymales, Yfemales, iter = 500)
#> ---
#> Symmetric KL divergence (JSD):
#>
#> contrast JSD.obs p_value
#> Yg1 vs Yg2 0.442 0
#> ---
#>
#> Sum of Squared Error:
#>
#> contrast SSE.obs p.value
#> Yg1 vs Yg2 0.759 0
#> ---
#> note:
#> JSD is Jensen-Shannon divergence
```
In this case, there seems to be decisive evidence that the networks are different (as indicated by the
posterior predictive *p*-value). The predictive distribution can also be plotted
```{r, eval=FALSE, out.width = '65%', warning = FALSE, message=FALSE}
plot(fit,
critical = 0.05)$plot_jsd
```
![](readme_models/ppc_1.png)
where the red region is the "critical" area and the black point is the observed KL divergence for the networks. This again shows that the "distance" between the networks is much more than expected, assuming that the groups were actually the same.
This next example is a new feature in **BGGM** (`2.0.0`), that allows for comparing GGMs any way the user wants. All that is required is to (1) decide on a test-statistic and (2) write a custom function.
Here is an example using Hamming distance ([Wikipedia](https://en.wikipedia.org/wiki/Hamming_distance)), which is essentially the squared error between adjacency matrices (a test for different structures).
First define the custom function
```{r}
f <- function(Yg1, Yg2){
# remove NA
x <- na.omit(Yg1)
y <- na.omit(Yg2)
# nodes
p <- ncol(x)
# identity matrix
I_p <- diag(p)
# estimate graphs
fit1 <- estimate(x, analytic = TRUE)
fit2 <- estimate(y, analytic = TRUE)
# select graphs
sel1 <- select(fit1)
sel2 <- select(fit2)
# Hamming distance
sum((sel1$adj[upper.tri(I_p)] - sel2$adj[upper.tri(I_p)])^2)
}
```
Note that (1) `analytic = TRUE` is being used, which is needed in this case because two graphs are
estimated for each iteration (or draw from the posterior predictive distribution) and (2) `f` requires two datasets as the input and returns a single number (the chosen test-statistic). The next step is to compute the observed Hamming distance
```{r, echo=FALSE, warning = FALSE}
load(file = "readme_models/obs.rda")
```
```{r, eval=FALSE}
# observed difference
obs <- f(Ymales, Yfemales)
```
then compare the groups
```{r, eval = FALSE}
fit <- ggm_compare_ppc(Ymales, Yfemales,
FUN = f,
custom_obs = obs)
# print
fit
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Global Predictive Check
#> Posterior Samples: 250
#> Group 1: 805
#> Group 2: 1631
#> Nodes: 25
#> Relations: 300
#> ---
#> Call:
#> ggm_compare_ppc(Ymales, Yfemales, iter = 250, FUN = f, custom_obs = obs)
#> ---
#> Custom:
#>
#> contrast custom.obs p.value
#> Yg1 vs Yg2 75 0.576
#> ---
```
In this case, the *p*-value does not indicate that the groups are different for this test-statistic. This may seem contradictory to the previous results, but it is important to note that
Hamming distance asks a much different question related to the adjacency matrices (no other
information, such as edge weights, is considered).
#### Exploratory (groups)
The Bayes factor based methods allow for determining the conditional
**in**dependence structure (evidence for the null hypothesis), in this case for group equality.
Fit the model
```{r, eval=FALSE}
fit <- ggm_compare_explore(Ymales, Yfemales)
```
Then plot the results
```{r, eval=FALSE}
plot(summary(fit)) +
theme_bw() +
theme(panel.grid = element_blank(),
axis.text.y = element_blank())
```
![](readme_models/plt_ggm_compare_explore.png)
Here the posterior probability for a difference is visualized for each relation in the GGM. Note
that it is also possible to use `select` for the object `fit` and then plot the results. This
produces a network plot including the selected differences, in addition to a plot depicting the
relations for which there was evidence for the null hypothesis.
#### Confirmatory (groups)
A central contribution of **BGGM** is confirmatory hypothesis testing of (in)equality constraints [@Hoijtink2011],
in this case for comparing groups. By this we are referring to testing expectations, as opposed to feeding the data to, say, `estimate`, and seeing what happens to emerge.
In this example, the focus is on agreeableness in a personality network. Here is a set of hypotheses
```{r}
hyp <- c("g1_A2--A4 > g2_A2--A4 > 0 & g1_A4--A5 > g2_A4--A5 > 0;
g1_A4--A5 = g2_A4--A5 = 0 & g1_A2--A4 = g2_A2--A4 = 0")
```
where the variables are `A2` ("inquire about others' well being"), `A4` ("love children"),
and `A5` ("make people feel at ease"). The first hypothesis states that the conditionally
dependent effects are larger for female than males (note the `&`), with the additional
constraint to positive values, whereas the second hypothesis is a "null" model.
The hypothesis is tested with the following
```{r, eval=FALSE}
fit <- ggm_compare_confirm(Yfemales, Ymales,
hypothesis = hyp)
# print
fit
#> BGGM: Bayesian Gaussian Graphical Models
#> Type: continuous
#> ---
#> Posterior Samples: 500
#> Group 1: 1631
#> Group 2: 805
#> Variables (p): 25
#> Relations: 300
#> Delta: 15
#> ---
#> Call:
#> ggm_compare_confirm(Yfemales, Ymales, hypothesis = hyp, iter = 500)
#> ---
#> Hypotheses:
#>
#> H1: g1_A2--A4>g2_A2--A4>0&g1_A4--A5>g2_A4--A5>0
#> H2: g1_A4--A5=g2_A4--A5=0&g1_A2--A4=g2_A2--A4=0
#> H3: complement
#> ---
#> Posterior prob:
#>
#> p(H1|data) = 0.989
#> p(H2|data) = 0
#> p(H3|data) = 0.011
#> ---
#> Bayes factor matrix:
#> H1 H2 H3
#> H1 1.000 1.180798e+14 92.115
#> H2 0.000 1.000000e+00 0.000
#> H3 0.011 1.281873e+12 1.000
#> ---
#> note: equal hypothesis prior probabilities
```
The posterior hypothesis probability is 0.989 which provides strong evidence for the
hypothesis that predicted these "agreeableness" relations would be larger in females
than in males. This can also be plotted, as in [Confirmatory (one group)](#confirmatory).
See @rodriguez2020formalizing for a full treatment of confirmatory testing in substantive applications.
### Beyond the Conditional (In)dependence Structure
#### Predictability
In this example, predictability is computed for each node in the network [see here for rationale @haslbeck2018well]. Currently **BGGM** computes Bayesian variance explained for all data types [@gelman_r2_2019].
The following computes predictability for binary data
```{r, eval=FALSE}
# binary
Y <- women_math
# fit model
fit <- estimate(Y, type = "binary")
# compute r2
r2 <- predictability(fit, iter = 500)
# plot
plot(r2, type = "ridgeline")
```
![](readme_models/predictability.png)
#### Posterior Uncertainty
See [Partial Correlation Differences](#partial-correlation-differences)
#### Custom Network Statistics
A new feature to **BGGM** allows for computing user defined network statistics, given a partial correlation or
weighted adjacency matrix.
Here is an example for bridge centrality [@jones2019bridge]. The first step is to define the function
```{r, eval=FALSE}
# need this package
library(networktools)
# custom function
f <- function(x, ...){
bridge(x, ...)$`Bridge Strength`
}
```
Note that `x` takes the matrix and `f` can return either a single number or a number for each node. The next step is to fit the model and compute the network statistic
```{r, eval=FALSE}
# data
Y <- ptsd
# clusters
communities <- substring(colnames(Y), 1, 1)
# estimate the model
fit <- estimate(Y)
# bridge strength
net_stat <- roll_your_own(fit,
FUN = f,
select = TRUE,
communities = communities)
```
The function `f` is provided to `FUN` and `communities` is passed to `brigde` (inside of `f`) via `...`. The results can be printed
```{r, eval=FALSE}
# print
net_stat
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Network Stats: Roll Your Own
#> Posterior Samples: 100
#> ---
#> Estimates:
#>
#> Node Post.mean Post.sd Cred.lb Cred.ub
#> 1 0.340 0.097 0.166 0.546
#> 2 0.319 0.100 0.176 0.513
#> 3 0.000 0.000 0.000 0.000
#> 4 0.337 0.086 0.189 0.489
#> 5 0.559 0.133 0.332 0.791
#> 6 0.188 0.073 0.029 0.320
#> 7 0.505 0.138 0.241 0.781
#> 8 0.153 0.070 0.022 0.286
#> 9 0.175 0.063 0.041 0.281
#> 10 0.000 0.000 0.000 0.000
#> 11 0.365 0.107 0.178 0.627
#> 12 0.479 0.093 0.280 0.637
#> 13 0.155 0.074 0.022 0.301
#> 14 0.000 0.000 0.000 0.000
#> 15 0.374 0.097 0.175 0.550
#> 16 0.174 0.065 0.034 0.295
#> 17 0.000 0.000 0.000 0.000
#> 18 0.491 0.132 0.238 0.745
#> 19 0.613 0.113 0.408 0.825
#> 20 0.144 0.066 0.038 0.289
#> ---
```
And then plotted
```{r, eval = FALSE}
plot(net_stat)
```
![](readme_models/bridge.png)
There are additional examples in the documentation.
### Example Network Plot
Here is an example of a more involved network plot. In this case, the graph is estimated with a semi-parametric copula (`type = "mixed"`), where two control variables are included in the model.
```{r, echo=FALSE}
load(file = "readme_models/fit_plt_ex.rda")
fit <- fit_plt_ex
# select graph
E <- select(fit)
```
```{r, eval=FALSE}
# personality (includes gender and education)
Y <- bfi
# fit copula GGM
fit <- estimate(Y, type = "mixed")
# select graph
E <- select(fit)
```
The graph is then plotted
```{r, eval=FALSE }
# extract communities
comm <- substring(colnames(Y), 1, 1)
# plot
plot(E,
# enlarge edges
edge_magnify = 5,
# cluster nodes
groups = comm,
# change layout
layout = "circle")$plt +
# plot title
ggtitle("Semi-Parametric Copula") +
# add custom labels
scale_color_brewer(breaks = c("A", "C",
"E", "N",
"O", "e",
"g"),
labels = c("A", "C",
"E", "N",
"O",
"Education",
"Gender"),
palette = "Set2")
```
![](readme_models/plt_net_example.png)
Note that `layout` can be changed to any option provided in the `R` package **sna** [@sna].
## Additional Features
The primary focus of **BGGM** is Gaussian graphical modeling (the inverse covariance matrix).
The residue is a suite of useful methods not explicitly for GGMs. For example,
### Bivariate Correlations
Bivariate correlations for `binary` (tetrachoric), `ordinal` (polychoric), `mixed` (rank based),
and `continuous` (Pearson's) data.
Here is an example for computing tetrachoric correlations:
```{r, echo=FALSE}
load(file = "readme_models/binary_cors.rda")
```
```{r, eval=FALSE}
# binary data
Y <- women_math[1:500,]
cors <- zero_order_cors(Y, type = "binary", iter = 250)
cors$R
```
```{r, echo = FALSE, results='asis'}
row.names(cors$R_mean) <- 1:6
knitr::kable(round(cors$R_mean, 3),
col.names = c("1", "2", "3","4","5", "6"),
row.names = TRUE)
```
The object `cors` also includes the sampled correlation matrices (in this case 250) in an array.
### Multivariate Regression
Multivariate regression for binary (probit), ordinal (probit),
mixed (rank likelihood), and continuous data.
Here is an example for a multivariate probit model with an ordinal outcome, where
`E5` ("take charge") and `N5` ("panic easily") are predicted by `gender` and `education`:
```{r,echo=FALSE}
load("readme_models/mv_probit.rda")
```
```{r, eval = F}
# personality data
Y <- bfi
# variables
Y <- subset(Y, select = c("E5", "N5",
"gender", "education"))
mv_probit <- estimate(Y, formula = ~ gender + as.factor(education),
type = "ordinal")
```
Note that **BGGM** does not use the customary `model.matrix` formulation. This is for good reason, as
each variable in the GGM does not need to be written out. Here we effectively "tricked" **BGGM** to
fit a multivariate probit model (each variable included in `formula` is removed from `Y`).
```{r}
regression_summary(mv_probit)
```
This basic idea can also be used to fit regression models with a single outcome.
## Note on Conditional (In)dependence Models for Latent Data
All of the data types (besides continuous) model latent data. That is, unobserved data
that is assumed to be Gaussian distributed. For example, a tetrachoric correlation
(binary data) is a special case of a polychoric correlation (ordinal data).
Both relations are between "theorized normally distributed continuous *latent*
variables [Wikepedia](https://en.wikipedia.org/wiki/Polychoric_correlation).
In both instances, the corresponding partial correlation between observed
variables is conditioned on the remaining variables in the *latent* space.
This implies that interpretation is similar to continuous data, but with respect
to latent variables. We refer interested users to
[see page 2364, section 2.2, in @webb2008bayesian].
## High Dimensional Data?
**BGGM** was built specifically for social-behavioral scientists. Of course, the methods
can be used by all researchers. However, there is currently *not* support for high-dimensional data
(i.e., more variables than observations) that are common place in, say, the genetics literature.
These data are rare in the social-behavioral sciences. In the future, support for high-dimensional
data may be added to **BGGM**.
## Bug Reports, Feature Requests, and Contributing
Bug reports and feature requests can be made by opening an issue on [Github](https://github.com/donaldRwilliams/BGGM/issues). To contribute towards
the development of **BGGM**, you can start a branch with a pull request and we can
discuss the proposed changes there.
## Comparison to Other Software
**BGGM** is the only `R` package to implement all of these algorithms and methods. The `mixed` data approach
is also implemented in the package **sbgcop** [base `R`, @hoff2007extending]. The `R` package **BDgraph** implements a Gaussian copula graphical model in `c++` [@mohammadi2015bdgraph], but not the binary or ordinal approaches. Furthermore, **BGGM** is the only package for confirmatory testing and comparing graphical models with the methods described in @williams2020comparing.
## References