-
Notifications
You must be signed in to change notification settings - Fork 0
/
05-alpha.Rmd
111 lines (76 loc) · 3.64 KB
/
05-alpha.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
```{r, echo=FALSE}
library(mia)
library(miaViz)
library(dplyr)
se <- readRDS("data/se.rds")
tse <- readRDS("data/tse.rds")
```
# Alpha diversity
This section demonstrates the analysis of alpha diversity. This
quantity measures microbial diversity within each sample.
Alpha diversity is a key quantity in a microbiome research. The [_mia_
package](https://microbiome.github.io/mia/) provides access to a wide
variety of alpha diversity indices. For more background information
and examples with various alpha diversity indices, see the [online
book](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity).
There are different alpha diversity implementations that relate to:
- unit level: general, species or ASVs? prehaps genes?
- unit detection: precence / absence or abundances, weighing of rare units
- is phylogenic information incorporated, i.e. how well the sample covers the phylogenetic tree?`
Popular framework to capture the different aspects of alpha diversity is [**Hill numbers**](https://doi.org/10.1111/1755-0998.13014)
Higher numbers of unique taxa, and more even abundance distributions within a
sample yield larger values for alpha diversity.
- richness depicts the number of units / species in sample and is intuitive
- Shannon Index accounts also for the evenness
- Simpson Index weighs the abundant or dominating units more than Shannon Index
```{r}
# Calculates indices
tse <- estimateDiversity(tse, index = "shannon", name = names)
```
Next we can visualize Shannon index with histogram.
```{r}
# ggplot needs data.frame format as input.
# Here, colData is DataFrame, therefore it needs to be converted to data.frame
shannon_hist <- ggplot(as.data.frame(colData(tse)),
aes(x = shannon)) +
geom_histogram(bins = 20, fill = "gray", color = "black") +
labs(x = "Shannon index", y = "Sample frequency")
shannon_hist
```
## Visualization
Next let us compare indices between different patient status and
cohorts. Boxplot is suitable for that purpose.
```{r}
# Creates Shannon boxplot
shannon_box <- ggplot(as.data.frame(colData(tse)),
aes(x = patient_status,
y = Shannon_diversity,
fill = cohort)) +
geom_boxplot() +
theme(title = element_text(size = 12)) # makes titles smaller
```
For an alternative visualization, see examples with [scater::plotColData](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity).
## Statistical testing and comparisons
To further investigate if patient status could explain the variation
of Shannon index, let's do a Wilcoxon test. This is a non-parametric
test that doesn't make specific assumptions about the distribution,
unlike popular parametric tests, such as the t test, which assumes
normally distributed observations.
Wilcoxon test can be used to estimate whether the differences between
two groups is statistically significant.
```{r}
# Wilcoxon test, where Shannon index is the variable that we are comparing.
# Patient status - ADHD or control - is the factor that we use for grouping.
wilcoxon_shannon <- wilcox.test(Shannon_diversity ~ patient_status, data = colData(tse))
wilcoxon_shannon
```
For more examples, see a dedicated section on alpha diversity in the
[online book](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity).
## Exercises
Add the following in the reproducible summary report.
* Estimate alpha diversity for each sample and draw a histogram. Tip:
estimateDiversity
* Compare the results between two or more
alpha diversity indices (visually and/or statistically).
* See [online book](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity)
for further examples.