-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathhm.Rmd
141 lines (93 loc) · 4.51 KB
/
hm.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "Heatmaps"
output:
html_document:
number_sections: TRUE
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)
rm(list = ls())
```
In order to answer biological questions, often a combination of high-throughput data is generated. Especially in combination with microbiome types of data, the associated metabolome is naturally of interest, as these two sources together reflects _who are there_ and _what do they do_.
In the analysis of such data a natural starting point is to look for common structure. That is; which types of bacteria are correlated to a certain type of metabolites.
# Correlation Heat map {-}
A natural starting point for such an analysis is to produce all pairwise correlations between the OTU table and the metabolite table, and visualize it in a heat map. The build in heatmap() function in R provides a quick initial overview of the data.
## Preprocess the data {-}
```{r, cca_chunk1}
library(phyloseq)
library(tidyverse)
load('./data/Rats_inulin.RData')
phyXpp <- subset_samples(phyX, !is.na(Acetate))
phyXpp <- filter_taxa(phyXpp, function(x) sum(x>0)>0, TRUE)
phyXpp <- transform_sample_counts(phyXpp, function(x) x/sum(x))
OTUtb <- data.frame(t(otu_table(phyXpp)))
Mtb <- sample_data(phyXpp)[,c("Acetate","Butyrate","Fructose","Glucose","Lactate","Propionate","Xylose")]
# preprocess the Mtb data
Mtb$Xylose <- Mtb$Xylose + 0.141/2 # there is zeros for this one - so to fix log we need to add a bit.
Mtb <- log(Mtb)
```
## Pairwise correlations {-}
Compute all pairwise correlations
```{r}
cr <- cor(OTUtb,Mtb)
```
## Visualize {-}
```{r}
heatmap(cr)
```
The ordering of the coloumns and rows for this heatmap is based on correlations structure of the _correlation matrix_. I.e. it is _NOT_ based on similarity of OTUs and metabolites respectively. This might lead to a wrong interpretation. So instead the dendrograms for the rows is based on the correlations of the OTUs and the dendrogram for the coloumns are based on the correlations of the metabolites.
```{r}
rd<-dist(t(OTUtb))
rc<-hclust(rd)
cd<-dist(t(Mtb))
cc<-hclust(cd)
heatmap(cr, Rowv=as.dendrogram(rc), Colv=as.dendrogram(cc))
```
# Exercises
## Heatmaps
Import and run the codes above for generating heatmaps
## Labels
You might want to change the labels for the bacteria to something more intuitive, and furhter export the plot to pdf for zooming. Try to do this and answer whether you see structure in relation to taxonomic level.
```{r, eval = F}
Txtb <- data.frame(tax_table(phyX))
colnames(OTUtb) <- Txtb$Rank5 # use Rank5 as labels
```
## Effect of Preprocessing
Try to change the preprocessing such as adding log tranform, and plot histograms of the two versions of the obtained correlations. What do you see?
## A zoom-in!
Digg out OTU_154 and Acetate and plot those against each other. Add log-tranform with a pseudo-count and see what happends. try to add diet info... does that explain anything?
```{r,eval=FALSE, include=FALSE}
phyXsel <- subset_samples(phyX, !is.na(Acetate))
phyXsel <- filter_taxa(phyXsel, function(x) sum(x>0)>0, TRUE)
phyXsel <- transform_sample_counts(phyXsel, function(x) (x/ sum(x)))
OTUtb <- data.frame(t(otu_table(phyXsel)))
cr_lg <- cor(OTUtb,Mtb)
cr <- cor(OTUtb,Mtb)
d <- abs(cr_lg - cr)
which(d==max(d),arr.ind = T)
cr[which(d==max(d),arr.ind = T)]
cr_lg[which(d==max(d),arr.ind = T)]
df <- cbind(OTUtblg,Mtb)
ggplot(data = df, aes(OTU_154,Acetate)) + geom_point() + stat_smooth(method = lm)
df <- cbind(OTUtb,Mtb)
ggplot(data = df, aes(OTU_154,Acetate)) + geom_point() + stat_smooth(method = lm)
```
```{r, eval = T}
phyXsel <- subset_samples(phyX, !is.na(Acetate))
OTUsel <- data.frame(t(otu_table(phyXsel)))
df <- cbind(sample_data(phyXsel),OTUsel)
g1 <- ggplot(data = df, aes(OTU_154,Acetate)) +
geom_point() + stat_smooth(method = lm) + ggtitle('raw')
g2 <- ggplot(data = df, aes(log2(OTU_154 + 0.1),Acetate )) +
geom_point(aes(color = Description)) + stat_smooth(method = lm) + ggtitle('log(count + pc)')
gridExtra::grid.arrange(g1,g2, nrow = 1)
```
What do you learn from this? ... Should be a kinda scarry insight!
Try to color the samples according to intervention group, and see the difference:
```{r, eval=FALSE}
g1 <- ggplot(data = df, aes(OTU_154,Acetate)) +
geom_point(aes(color = Description)) + stat_smooth(method = lm) + ggtitle('raw')
g2 <- ggplot(data = df, aes(log2(OTU_154 + 0.1),Acetate)) +
geom_point(aes(color = Description)) + stat_smooth(method = lm) + ggtitle('log(count + pc)')
gridExtra::grid.arrange(g1,g2, nrow = 1)
```