forked from mortenarendt/dataanalyssisconsumerscience
-
Notifications
You must be signed in to change notification settings - Fork 0
/
07b_Pasta_mushroom_legume_PCA.Rmd
92 lines (67 loc) · 2.5 KB
/
07b_Pasta_mushroom_legume_PCA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
## PCA on survey answers
We use the ggbiplot package for plotting the PCA model (See [Biplot] for details).
```{r}
library(data4consumerscience)
library(tidyverse)
library(ggbiplot)
```
### Wrangle data
PCA takes numerical data as input, so we use the likert-scales in the form of 1 to 7.
Further the yes/no answers are included, and also needs to be changed.
```{r}
data('pasta')
x <- pasta %>%
mutate(Did_you_take_food_from_both_Dish1_and_Dish2 =
Did_you_take_food_from_both_Dish1_and_Dish2 %>% factor %>% as.numeric(),
Did_you_consider_the_proteincontent_of_the_dishes_you_choose =
Did_you_consider_the_proteincontent_of_the_dishes_you_choose %>%
factor() %>% as.numeric()) %>%
mutate_if(is.factor, as.numeric) %>%
filter(Day==1) %>% # the survey part is the same for both days and both stations. That is what we keep.
filter(str_detect(StationName,'leg'))
```
### Build the model
```{r}
PCAmdl <- prcomp(x[,c(5:6,8:11)],scale. = T)
```
### Bi-plot
And a plot of the model
```{r,fig.height=6}
ggbiplot(PCAmdl, varname.size = 5) + ylim(c(-4,4)) + xlim(c(-2,5))
```
What does component 1 (PC1) reflect? What does PC2 reflect?
Lets plot the model and color the samples according to the consumption (of legumes) cutted at the median.
```{r,fig.height=6}
ggbiplot(PCAmdl, groups = factor(x$Consumption>130), ellipse = T,
varname.size = 5) + ylim(c(-4,4)) + xlim(c(-3,5))
```
### Extract the components and run all associations.
We are interested in if any of the likert/survey traits reflected by PCA is correlated with consumption.
It is a little complicated, but here goes
```{r}
library(broom)
library(broom.mixed)
library(lme4)
scores <- data.frame(Person = x$Person, PCAmdl$x[,1:2]) # take out the first two components.
tbmixed <- pasta %>%
left_join(scores, by = 'Person') %>%
gather(comp,score,PC1:PC2) %>%
group_by(StationName,comp) %>%
do(lmer(data = ., Consumption~score + Day + (1|Person)) %>% tidy(conf.int = T))
```
... Make a table and a plot of the results.
```{r}
library(knitr)
tbmixed %>%
filter(term=='score') %>%
dplyr::select(-effect,-group) %>%
kable(x = .,caption = 'Slopes according to components', digits = 2, format = 'simple')
tbmixed %>%
filter(term=='score') %>%
ggplot(data = ., aes(comp,estimate,ymin = conf.low, ymax = conf.high)) +
geom_errorbar(width = 0.1) +geom_point()+
geom_hline(yintercept = 0) +
facet_grid(~StationName) +
theme(legend.position = 'bottom')
```
Interpret the results.