07b_Pasta_mushroom_legume_PCA.Rmd


## PCA on survey answers

We use the ggbiplot package for plotting the PCA model (See [Biplot] for details). 

```{r}
library(data4consumerscience)
library(tidyverse)
library(ggbiplot)
```


### Wrangle data

PCA takes numerical data as input, so we use the likert-scales in the form of 1 to 7. 
Further the yes/no answers are included, and also needs to be changed.


```{r}
data('pasta')
x <- pasta %>% 
  mutate(Did_you_take_food_from_both_Dish1_and_Dish2 = 
           Did_you_take_food_from_both_Dish1_and_Dish2 %>% factor %>% as.numeric(),
         Did_you_consider_the_proteincontent_of_the_dishes_you_choose = 
           Did_you_consider_the_proteincontent_of_the_dishes_you_choose %>% 
           factor() %>% as.numeric()) %>% 
  mutate_if(is.factor, as.numeric) %>% 
  filter(Day==1) %>%  # the survey part is the same for both days and both stations. That is what we keep. 
  filter(str_detect(StationName,'leg'))
```

### Build the model

```{r}
PCAmdl <- prcomp(x[,c(5:6,8:11)],scale. = T)
```

### Bi-plot 

And a plot of the model

```{r,fig.height=6}
ggbiplot(PCAmdl, varname.size = 5) + ylim(c(-4,4)) + xlim(c(-2,5))
```

What does component 1 (PC1) reflect? What does PC2 reflect?

Lets plot the model and color the samples according to the consumption (of legumes) cutted at the median.

```{r,fig.height=6}
ggbiplot(PCAmdl, groups = factor(x$Consumption>130), ellipse = T,
         varname.size = 5) + ylim(c(-4,4)) + xlim(c(-3,5))
```

### Extract the components and run all associations. 

We are interested in if any of the likert/survey traits reflected by PCA is correlated with consumption. 
It is a little complicated, but here goes

```{r}
library(broom)
library(broom.mixed)
library(lme4)

scores <- data.frame(Person = x$Person, PCAmdl$x[,1:2]) # take out the first two components. 
tbmixed <- pasta %>% 
  left_join(scores, by = 'Person') %>% 
  gather(comp,score,PC1:PC2) %>% 
  group_by(StationName,comp) %>% 
  do(lmer(data = ., Consumption~score + Day + (1|Person)) %>% tidy(conf.int = T))
```

... Make a table and a plot of the results. 

```{r}
library(knitr)
tbmixed %>% 
  filter(term=='score') %>%
  dplyr::select(-effect,-group) %>% 
  kable(x = .,caption = 'Slopes according to components', digits = 2, format = 'simple')

tbmixed %>% 
  filter(term=='score') %>% 
  ggplot(data = ., aes(comp,estimate,ymin = conf.low, ymax = conf.high)) + 
  geom_errorbar(width = 0.1) +geom_point()+ 
  geom_hline(yintercept = 0) + 
  facet_grid(~StationName) + 
  theme(legend.position = 'bottom')
```

Interpret the results.