---
title: "Lecture 8 Applying functions"
author: David A McAllister
date: "Monday 31st October, 2016"
output:
  slidy_presentation: default
---
## Objective of section
* To understand how apply functions work
* To use apply functions to avoid repetition
# "Don't repeat yourself" - Andy Hunt and Dave Thomas
## Use apply family of functions to avoid looping
* lapply - apply functions to lists
* sapply - a wrapper for lapply which simplifies the output as much as possible
* tapply - apply to a vector (generally a data frame column) split by one or more factors
* apply - apply across rows and/or columns of a matrix-like object
* others, such as `by` and `aggregate`, which I don't use because I prefer dplyr. However, we will use `by` in the next-but-one exercise
## Why
* Makes code more readable
* It is not faster than a well-written loop
* Makes code more transferable, because it takes care of the looping
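
A minimal sketch of what "takes care of looping" means, using the built-in `cars` data. The names `loop_means` and `apply_means` are illustrative only and are not used elsewhere in the lecture.

```{r}
# Explicit loop: allocate the result, then fill it one column at a time
loop_means <- numeric(ncol(cars))
names(loop_means) <- names(cars)
for (i in seq_along(cars)) {
  loop_means[i] <- mean(cars[[i]])
}

# Same calculation, with sapply doing the looping and the naming
apply_means <- sapply(cars, mean)

# Compare the two results
all.equal(loop_means, apply_means)
```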
# lapply and sapply
## lapply returns lists
```{r}
# Works on a list (a data frame is a list of columns) and returns a list
lapply (cars, mean)
# sapply similar, but "s"implifies to a vector or matrix
sapply (cars, range)
```
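
The calls above work column by column because a data frame is itself a list. A short sketch of how the two functions differ in what they return; `res_list`, `res_vec` and `res_mat` are illustrative names.

```{r}
# A data frame is a list of columns
is.list(cars)

res_list <- lapply(cars, mean)   # list with one element per column
res_vec  <- sapply(cars, mean)   # simplified to a named numeric vector
res_mat  <- sapply(cars, range)  # length-2 results are simplified to a matrix

class(res_list)
class(res_vec)
dim(res_mat)
```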
## The apply functions accept extra arguments
```{r, echo = -(2)}
args (lapply)
cars.missing <- cars
cars.missing [1,1] <- NA
lapply (cars.missing, mean, na.rm = TRUE)
```
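
The `...` shown by `args(lapply)` is what carries `na.rm = TRUE` through to `mean`. As a sketch of the same mechanism with another function, here `probs` is passed through to `quantile`; the names `quartiles` and `quartiles2` are illustrative.

```{r}
# Arguments after FUN are passed on to FUN for every element
quartiles <- sapply(cars, quantile, probs = c(0.25, 0.75))
quartiles

# Equivalent, but more verbose, with an anonymous function
quartiles2 <- sapply(cars, function(x) quantile(x, probs = c(0.25, 0.75)))
all.equal(quartiles, quartiles2)
```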
## Used with [], lapply can replace values in a data frame
```{r}
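# Set values below the chosen quantile (default: the 10th percentile) to NA
MakeNAFunction <- function (x, cutpoint = 0.1, ...) {
  ifelse (x < quantile(x, cutpoint), NA, x)
}
# Assigning into cars[] keeps the data frame structure
cars [] <- lapply (cars, MakeNAFunction, 0.05)
cars[1:4,]
# Restore the original dataset
data(cars)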
```
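
A brief sketch of why the empty `[]` matters, run on a copy so that `cars` itself is left alone. The name `cars_copy` and the choice of rounding to the nearest 10 are assumptions made purely for illustration.

```{r}
cars_copy <- cars   # hypothetical working copy

# Without []: the result is the plain list that lapply returns
as_list <- lapply(cars_copy, round, digits = -1)
class(as_list)

# With []: the columns are replaced in place, so the data frame is kept
cars_copy[] <- lapply(cars_copy, round, digits = -1)
class(cars_copy)
```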
# tapply runs function on cuts of data
## Single value
```{r}
chickwts[1:3,]
# can return a vector or array
a <- tapply (chickwts$weight, chickwts$feed, length)
a
is.atomic(a)
```
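
One way to picture what `tapply` does here is as a split followed by an apply. The sketch below reproduces the counts that way; `weight_by_feed`, `n_split` and `n_tapply` are illustrative names.

```{r}
# split() cuts the weights into one vector per feed type ...
weight_by_feed <- split(chickwts$weight, chickwts$feed)

# ... and sapply() runs the function on each piece
n_split  <- sapply(weight_by_feed, length)
n_tapply <- tapply(chickwts$weight, chickwts$feed, length)

# Same counts; tapply returns a 1-d array rather than a plain vector
all(n_split == n_tapply)
```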
## tapply can have multiple indices
```{r}
# Add a randomly assigned (made-up) water type as a second grouping factor
chickwts$water <- sample (c("hard", "soft"), length(chickwts$weight),
                          replace = TRUE)
chickwts[1:3,]
tapply (chickwts$weight, list(chickwts$feed, chickwts$water), mean)
```
## Multiple values
```{r}
b <- tapply (chickwts$weight, chickwts$feed, quantile)
str(b)
```
## Multiple values and do.call
```{r}
do.call (cbind, b)
```
This is complex, which is why I often use plyr instead.
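
For comparison, a base R sketch that reaches the same matrix in one step by combining `split` with `sapply`. This is one possible alternative, not necessarily the approach used later in the course; `q_by_feed` and `b_check` are illustrative names.

```{r}
# quantile() returns 5 values per feed; sapply() binds them into a matrix
q_by_feed <- sapply(split(chickwts$weight, chickwts$feed), quantile)
q_by_feed

# Compare with do.call(cbind, ...) applied to the tapply output
b_check <- tapply(chickwts$weight, chickwts$feed, quantile)
all.equal(q_by_feed, do.call(cbind, b_check))
```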
# Apply
## Apply
- Works on a matrix
- Need to specify rows (dimension 1) or columns (dimension 2)
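
A small sketch of the two dimensions, using a hypothetical matrix called `demo_matrix`. For sums and means, base R also provides `rowSums`, `colSums`, `rowMeans` and `colMeans`, which give the same answers.

```{r}
demo_matrix <- matrix(1:6, nrow = 2)   # hypothetical 2 x 3 example
demo_matrix

row_totals <- apply(demo_matrix, 1, sum)   # dimension 1: one result per row
col_totals <- apply(demo_matrix, 2, sum)   # dimension 2: one result per column
row_totals
col_totals

# Compare with the dedicated helpers
all.equal(row_totals, rowSums(demo_matrix))
all.equal(col_totals, colSums(demo_matrix))
```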
## Apply with single values
```{r, echo = -2}
mymatrix <- matrix (1:4, nrow = 2)
mymatrix
apply (mymatrix, 1, mean)
apply (mymatrix, 1:2, mean)
```
## Apply with functions returning values of length > 1
```{r}
mymatrix <- matrix (rpois (100, 5), ncol = 5)
# can be useful where vector is returned
apply (mymatrix, 2, summary)
```
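
One detail worth knowing when the function returns a vector: `apply` stores each result as a column of the output, even when working over rows. A sketch, with an arbitrary seed and the illustrative names `counts`, `by_col` and `by_row`:

```{r}
set.seed(1)   # arbitrary seed, only so the sketch is reproducible
counts <- matrix(rpois(100, 5), ncol = 5)   # 20 rows, 5 columns

by_col <- apply(counts, 2, range)   # one column of results per input column
by_row <- apply(counts, 1, range)   # one column of results per input row

dim(by_col)
dim(by_row)
dim(t(by_row))   # transpose to get one row of results per input row
```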
## Not much use with functions returning lists
```{r}
apply (mymatrix, 1:2, poisson.test)[1:5,]
```
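
A sketch of one way round this: wrap the test in a small function that returns a single number, so that `apply` can build an ordinary numeric matrix. `PoissonP` is a hypothetical helper written for this illustration, and the seed is arbitrary.

```{r}
# Hypothetical helper: run poisson.test and keep only the p-value
PoissonP <- function(x, ...) {
  poisson.test(x, ...)$p.value
}

set.seed(1)   # arbitrary seed, only so the sketch is reproducible
counts <- matrix(rpois(100, 5), ncol = 5)

# Each cell now yields one number, so the result is a numeric matrix
pvals <- apply(counts, 1:2, PoissonP)
dim(pvals)
pvals[1:5, ]
```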
# Which is one of the many motivations for writing your own functions