diff --git a/tutorial.R b/tutorial.R
index 90f6a1c..077ffd9 100644
--- a/tutorial.R
+++ b/tutorial.R
@@ -122,11 +122,11 @@ x <- 1.9999999999999; x; x-2
x <- 1.99999999999999999; x; x-2
a <- 1; b <- 3;
-c <- a < b
-d <- (a > b)
-c; d
+c <- a < b; c
+d <- (a > b); d
-x <- 1:5; b <- (x<=3); b
+x <- 1:5
+b <- (x<=3); b
a=1:3
b=2:4
@@ -137,12 +137,12 @@ a==b
a <- c(1,2,3,4)
b <- c(1,1,5,5)
(a3)
-(a3)
+## a3)
course.url <- "https://kingaa.github.io/R_Tutorial/"
X <- read.csv(paste0(course.url,"data/ChlorellaGrowth.csv"),comment.char='#')
Light <- X[,1]
-rmax <- X[,2];
+rmax <- X[,2]
lowLight <- Light[Light<50]
lowLightrmax <- rmax[Light<50]
lowLight
@@ -190,7 +190,8 @@ y <- 1:5; y
z <- array(1:5,dim=5); z
y==z
identical(y,z)
-dim(y); dim(z)
+dim(y)
+dim(z)
## x <- seq(1,27)
## dim(x) <- c(3,9)
diff --git a/tutorial.Rmd b/tutorial.Rmd
index b591d25..68aae14 100644
--- a/tutorial.Rmd
+++ b/tutorial.Rmd
@@ -795,9 +795,8 @@ For example, try:
```{r }
a <- 1; b <- 3;
-c <- a < b
-d <- (a > b)
-c; d
+c <- a < b; c
+d <- (a > b); d
```
The parentheses around `a > b` above are optional but do make the code easier to read.
@@ -825,7 +824,8 @@ Table: Some comparison operators in **R**. Use `?Comparison` to learn more.
When we compare two vectors or matrices, comparisons are done element-by-element (and the recycling rule applies).
For example,
```{r }
-x <- 1:5; b <- (x<=3); b
+x <- 1:5
+b <- (x<=3); b
```
So if `x` and `y` are vectors, then `(x==y)` will return a vector of values giving the element-by-element comparisons.
If you want to know whether `x` and `y` are identical vectors, use `identical(x,y)` or `all.equal(x,y)`.
@@ -862,14 +862,15 @@ Table: Logical operators.
-----------------------------------
-For example, try
-```{r warning=TRUE,error=TRUE}
+For example, execute the following and make sure you understand what happens.
+```{r warning=TRUE}
a <- c(1,2,3,4)
b <- c(1,1,5,5)
(a3)
-(a3)
```
-and make sure you understand what happened.
+```{r eval=FALSE}
+a3)
+```
The two forms of logical OR (`|`and `||`) are *inclusive*, meaning that `x|y` is true if either `x` or `y` or both are true.
Use `xor` when exclusive OR is needed.
@@ -887,7 +888,7 @@ As a simple example, we might want to focus on just the low-light values of $r_{
course.url <- "https://kingaa.github.io/R_Tutorial/"
X <- read.csv(paste0(course.url,"data/ChlorellaGrowth.csv"),comment.char='#')
Light <- X[,1]
-rmax <- X[,2];
+rmax <- X[,2]
```
```{r }
lowLight <- Light[Light<50]
@@ -1110,7 +1111,8 @@ y <- 1:5; y
z <- array(1:5,dim=5); z
y==z
identical(y,z)
-dim(y); dim(z)
+dim(y)
+dim(z)
```
-----------------------------------
diff --git a/tutorial.html b/tutorial.html
index e0db8b7..5d6e61e 100644
--- a/tutorial.html
+++ b/tutorial.html
@@ -2867,7 +2867,7 @@ Aaron A. King, Stu Field, Ben Bolker, Steve Ellner
How to use this document
These notes contain many sample calculations. It is important to do these yourself—type them in at your keyboard and see what happens on your screen—to get the feel of working in R.
Exercises in the middle of a section should be done immediately when you get to them, and make sure you have them right before moving on.
-For your convenience, the R codes for this document are provided in a script which you can download, edit, and run.
+For your convenience, the R codes for this document are provided in a script which you can download, edit, and run.
Many other similar introductions are scattered around the web; a partial list is in the “contributed documentation” section on the R web site (https://cran.r-project.org/other-docs.html). This particular version is limited (it has similar coverage to the standard Introduction to R manual and targets biologists who are neither programmers nor statisticians (yet).
Throughout this document, Windows-specific items are marked with a Ⓦ and *nix (linux/unix/MacOSX)-specific items with a Ⓧ.
@@ -3621,13 +3621,12 @@Some operations return a logical value (i.e., TRUE
or FALSE
). For example, try:
1; b <- 3;
- a <- a < b
- c <- (a > b)
- d <- c; d
output
[1] TRUE
(a > b); d d <-
output
[1] FALSE
@@ -3679,21 +3678,22 @@ When we compare two vectors or matrices, comparisons are done element-by-element (and the recycling rule applies). For example,
- 1:5; b <- (x<=3); b x <-
1:5
+ x <- (x<=3); b b <-
output
[1] TRUE TRUE TRUE FALSE FALSE
So if x
and y
are vectors, then (x==y)
will return a vector of values giving the element-by-element comparisons. If you want to know whether x
and y
are identical vectors, use identical(x,y)
or all.equal(x,y)
. You can use ?Logical
to read more about logical operations. Note the difference between = and ==. Can you figure out what happened in the following cautionary tale?
1:3
- a=2:4
- b===b a
1:3
+ a=2:4
+ b===b a
output
[1] FALSE FALSE FALSE
- a=b==b a
+ a=b==b a
output
[1] TRUE TRUE TRUE
@@ -3737,17 +3737,15 @@ For example, try
- c(1,2,3,4)
- a <- c(1,1,5,5)
- b <-<b) | (a>3) (a
For example, execute the following and make sure you understand what happens.
+ c(1,2,3,4)
+ a <- c(1,1,5,5)
+ b <-<b) | (a>3) (a
output
[1] FALSE FALSE TRUE TRUE
<b) || (a>3) (a
Error in (a < b) || (a > 3): 'length = 4' in coercion to 'logical(1)'
-and make sure you understand what happened.
+<b) || (a>3) a
The two forms of logical OR (|
and ||
) are inclusive, meaning that x|y
is true if either x
or y
or both are true. Use xor
when exclusive OR is needed. The two forms of AND and OR differ in how they handle vectors. The shorter one (|
, &
) does element-by-element comparisons; the longer one (||
, &&
) looks only at the first element in each vector.
output
[1] FALSE
-dim(y); dim(z)
dim(y)
output
NULL
dim(z)
output
[1] 5
@@ -4074,10 +4073,10 @@ What happens when we set the dimension attribute on a vector? For example:
- seq(1,27)
- x <-dim(x) <- c(3,9)
-is.array(x)
-is.matrix(x)
seq(1,27)
+ x <-dim(x) <- c(3,9)
+is.array(x)
+is.matrix(x)
Make sure you understand the above.
For dealing with measurements on the nominal and ordinal scales (Stevens 1946), R provides vectors of type factor. A factor is a variable that can take one of a finite number of distinct levels. To construct a factor, we can apply the factor
function to a vector of any class:
rep(c(1,2),each=3); factor(x) x <-
rep(c(1,2),each=3); factor(x) x <-
output
[1] 1 1 1 2 2 2
Levels: 1 2
c("jetpack","ferret","pizza","lawyer")
- trochee <- factor(trochee); trochee trochee <-
c("jetpack","ferret","pizza","lawyer")
+ trochee <- factor(trochee); trochee trochee <-
output
[1] jetpack ferret pizza lawyer
Levels: ferret jetpack lawyer pizza
By default, factor
sets the levels to the unique set of values taken by the vector. To modify that behavior, there is the levels
argument:
factor(trochee,levels=c("ferret","pizza","cowboy","scrapple"))
factor(trochee,levels=c("ferret","pizza","cowboy","scrapple"))
output
[1] <NA> ferret pizza <NA>
@@ -4107,15 +4106,15 @@ Factors
Note that the order of the levels is arbitrary, in keeping with the fact that the only operation permissible on the nominal scale is the test for equality. In particular, the factors created with the factor
command are un-ordered: there is no sense in which we can ask whether, e.g., ferret < cowboy
.
To represent variables measured on the ordinal scale, R provides ordered factors, constructed via the ordered
function. An ordered factor is just like an un-ordered factor except that the order of the levels matters:
ordered(sample(x=letters,size=22,replace=TRUE)); x x <-
ordered(sample(x=letters,size=22,replace=TRUE)); x x <-
output
[1] z m u p a g c d b n x d g n w l c t z p b x
14 Levels: a < b < c < d < g < l < m < n < p < t < ... < z
Here, we’ve relied on ordered
’s default behavior, which is to put the levels in alphabetical order. It’s typically safer to specify explicitly what order we want:
ordered(x,levels=rev(letters))
- x <-1:5] < x[18:22] x[
ordered(x,levels=rev(letters))
+ x <-1:5] < x[18:22] x[
output
[1] TRUE FALSE TRUE TRUE FALSE
@@ -4140,9 +4139,9 @@ While vectors and matrices may seem familiar, lists may be new to you. Vectors and matrices have to contain elements that are all the same type: lists in R can contain anything—vectors, matrices, other lists, arbitrary objects. Indexing is a little different too: use [[ ]]
to extract an element of a list by number or name or $
to extract an element by name (only). Given a list like this:
list(A=x,B=trochee,C=c("a","b","c")) L <-
list(A=x,B=trochee,C=c("a","b","c")) L <-
Then L$A
, L[["A"]]
, and L[[1]]
will each return the first element of the list. To extract a sublist, use the ordinary single square brackets: []
:
c("B","C")] L[
c("B","C")] L[
output
$B
@@ -4159,9 +4158,9 @@ Data frames
Data frames are a hybrid of lists and vectors. Internally, they are a list of vectors which can be of different types but must all be the same length. However, they behave somewhat like matrices, in that you can do most things to them that you can do with matrices. You can index them either the way you would index a list, using [[ ]]
or $
—where each variable is a different item in the list—or the way you would index a matrix.
You can turn a data frame into a matrix (using as.matrix()
, but only if all variables are of the same class) and a matrix into a data frame (using as.data.frame()
).
When data are read into R from an external file using one of the read.xxx
commands (read.csv
, read.table
, read.xls
, etc.), the object that is created is a data frame.
- "https://kingaa.github.io/R_Tutorial/data/ChlorellaGrowth.csv"
- data.url <- read.csv(data.url,comment.char='#')
- dat <- dat
+ "https://kingaa.github.io/R_Tutorial/data/ChlorellaGrowth.csv"
+ data.url <- read.csv(data.url,comment.char='#')
+ dat <- dat
output
light rmax
@@ -4414,19 +4413,19 @@ Scripts and data files
If you send script files by e-mail, even if you are careful to send them as plain text, lines will occasionally get broken in different places, which can lead to confusion. Beware.
As a first example, the file Intro1.R
has the commands from the interactive regression analysis. Download Intro1.R
and save it to your computer:
- "https://kingaa.github.io/R_Tutorial/"
- course.url <-download.file(paste0(course.url,"Intro1.R"),destfile="Intro1.R",mode="w")
+ "https://kingaa.github.io/R_Tutorial/"
+ course.url <-download.file(paste0(course.url,"Intro1.R"),destfile="Intro1.R",mode="w")
Open your copy of Intro1.R
. In your editor, select and Copy the entire text of the file, and then Paste the text into the R console window. This has the same effect as entering the commands by hand into the console: they will be executed and so a graph is displayed with the results. Cut-and-Paste allows you to execute script files one piece at a time (which is useful for finding and fixing errors). The source
function allows you to run an entire script file, e.g.,
-source("Intro1.R")
+source("Intro1.R")
Another important time-saver is loading data from a text file. Grab copies of Intro2.R
and ChlorellaGrowth.csv
from the dropsite to see how this is done.
-download.file(paste0(course.url,"Intro2.R"),destfile="Intro2.R",mode="w")
-download.file(paste0(course.url,"data/ChlorellaGrowth.csv"),destfile="ChlorellaGrowth.csv",mode="w")
+download.file(paste0(course.url,"Intro2.R"),destfile="Intro2.R",mode="w")
+download.file(paste0(course.url,"data/ChlorellaGrowth.csv"),destfile="ChlorellaGrowth.csv",mode="w")
In ChlorellaGrowth.csv
the two variables are entered as columns of a data matrix. Then instead of typing these in by hand, the command
- read.csv("ChlorellaGrowth.csv",comment.char='#') X <-
+ read.csv("ChlorellaGrowth.csv",comment.char='#') X <-
reads the file (from the current directory) and puts the data values into the variable X
. Note that as specified above you need to make sure that R is looking for the data file in the right place: make sure to download the data file into your current working directory. Note also the comment.char
option in the call to read.csv
; this tells R to ignore lines that begin with a #
and allows us to use self-documenting data files (a very good practice).
Extract the variables from X
with the commands
- X[,1]
- Light <- X[,2] rmax <-
+ X[,1]
+ Light <- X[,2] rmax <-
Think of these as shorthand for “Light
= everything in column 1 of X
”, and “rmax
= everything in column 2 of X
” (we’ll learn about working with data frames later). From there on out it’s the same as before, with some additions that set the axis labels and add a title.
@@ -4474,27 +4473,27 @@ Looping in R
for
loops
Execute the following code.
- 1
- phi <-for (k in 1:100) {
- 1+1/phi
- phi <-print(c(k,phi))
- }
+ 1
+ phi <-for (k in 1:100) {
+ 1+1/phi
+ phi <-print(c(k,phi))
+ }
What does it do? Sequentially, for each value of k
between 1 and 100, phi
is modified. More specifically, at the beginning of the for
loop, a vector containing all the integers from 1 to 100 in order is created. Then, k
is set to the first element in that vector, i.e., 1. Then the R expression from the {
to the }
is evaluated. When that expression has been evaluated, k
is set to the next value in the vector. The process is repeated until, at the last evaluation, k
has value 100.
As an aside, note that the final value of phi
is the Golden Ratio, 1.618034.
As an example of a situation where a loop of some sort is really needed, suppose we wish to iterate the Beverton-Holt map (one of the simplest discrete-time models of population growth), \[\begin{equation*}
N_{t+1} = \frac{a\,N_t}{1+b\,N_t}.
\end{equation*}\] We simply have no option but to do the calculation one step at a time. Here’s an R code that does this
- 1.1
- a <- 0.001
- b <- seq(from=1,to=200,by=1)
- T <- numeric(length(T))
- N <- 2
- n <-for (t in T) {
- a*n/(1+b*n)
- n <- n
- N[t] <- }
+ 1.1
+ a <- 0.001
+ b <- seq(from=1,to=200,by=1)
+ T <- numeric(length(T))
+ N <- 2
+ n <-for (t in T) {
+ a*n/(1+b*n)
+ n <- n
+ N[t] <- }
Spend some time to make sure you understand what happens at each line of the above. We can plot the population sizes \(N_t\) through time via
-plot(T,N)
+plot(T,N)
plot
![]()
@@ -4502,11 +4501,11 @@ for
loops
Gotcha:
An alternative way to do the above might be something like
- numeric(length(T))
- N <-for (t in 1:length(T)) {
- a*n/(1+b*n)
- n <- n
- N[t] <- }
+ numeric(length(T))
+ N <-for (t in 1:length(T)) {
+ a*n/(1+b*n)
+ n <- n
+ N[t] <- }
@@ -4518,41 +4517,41 @@ Exercise
while
loops
A second looping construct is the while
loop. Using while
, we can compute the Golden Ratio as before:
- 20
- phi <- 1
- k <-while (k <= 100) {
- 1+1/phi
- phi <-print(c(k,phi))
- k+1
- k <- }
-What’s going on here? First, phi
and k
are initialized. Then the while
loop is started. At each iteration of the loop, phi
is modified, and intermediate results printed, as before. In addition, k
is incremented. The while
loop continues to iterate until the condition k <= 100
is no longer TRUE
, at which point, the while
loop terminates.
-Note that here we’ve chosen a large number (100) of iterations. Perhaps we could get by with fewer. If we wanted to terminate the iterations as soon as the value of phi
stopped changing, we could do:
20
- phi <- FALSE
- conv <-while (!conv) {
- 1+1/phi
- phi.new <- phi==phi.new
- conv <- phi.new
+ phi <- 1
+ k <-while (k <= 100) {
+ 1+1/phi
+ phi <-print(c(k,phi))
+ k+1
k <- }
+What’s going on here? First, phi
and k
are initialized. Then the while
loop is started. At each iteration of the loop, phi
is modified, and intermediate results printed, as before. In addition, k
is incremented. The while
loop continues to iterate until the condition k <= 100
is no longer TRUE
, at which point, the while
loop terminates.
+Note that here we’ve chosen a large number (100) of iterations. Perhaps we could get by with fewer. If we wanted to terminate the iterations as soon as the value of phi
stopped changing, we could do:
+ 20
+ phi <- FALSE
+ conv <-while (!conv) {
+ 1+1/phi
+ phi.new <- phi==phi.new
+ conv <- phi.new
+ phi <- }
Exercise
Verify that the above works as intended. How many iterations are needed?
Another way to accomplish this would be to use break
to stop the iteration when a condition is met. For example
- 20
- phi <-while (TRUE) {
- 1+1/phi
- phi.new <-if (phi==phi.new) break
- phi.new
- phi <- }
-While this while
loop is equivalent to the one before, it does have the drawback that, if the break
condition is never met, the loop will go on indefinitely. An alternative that avoids this is to use a for
loop with a large (but finite) number of iterations, together with break
:
- 3
- phi <-for (k in seq_len(1000)) {
+ 20
+ phi <-while (TRUE) {
1+1/phi
phi.new <-if (phi==phi.new) break
phi.new
phi <- }
+While this while
loop is equivalent to the one before, it does have the drawback that, if the break
condition is never met, the loop will go on indefinitely. An alternative that avoids this is to use a for
loop with a large (but finite) number of iterations, together with break
:
+ 3
+ phi <-for (k in seq_len(1000)) {
+ 1+1/phi
+ phi.new <-if (phi==phi.new) break
+ phi.new
+ phi <- }
@@ -4564,12 +4563,12 @@ Exercise
repeat
loops
A third looping construct in R involves the repeat
keyword. For example,
- 12
- phi <-repeat {
- 1/(1+phi)
- phi.new <-if (phi==phi.new) break
- phi.new
- phi <- }
+ 12
+ phi <-repeat {
+ 1/(1+phi)
+ phi.new <-if (phi==phi.new) break
+ phi.new
+ phi <- }
In addition, R provides the next
keyword, which, like break
, is used in the body of a looping construct. Rather than terminating the iteration, however, it aborts the current iteration and leads to the immediate execution of the next iteration.
For more information on looping and other control-flow constructs, execute ?Control
.
@@ -4580,9 +4579,9 @@ Functions and environments
Definitions and examples
An extremely useful feature in R is the ability to write arbitrary functions. A function, in this context, is an algorithm that performs a specific computation that depends on inputs (the function’s arguments) and produces some output (the function’s value) and/or has some side effects. Let’s see how this is done.
Here is a function that squares a number.
- function (x) x^2 sq <-
+ function (x) x^2 sq <-
The syntax is function (arglist) expr
. The one argument in this case is x
. When a particular value of x
is supplied, R performs the squaring operation. The function then returns the value x^2
:
-sq(3); sq(9); sq(-2);
+sq(3); sq(9); sq(-2);
output
[1] 9
@@ -4596,12 +4595,12 @@ Definitions and examples
[1] 4
Here is a function with two arguments and a more complex body, as we call the expression that is evaluated when the function is called.
- function (x, y = 3) {
- f <- sq(x)
- a <-+y
- a }
+ function (x, y = 3) {
+ f <- sq(x)
+ a <-+y
+ a }
Here, the body is the R expression from {
to }
. Unless the return
codeword is used elsewhere in the function body, the value returned is always the last expression evaluated. Thus:
-f(3,0); f(2,2); f(3);
+f(3,0); f(2,2); f(3);
output
[1] 9
@@ -4616,23 +4615,23 @@ Definitions and examples
Note that in the last case, only one argument was supplied. In this case, y
assumed its default value, 3.
Note that functions need not be assigned to symbols; they can be anonymous:
-function (x) x^5
+function (x) x^5
output
function (x) x^5
-function (x) x^5)(2) (
+function (x) x^5)(2) (
output
[1] 32
A function can also have side effects, e.g.,
- "hat"
- hat <- function (y) {
- hattrick <- "rabbit"
- hat <<-2*y
-
- }hattrick(5); hat hat;
+ "hat"
+ hat <- function (y) {
+ hattrick <- "rabbit"
+ hat <<-2*y
+
+ }hattrick(5); hat hat;
output
[1] "hat"
@@ -4649,12 +4648,12 @@ Definitions and examples
An aside
If we want the function not to automatically print, we can wrap the return value in invisible()
:
- function (y) {
- hattrick <- "rabbit"
- hat <<-invisible(2*y)
-
- }hattrick(5)
-print(hattrick(5))
+ function (y) {
+ hattrick <- "rabbit"
+ hat <<-invisible(2*y)
+
+ }hattrick(5)
+print(hattrick(5))
output
[1] 10
@@ -4666,12 +4665,12 @@ An aside
its environment, i.e., the context in which the function was defined.
R provides simple functions to interrogate these function components:
-formals(hattrick)
+formals(hattrick)
output
$y
-body(hattrick)
+body(hattrick)
output
{
@@ -4679,7 +4678,7 @@ An aside
invisible(2 * y)
}
-environment(hattrick)
+environment(hattrick)
output
<environment: R_GlobalEnv>
@@ -4691,17 +4690,17 @@ Function scope
Note: This section draws heavily, and sometimes verbatim, on §10.7 of the Introduction to R manual (R Core Team 2014a).
As noted above, a paramount consideration in the implementation of functions in any programming language is that unintentional side effects should never occur. In particular, I should be free to write a function that creates temporary variables as an aid to its computations, and be able to rest assured that no variables I create temporarily will interfere with any other variables I’ve defined anywhere else. To accomplish this, R has a specific set of scoping rules.
Consider the function
- function (x) {
- f <- 2*x
- y <-print(x)
- print(y)
- print(z)
- }
+ function (x) {
+ f <- 2*x
+ y <-print(x)
+ print(y)
+ print(z)
+ }
In this function’s body, x
is a formal parameter, y
is a local variable, and z
is a free, or unbound variable. When f
is evaluated, each of these variables must be bound to some value. In R , the free variable bindings are resolved—each time the function is evaluated—by first looking in the environment where the function was created. This is called lexical scope. Thus if we execute
-f(3)
+f(3)
we get an error, because no object named z
can be found. If, however, we do
- 10
- z <-f(3)
+ 10
+ z <-f(3)
output
[1] 3
@@ -4709,15 +4708,15 @@ Function scope
[1] 10
we don’t get an error, because z
is defined in the environment, <environment: R_GlobalEnv>
, of f
. Similarly, when we do
- 13
- z <- function (x) {
- g <-2*x+z
-
- } function (x) {
- f <- -100
- z <-g(x)
-
- }f(5)
+ 13
+ z <- function (x) {
+ g <-2*x+z
+
+ } function (x) {
+ f <- -100
+ z <-g(x)
+
+ }f(5)
output
[1] 23
@@ -4728,12 +4727,12 @@ Function scope
Nested functions and environments
In each of the following examples, make sure you understand exactly what has happened.
Consider this:
- 11
- y <- function (x) {
- f <- 2*x
- y <-+x
- y
- }f(1); y
+ 11
+ y <- function (x) {
+ f <- 2*x
+ y <-+x
+ y
+ }f(1); y
output
[1] 3
@@ -4748,11 +4747,11 @@ Exercise
Why hasn’t y
changed its value?
What about this?
- 0
- y <- function (x) {
- f <-2*x+y
-
- }f(1); y
+ 0
+ y <- function (x) {
+ f <-2*x+y
+
+ }f(1); y
output
[1] 2
@@ -4768,14 +4767,14 @@ Exercise
Why is the return value of f
different?
How about the following?
- function (y) y
- g <- function (x) {
- f <- function (y) {
- g <-2*y
-
- }11+g(x)
-
- }f(2); g(2)
+ function (y) y
+ g <- function (x) {
+ f <- function (y) {
+ g <-2*y
+
+ }11+g(x)
+
+ }f(2); g(2)
output
[1] 15
@@ -4791,15 +4790,15 @@ Exercise
Be sure you understand what’s happening at each line and why you get the results you see.
Another example:
- 11
- y <- function (x) {
- f <- 22
- y <- function (x) {
- g <-+y
- x
- }g(x)
-
- }f(1); y
+ 11
+ y <- function (x) {
+ f <- 22
+ y <- function (x) {
+ g <-+y
+ x
+ }g(x)
+
+ }f(1); y
output
[1] 23
@@ -4815,16 +4814,16 @@ Exercise
Which value of y
was used?
Compare that with this:
- 11
- y <- function (x) {
- f <- 22
- y <- function (x) {
- g <- 7
- y <<-+y
- x
- }c(g(x),y)
-
- }f(1); y
+ 11
+ y <- function (x) {
+ f <- 22
+ y <- function (x) {
+ g <- 7
+ y <<-+y
+ x
+ }c(g(x),y)
+
+ }f(1); y
output
[1] 8 7
@@ -4834,14 +4833,14 @@ Exercise
[1] 11
and this:
- 11
- y <- function (x) {
- f <- function (x) {
- g <-+y
- x
- }g(x)
-
- }f(1); y
+ 11
+ y <- function (x) {
+ f <- function (x) {
+ g <-+y
+ x
+ }g(x)
+
+ }f(1); y
output
[1] 12
@@ -4851,17 +4850,17 @@ Exercise
[1] 11
and this:
- function (x) {
- f <- 37
- y <- function (x) {
- g <- function (x) {
- h <-+y
- x
- }h(x)
-
- }g(x)
-
- }f(1); y
+ function (x) {
+ f <- 37
+ y <- function (x) {
+ g <- function (x) {
+ h <-+y
+ x
+ }h(x)
+
+ }g(x)
+
+ }f(1); y
output
[1] 38
@@ -4871,15 +4870,15 @@ Exercise
[1] 11
Finally, consider the following:
-rm(y)
- function (x) {
- f <- function (x) {
- g <- 2*x
- y <<-+1
- y
- }g(x)+1
-
- }f(11); y
+rm(y)
+ function (x) {
+ f <- function (x) {
+ g <- 2*x
+ y <<-+1
+ y
+ }g(x)+1
+
+ }f(11); y
output
[1] 24
@@ -4888,7 +4887,7 @@ Exercise
output
[1] 22
-f(-2); y
+f(-2); y
output
[1] -2
@@ -4904,60 +4903,60 @@ Exercise
In the above five code snippets, make sure you understand the differences.
As mentioned above, each function is associated with an environment: the environment within which it was defined. When a function is evaluated, a new temporary environment is created, within which the function’s calculations are performed. Every new environment has a parent, the environment wherein it was created. The parent of this new environment is the function’s environment. To see this, try
- function () {
- f <- function () {
- g <- function () {
- h <-cat("inside function h:\n")
- cat("current env: ")
- print(environment())
- cat("parent env: ")
- print(parent.frame(1))
- cat("grandparent env: ")
- print(parent.frame(2))
- cat("great-grandparent env: ")
- print(parent.frame(3))
- invisible(NULL)
-
- }cat("inside function g:\n")
- cat("environment of h: ")
- print(environment(h))
- cat("current env: ")
- print(environment())
- cat("parent env: ")
- print(parent.frame(1))
- cat("grandparent env: ")
- print(parent.frame(2))
- h()
-
- }cat("inside function f:\n")
- cat("environment of g: ")
- print(environment(g))
- cat("current env: ")
- print(environment())
- cat("parent env: ")
- print(parent.frame(1))
- g()
-
- }cat("environment of f: "); print(environment(f))
-cat("global env: "); print(environment())
-f()
+ function () {
+ f <- function () {
+ g <- function () {
+ h <-cat("inside function h:\n")
+ cat("current env: ")
+ print(environment())
+ cat("parent env: ")
+ print(parent.frame(1))
+ cat("grandparent env: ")
+ print(parent.frame(2))
+ cat("great-grandparent env: ")
+ print(parent.frame(3))
+ invisible(NULL)
+
+ }cat("inside function g:\n")
+ cat("environment of h: ")
+ print(environment(h))
+ cat("current env: ")
+ print(environment())
+ cat("parent env: ")
+ print(parent.frame(1))
+ cat("grandparent env: ")
+ print(parent.frame(2))
+ h()
+
+ }cat("inside function f:\n")
+ cat("environment of g: ")
+ print(environment(g))
+ cat("current env: ")
+ print(environment())
+ cat("parent env: ")
+ print(parent.frame(1))
+ g()
+
+ }cat("environment of f: "); print(environment(f))
+cat("global env: "); print(environment())
+f()
output
environment of f: <environment: R_GlobalEnv>
global env: <environment: R_GlobalEnv>
inside function f:
-environment of g: <environment: 0x55ffbfb892b0>
-current env: <environment: 0x55ffbfb892b0>
+environment of g: <environment: 0x55b266ba0018>
+current env: <environment: 0x55b266ba0018>
parent env: <environment: R_GlobalEnv>
inside function g:
-environment of h: <environment: 0x55ffbce116f8>
-current env: <environment: 0x55ffbce116f8>
-parent env: <environment: 0x55ffbfb892b0>
+environment of h: <environment: 0x55b2632c3e20>
+current env: <environment: 0x55b2632c3e20>
+parent env: <environment: 0x55b266ba0018>
grandparent env: <environment: R_GlobalEnv>
inside function h:
-current env: <environment: 0x55ffbce1bb90>
-parent env: <environment: 0x55ffbce116f8>
-grandparent env: <environment: 0x55ffbfb892b0>
+current env: <environment: 0x55b2632b2aa8>
+parent env: <environment: 0x55b2632c3e20>
+grandparent env: <environment: 0x55b266ba0018>
great-grandparent env: <environment: R_GlobalEnv>
Each variable referenced in the function’s body is bound, first, to a formal argument if possible. If a local variable of that name has previously been created (via one of the assignment operators <-
, ->
, or =
, this is the variable that is affected by any subsequent assignments. If the variable is neither a formal parameter nor a local variable, then the parent environment of the function is searched for that variable. If the variable has not been found in the parent environment, then the grand-parent environment is searched, and so on.
@@ -4971,9 +4970,9 @@ The apply
family of functions
List apply: lapply
lapply
applies a function to each element of a list or vector, returning a list.
- list("teenage","mutant","ninja","turtle",
- x <-"hamster","plumber","pickle","baby")
- lapply(x,nchar)
+ list("teenage","mutant","ninja","turtle",
+ x <-"hamster","plumber","pickle","baby")
+ lapply(x,nchar)
output
[[1]]
@@ -5000,9 +4999,9 @@ List apply: lapply
[[8]]
[1] 4
- c("teenage","mutant","ninja","turtle",
- y <-"hamster","plumber","pickle","baby")
- lapply(y,nchar)
+ c("teenage","mutant","ninja","turtle",
+ y <-"hamster","plumber","pickle","baby")
+ lapply(y,nchar)
output
[[1]]
@@ -5033,16 +5032,16 @@ List apply: lapply
Sloppy list apply: sapply
sapply
isn’t content to always return a list: it attempts to simplify the results into a non-list vector if possible.
- list("pizza","monster","jigsaw","puddle",
- x <-"hamster","plumber","pickle","baby")
- sapply(x,nchar)
+ list("pizza","monster","jigsaw","puddle",
+ x <-"hamster","plumber","pickle","baby")
+ sapply(x,nchar)
output
[1] 5 7 6 6 7 7 6 4
- c("pizza","monster","jigsaw","puddle",
- y <-"hamster","plumber","pickle","baby")
- sapply(y,nchar)
+ c("pizza","monster","jigsaw","puddle",
+ y <-"hamster","plumber","pickle","baby")
+ sapply(y,nchar)
output
pizza monster jigsaw puddle hamster plumber pickle
@@ -5054,9 +5053,9 @@ Sloppy list apply: sapply
Multiple-list apply: mapply
mapply
is a multiple-argument version of sapply
:
- c("pizza","monster","jigsaw","puddle")
- x <- c("cowboy","barbie","slumber","party")
- y <-mapply(paste,x,y,sep="/")
+ c("pizza","monster","jigsaw","puddle")
+ x <- c("cowboy","barbie","slumber","party")
+ y <-mapply(paste,x,y,sep="/")
output
pizza monster jigsaw
@@ -5065,7 +5064,7 @@ Multiple-list apply: mapply
"puddle/party"
As usual, the recycling rule applies:
-mapply(paste,x,y[2:3])
+mapply(paste,x,y[2:3])
output
pizza monster jigsaw
@@ -5073,7 +5072,7 @@ Multiple-list apply: mapply
puddle
"puddle slumber"
-mapply(paste,x[c(1,3)],y)
+mapply(paste,x[c(1,3)],y)
output
pizza jigsaw <NA>
@@ -5085,7 +5084,7 @@ Multiple-list apply: mapply
Array apply: apply
apply
is very powerful and a bit more complex. It allows an arbitrary function to applied to each slice of an array, where the slices can be defined in all possible ways. Let’s create a matrix:
- array(data=seq_len(15),dim=c(3,5)); A A <-
+ array(data=seq_len(15),dim=c(3,5)); A A <-
output
[,1] [,2] [,3] [,4] [,5]
@@ -5094,19 +5093,19 @@ Array apply: apply
[3,] 3 6 9 12 15
To apply an operation to each row, we marginalize over the first dimension (rows). For example, to sum the rows, we’d do
-apply(A,1,sum)
+apply(A,1,sum)
output
[1] 35 40 45
To sum the columns (the second dimension), we’d do
-apply(A,2,sum)
+apply(A,2,sum)
output
[1] 6 15 24 33 42
Now suppose we have a 3-dimensional array:
- array(data=seq_len(30),dim=c(3,5,2)); A A <-
+ array(data=seq_len(30),dim=c(3,5,2)); A A <-
output
, , 1
@@ -5124,7 +5123,7 @@ Array apply: apply
[3,] 18 21 24 27 30
To sum the rows within each slice, we’d do
-apply(A,c(1,3),sum)
+apply(A,c(1,3),sum)
output
[,1] [,2]
@@ -5133,14 +5132,14 @@ Array apply: apply
[3,] 45 120
while to sum the slices, we’d do
-apply(A,3,sum)
+apply(A,3,sum)
output
[1] 120 345
For each of the above, make sure you understand exactly what has happened.
Of course, we can apply an anonymous function wherever we apply a named function:
-apply(A,c(2,3),function (x) sd(x)/sqrt(length(x)))
+apply(A,c(2,3),function (x) sd(x)/sqrt(length(x)))
output
[,1] [,2]
@@ -5151,7 +5150,7 @@ Array apply: apply
[5,] 0.5773503 0.5773503
If any additional arguments are given to apply
, these are passed to the function. For example, make sure you understand what is happening in the lines:
-apply(A,c(1,2),function (x, y) sum(x>y),y=8)
+apply(A,c(1,2),function (x, y) sum(x>y),y=8)
output
[,1] [,2] [,3] [,4] [,5]
@@ -5159,7 +5158,7 @@ Array apply: apply
[2,] 1 1 1 2 2
[3,] 1 1 2 2 2
-apply(A,c(1,2),function (x, y) sum(x>y),y=-1)
+apply(A,c(1,2),function (x, y) sum(x>y),y=-1)
output
[,1] [,2] [,3] [,4] [,5]
@@ -5171,9 +5170,9 @@ Array apply: apply
Table apply: tapply
tapply
is, in a way, an extension of table
. The syntax is tapply(X,INDEX,FUN,...)
, where X
is a vector, INDEX
is a list of one or more factors, each the same length as X
, and FUN
is a function. The vector X
will be split into subvectors according to INDEX
, and FUN
will be applied to each of the subvectors. By default, the result is simplified into an array if possible. Some examples:
- seq(1,30,by=1)
- x <- rep(letters[1:10],times=3)
- b <-data.frame(x,b)
+ seq(1,30,by=1)
+ x <- rep(letters[1:10],times=3)
+ b <-data.frame(x,b)
output
x b
@@ -5208,14 +5207,14 @@ Table apply: tapply
29 i
30 j
-tapply(x,b,sum)
+tapply(x,b,sum)
output
a b c d e f g h i j
33 36 39 42 45 48 51 54 57 60
- rep(letters[1:10],each=3)
- b <-data.frame(x,b)
+ rep(letters[1:10],each=3)
+ b <-data.frame(x,b)
output
x b
@@ -5250,18 +5249,18 @@ Table apply: tapply
29 j
30 j
-tapply(x,b,sum)
+tapply(x,b,sum)
output
a b c d e f g h i j
6 15 24 33 42 51 60 69 78 87
The following uses the seed predation data described by Bolker (2008).
- "https://kingaa.github.io/R_Tutorial/data/seedpred.dat"
- datafile <- read.table(datafile,header=TRUE,
- seeds <-colClasses=c(station='factor',dist='factor',date='Date'))
- subset(seeds,available>0)
- x <-with(x, tapply(tcum,list(dist,station),max,na.rm=TRUE))
+ "https://kingaa.github.io/R_Tutorial/data/seedpred.dat"
+ datafile <- read.table(datafile,header=TRUE,
+ seeds <-colClasses=c(station='factor',dist='factor',date='Date'))
+ subset(seeds,available>0)
+ x <-with(x, tapply(tcum,list(dist,station),max,na.rm=TRUE))
output
1 10 100 101 102 103 104 105 106 107 108 109 11 110
@@ -5302,14 +5301,14 @@ Table apply: tapply
sapply with expected result: vapply
When we could use sapply
and we know exactly what the size and class of the value of the function will be, it is sometimes faster to use vapply
. The syntax is like that of sapply
: vapply(X,FUN,FUN.VALUE,...)
, where X
and FUN
are as in sapply
, but we specify the size and class of the value of FUN
via the FUN.VALUE
argument. For example, suppose we define a function that, given a number between 1 and 26 will return the corresponding letter of the alphabet:
- function (x) {
- alph <-stopifnot(x >= 1 && x <= 26)
- as.integer(x)]
- LETTERS[ }
+ function (x) {
+ alph <-stopifnot(x >= 1 && x <= 26)
+ as.integer(x)]
+ LETTERS[ }
This function will return a vector of length 1 and class character
. To apply it to a randomly sampled set of integers, we might do
- sample(1:26,50,replace=TRUE)
- x <- vapply(x,alph,character(1))
- y <-paste(y,collapse="")
+ sample(1:26,50,replace=TRUE)
+ x <- vapply(x,alph,character(1))
+ y <-paste(y,collapse="")
output
[1] "ORIFWKBWAMEJBZUMLKRJVEOIKQXEYRNTGAXXXUVQTZIHQGANTI"
@@ -5321,30 +5320,30 @@ sapply with expected result: vapply
Vectorized functions vs loops
As Ligges & Fox (Ligges and Fox 2008) point out, the idea that one should avoid loops wherever possible in R, using instead vectorized functions like those in the apply
family, is quite widespread in some quarters. The belief, which probably dates back to infelicities in early versions of S and Splus but is remarkably persistent, is that loops are very slow in R. Let’s have a critical look at this.
Consider the following loop code that can be vectorized:
- runif(n=1e6,min=0,max=2*pi)
- x <- numeric(length(x))
- y <-for (k in seq_along(x)) {
- sin(x[k])
- y[k] <- }
-To time this, we can wrap the vectorizable parts in a call to system.time
:
runif(n=1e6,min=0,max=2*pi)
- x <-system.time({
- numeric(length(x))
- y <-for (k in seq_along(x)) {
- sin(x[k])
- y[k] <-
- } })
+ numeric(length(x))
+ y <-for (k in seq_along(x)) {
+ sin(x[k])
+ y[k] <- }
+To time this, we can wrap the vectorizable parts in a call to system.time
:
+ runif(n=1e6,min=0,max=2*pi)
+ x <-system.time({
+ numeric(length(x))
+ y <-for (k in seq_along(x)) {
+ sin(x[k])
+ y[k] <-
+ } })
output
user system elapsed
- 0.092 0.000 0.093
+ 0.095 0.000 0.095
We can compare this with a simple call to sin
(which is vectorized):
-system.time(z <- sin(x))
+system.time(z <- sin(x))
output
user system elapsed
- 0.020 0.000 0.021
+ 0.023 0.000 0.023
Clearly, calling sin
directly is much faster. What about using sapply
?
@@ -5353,55 +5352,55 @@ Exercise
Compare the time spent on the equivalent calculation, using sapply
. What does this result tell you about where the computational cost is incurred?
The above example is very simple in that there is a builtin function (sin
in this case) which is capable of the fully vectorized computation. In such a case, it is clearly preferable to use it. Frequently, however, no such builtin function exists, i.e., we have a custom function of our own we want to apply to a set of data. Let’s compare the relative speeds of loops and sapply
in this case.
- seq.int(from=20,to=1e6,by=10)
- x <- function (x) {
- f <-+1)*x+1)*x+1)*x+1
- (((x
- }system.time({
- numeric(length(x))
- res1 <-for (k in seq_along(x)) {
- f(x[k])
- res1[k] <-
- } })
+ seq.int(from=20,to=1e6,by=10)
+ x <- function (x) {
+ f <-+1)*x+1)*x+1)*x+1
+ (((x
+ }system.time({
+ numeric(length(x))
+ res1 <-for (k in seq_along(x)) {
+ f(x[k])
+ res1[k] <-
+ } })
output
user system elapsed
- 0.069 0.000 0.069
+ 0.071 0.000 0.071
-system.time(res2 <- sapply(x,f))
+system.time(res2 <- sapply(x,f))
output
user system elapsed
- 0.072 0.000 0.071
+ 0.078 0.001 0.078
[Actually, in this case, f
is vectorized automatically. Why is this?]
-system.time(f(x))
+system.time(f(x))
output
user system elapsed
- 0.000 0.000 0.001
+ 0.001 0.000 0.000
Another example: in this case function g
is not vectorized.
- function (x) {
- g <-if ((x[1] > 30) && (x[1] < 5000)) 1 else 0
-
- }
-system.time({
- numeric(length(x))
- res1 <-for (k in seq_along(x)) {
- g(x[k])
- res1[k] <-
- } })
+ function (x) {
+ g <-if ((x[1] > 30) && (x[1] < 5000)) 1 else 0
+
+ }
+system.time({
+ numeric(length(x))
+ res1 <-for (k in seq_along(x)) {
+ g(x[k])
+ res1[k] <-
+ } })
output
user system elapsed
- 0.066 0.000 0.065
+ 0.068 0.000 0.069
-system.time(res2 <- sapply(x,g))
+system.time(res2 <- sapply(x,g))
output
user system elapsed
- 0.07 0.00 0.07
+ 0.075 0.000 0.075