02-names-and-values.Rmd

---
title: "Chapter 2: Names and Values"
output: html_document
---

## 2.2 Binding Basics 

### Exercise 2.2.1

Explain the relationship between a, b, c and d in the following code:

```{r, eval = FALSE}
a <- 1:10
b <- a
c <- b
d <- 1:10
```

The first line `r a <- 1:10` creates the vector 
`c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)` and assigns the name `a` to it.

The same vector is the assigned to name `b` and name `c`. Note that `a`, `b`, 
and `c` have the same memory location.

The last line creates a second vector `c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)` 
and assigns the name `d` to it. In this case `d` has a different address than 
`a`, `b`, and `c`.

```{r}
a <- 1:10
b <- a
c <- b
d <- 1:10

lobstr::obj_addr(a) == lobstr::obj_addr(b)
lobstr::obj_addr(a) == lobstr::obj_addr(c)
lobstr::obj_addr(a) == lobstr::obj_addr(d)
```

### Exercise 2.2.2

The following code accesses the `mean` function in multiple ways. Do they all 
point to the same underlying function object? Verify this with 
`lobstr::obj_addr()`.

```{r, eval=FALSE}
mean
base::mean
get("mean")
evalq(mean)
match.fun("mean")
```

All methods of accessing a function point to the same object:

```{r}
#https://stackoverflow.com/questions/30850400/using-identical-in-r-with-multiple-vectors
a <- lobstr::obj_addr(mean)
b <- lobstr::obj_addr(base::mean)
c <- lobstr::obj_addr(get("mean"))
d <- lobstr::obj_addr(evalq(mean))
e <- lobstr::obj_addr(match.fun("mean"))

all(sapply(list(b, c, d, e), FUN = identical, a))
```

### Exercise 2.2.3

By default, base R data import functions, like `read.csv()`, will automatically 
convert non-syntactic names to syntactic ones. Why might this be problematic? 
What option allows you to suppress this behaviour?

By default, `read.csv()` will use the `make.names()` function to create 
syntactic column names.

```{r}
testfile <- tempfile(fileext = ".csv")
cat("_abc", "1", "2", "3", file = testfile, sep = "\n")

read.csv(testfile)
```

You can suppress this behaviour by setting the `check.names` argument to 
`FALSE`.

```{r}
read.csv(testfile, check.names = FALSE)
```

### Exericse 2.2.4

What rules does `make.names()` use to convert non-syntactic names into 
syntactic ones?

From the function documentation, in the `make.names()` function:

* The character "X" is prepended if necessary. 
* All invalid characters are translated to ".". 
* A missing value is translated to "NA". 
* Names which match R keywords have a dot appended to them. 
* Duplicated values are altered by make.unique.

For example: 

```{r}
nonsyntactic <- c("_abc", "with space", "if", "with-space")
make.names(nonsyntactic, unique = TRUE)
```

### Exercise 2.2.5

I slightly simplified the rules that govern syntactic names. Why is `.123e1` not 
a syntactic name? Read `?make.names` for the full details.

From the documentation:

"A syntactically valid name consists of letters, numbers and the dot or 
underline characters and starts with a letter or the **dot not followed by a 
number**."

```{r}
make.names(".123e1")
```

## 2.3 Copy-on-modify

### Exericse 2.3.1

Why is `tracemem(1:10)` not useful?

Just calling `1:10` doesn't create a binding to the vector object, so R doesn't 
have anything to trace after `tracemem(1:10)` is called.

### Exercise 2.3.2

Explain why tracemem() shows two copies when you run this code. Hint: carefully 
look at the difference between this code and the code shown earlier in the 
section.

```{r, eval = FALSE}
x <- c(1L, 2L, 3L)
tracemem(x)

x[[3]] <- 4
```

```{r}
x <- c(1, 2, 3)
tracemem(x)

x[[3]] <- 4L
```

### Exercise 2.3.3

Sketch out the relationship between the following objects:

```{r}
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)
```

The first line creates the vector object `1:10` and binds name `a` to it.

The second line creates the list object with items `(a, a)` and binds name `b` 
to it. The address of the items in `b` is the same as the address of `a`.

The third line creates the list object with items `(b, a, 1:10)` and binds 
name `c` to it.

```{r}
lobstr::obj_addr(a)

lobstr::obj_addr(b)
lobstr::obj_addrs(b)

lobstr::obj_addr(c)
lobstr::obj_addrs(c)
lobstr::obj_addrs(c[[1]])
```

### Exercise 2.3.4

What happens when you run this code?

```{r}
x <- list(1:10)
lobstr::obj_addr(x)


x[[2]] <- x
```

```{r}
lobstr::obj_addr(x)
lobstr::obj_addrs(x)
```

## 2.4 Object Size

### Exercise 2.4.1 

In the following example, why are `object.size(y)` and `obj_size(y)` so 
radically different? Consult the documentation of `object.size()`.

```{r, eval = FALSE}
y <- rep(list(runif(1e4)), 100)

object.size(y)
#> 8005648 bytes
lobstr::obj_size(y)
#> 80,896 B
```

According to the documentation, `object.size()` "...does not detect if elements 
of a list are shared". In this case `object.size()` is duplicating the 
reference to `y`, explaining why the results for `object.size()` is roughly 
100 times larger than that of `lobstr::obj_size()`. Note the size of one 
item from `y`:

```{r}
y <- rep(list(runif(1e4)), 100)

lobstr::obj_size(y[[1]])
```


### Exercise 2.4.2

Take the following list. Why is its size somewhat misleading?

```{r}
funs <- list(mean, sd, var)
lobstr::obj_size(funs)
```

```{r}
funs <- list(mean, sd, var)
lobstr::obj_size(funs)
```

### Exercise 2.4.3

Predict the output of the following code:

```{r, eval = FALSE}
a <- runif(1e6)
obj_size(a)

b <- list(a, a)
obj_size(b)
obj_size(a, b)

b[[1]][[1]] <- 10
obj_size(b)
obj_size(a, b)

b[[2]][[1]] <- 10
obj_size(b)
obj_size(a, b)
```

Double vectors occupy 8 bytes per element. The vector `a` should occupy 
8 * 1,000,000 = 8,000,000 bytes. 

```{r}
a <- runif(1e6)
lobstr::obj_size(a)
```

Where do the extra 48 bytes come from? Note that an empty vector in R will 
occupy 48 bytes of memory:

```{r}
lobstr::obj_size(vector("numeric"))
```

Since `b` references `a` twice, we expect it to occupy 8,000,048 bytes, plus the 
overhead for a empty `list` of length 2, which is 64 bytes.

```{r}
b <- list(a, a)

lobstr::obj_size(b)
lobstr::obj_size(b) - lobstr::obj_size(vector("list", 2))
```

Since `b` and `a` reference the same object, the amount of memory required 
remains the same.

```{r}
lobstr::obj_size(a, b)
```

When the first item of `b` is modified a copy is created. We expect `b` to now 
occupy 8,000,048 * 2 bytes, plus the 64 bytes of overhead for the list. Since 
the second item of `b` still references `a`, the overall memory useage 
remains the same.

```{r}
b[[1]][[1]] <- 10
lobstr::obj_size(b)
lobstr::obj_size(a, b)
```

When the second item of `b` is modified, another copy is created. We expect `b` 
to still occupy 16,000,160 bytes. Since neither item of `b` references `a` 
anymore, the overall memory usage will be 16,000,160 + 8,000,048 bytes.

```{r}
b[[2]][[1]] <- 10
lobstr::obj_size(b)
lobstr::obj_size(a, b)
```