-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path02-names-and-values.Rmd
316 lines (225 loc) · 6.63 KB
/
02-names-and-values.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
---
title: "Chapter 2: Names and Values"
output: html_document
---
## 2.2 Binding Basics
### Exercise 2.2.1
Explain the relationship between a, b, c and d in the following code:
```{r, eval = FALSE}
a <- 1:10
b <- a
c <- b
d <- 1:10
```
The first line `r a <- 1:10` creates the vector
`c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)` and assigns the name `a` to it.
The same vector is the assigned to name `b` and name `c`. Note that `a`, `b`,
and `c` have the same memory location.
The last line creates a second vector `c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)`
and assigns the name `d` to it. In this case `d` has a different address than
`a`, `b`, and `c`.
```{r}
a <- 1:10
b <- a
c <- b
d <- 1:10
lobstr::obj_addr(a) == lobstr::obj_addr(b)
lobstr::obj_addr(a) == lobstr::obj_addr(c)
lobstr::obj_addr(a) == lobstr::obj_addr(d)
```
### Exercise 2.2.2
The following code accesses the `mean` function in multiple ways. Do they all
point to the same underlying function object? Verify this with
`lobstr::obj_addr()`.
```{r, eval=FALSE}
mean
base::mean
get("mean")
evalq(mean)
match.fun("mean")
```
All methods of accessing a function point to the same object:
```{r}
#https://stackoverflow.com/questions/30850400/using-identical-in-r-with-multiple-vectors
a <- lobstr::obj_addr(mean)
b <- lobstr::obj_addr(base::mean)
c <- lobstr::obj_addr(get("mean"))
d <- lobstr::obj_addr(evalq(mean))
e <- lobstr::obj_addr(match.fun("mean"))
all(sapply(list(b, c, d, e), FUN = identical, a))
```
### Exercise 2.2.3
By default, base R data import functions, like `read.csv()`, will automatically
convert non-syntactic names to syntactic ones. Why might this be problematic?
What option allows you to suppress this behaviour?
By default, `read.csv()` will use the `make.names()` function to create
syntactic column names.
```{r}
testfile <- tempfile(fileext = ".csv")
cat("_abc", "1", "2", "3", file = testfile, sep = "\n")
read.csv(testfile)
```
You can suppress this behaviour by setting the `check.names` argument to
`FALSE`.
```{r}
read.csv(testfile, check.names = FALSE)
```
### Exericse 2.2.4
What rules does `make.names()` use to convert non-syntactic names into
syntactic ones?
From the function documentation, in the `make.names()` function:
* The character "X" is prepended if necessary.
* All invalid characters are translated to ".".
* A missing value is translated to "NA".
* Names which match R keywords have a dot appended to them.
* Duplicated values are altered by make.unique.
For example:
```{r}
nonsyntactic <- c("_abc", "with space", "if", "with-space")
make.names(nonsyntactic, unique = TRUE)
```
### Exercise 2.2.5
I slightly simplified the rules that govern syntactic names. Why is `.123e1` not
a syntactic name? Read `?make.names` for the full details.
From the documentation:
"A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the **dot not followed by a
number**."
```{r}
make.names(".123e1")
```
## 2.3 Copy-on-modify
### Exericse 2.3.1
Why is `tracemem(1:10)` not useful?
Just calling `1:10` doesn't create a binding to the vector object, so R doesn't
have anything to trace after `tracemem(1:10)` is called.
### Exercise 2.3.2
Explain why tracemem() shows two copies when you run this code. Hint: carefully
look at the difference between this code and the code shown earlier in the
section.
```{r, eval = FALSE}
x <- c(1L, 2L, 3L)
tracemem(x)
x[[3]] <- 4
```
```{r}
x <- c(1, 2, 3)
tracemem(x)
x[[3]] <- 4L
```
### Exercise 2.3.3
Sketch out the relationship between the following objects:
```{r}
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)
```
The first line creates the vector object `1:10` and binds name `a` to it.
The second line creates the list object with items `(a, a)` and binds name `b`
to it. The address of the items in `b` is the same as the address of `a`.
The third line creates the list object with items `(b, a, 1:10)` and binds
name `c` to it.
```{r}
lobstr::obj_addr(a)
lobstr::obj_addr(b)
lobstr::obj_addrs(b)
lobstr::obj_addr(c)
lobstr::obj_addrs(c)
lobstr::obj_addrs(c[[1]])
```
### Exercise 2.3.4
What happens when you run this code?
```{r}
x <- list(1:10)
lobstr::obj_addr(x)
x[[2]] <- x
```
```{r}
lobstr::obj_addr(x)
lobstr::obj_addrs(x)
```
## 2.4 Object Size
### Exercise 2.4.1
In the following example, why are `object.size(y)` and `obj_size(y)` so
radically different? Consult the documentation of `object.size()`.
```{r, eval = FALSE}
y <- rep(list(runif(1e4)), 100)
object.size(y)
#> 8005648 bytes
lobstr::obj_size(y)
#> 80,896 B
```
According to the documentation, `object.size()` "...does not detect if elements
of a list are shared". In this case `object.size()` is duplicating the
reference to `y`, explaining why the results for `object.size()` is roughly
100 times larger than that of `lobstr::obj_size()`. Note the size of one
item from `y`:
```{r}
y <- rep(list(runif(1e4)), 100)
lobstr::obj_size(y[[1]])
```
### Exercise 2.4.2
Take the following list. Why is its size somewhat misleading?
```{r}
funs <- list(mean, sd, var)
lobstr::obj_size(funs)
```
```{r}
funs <- list(mean, sd, var)
lobstr::obj_size(funs)
```
### Exercise 2.4.3
Predict the output of the following code:
```{r, eval = FALSE}
a <- runif(1e6)
obj_size(a)
b <- list(a, a)
obj_size(b)
obj_size(a, b)
b[[1]][[1]] <- 10
obj_size(b)
obj_size(a, b)
b[[2]][[1]] <- 10
obj_size(b)
obj_size(a, b)
```
Double vectors occupy 8 bytes per element. The vector `a` should occupy
8 * 1,000,000 = 8,000,000 bytes.
```{r}
a <- runif(1e6)
lobstr::obj_size(a)
```
Where do the extra 48 bytes come from? Note that an empty vector in R will
occupy 48 bytes of memory:
```{r}
lobstr::obj_size(vector("numeric"))
```
Since `b` references `a` twice, we expect it to occupy 8,000,048 bytes, plus the
overhead for a empty `list` of length 2, which is 64 bytes.
```{r}
b <- list(a, a)
lobstr::obj_size(b)
lobstr::obj_size(b) - lobstr::obj_size(vector("list", 2))
```
Since `b` and `a` reference the same object, the amount of memory required
remains the same.
```{r}
lobstr::obj_size(a, b)
```
When the first item of `b` is modified a copy is created. We expect `b` to now
occupy 8,000,048 * 2 bytes, plus the 64 bytes of overhead for the list. Since
the second item of `b` still references `a`, the overall memory useage
remains the same.
```{r}
b[[1]][[1]] <- 10
lobstr::obj_size(b)
lobstr::obj_size(a, b)
```
When the second item of `b` is modified, another copy is created. We expect `b`
to still occupy 16,000,160 bytes. Since neither item of `b` references `a`
anymore, the overall memory usage will be 16,000,160 + 8,000,048 bytes.
```{r}
b[[2]][[1]] <- 10
lobstr::obj_size(b)
lobstr::obj_size(a, b)
```