# Base R
```{r, setup, include=FALSE}
exercise_echo <- TRUE
```
## Brief History of R
+ Created by Ross Ihaka and Robert Gentleman of the University of Auckland in 1993. It was derived from the commercial **S** programming language (no kidding), which was created in 1976.
+ Version 1.0.0 was released in 2000. The current version (as of September 2021) is 4.1.1.
+ In 2017, CRAN (the official package repository) had more than 10,000 packages. Today it hosts 18,214 packages.
+ Ranked as the 9th most popular language in the TIOBE index as of September 2021.
### Sources
+ <https://blog.revolutionanalytics.com/2020/07/the-history-of-r-updated-for-2020.html>
+ <https://en.wikipedia.org/wiki/R_(programming_language)>
+ <https://bookdown.org/rdpeng/rprogdatascience/history-and-overview-of-r.html>
+ <https://cran.r-project.org/web/packages/>
+ <https://www.tiobe.com/tiobe-index/>
## What is there to like about R?
*(Personal opinions)*
+ Along with Python, one of the two most powerful scripting languages for data analysis. ([Julia](https://julialang.org/), first released in 2012, is an emerging third.)
+ Syntax and style geared more toward non-computer scientists (especially the tidyverse).
+ Excellently curated and managed package repository (CRAN).
+ A powerhouse focused on data analytics. Many packages include implementations of novel research papers which cannot be found elsewhere.
+ Supported by a powerful IDE (RStudio).
+ Low learning curve for data analysis, visualization, publishing and interactive analysis.
**Note:** Python and R are not competitors. In many cases they complement each other. It is highly recommended to learn both.
## What are the disadvantages of R?
+ Not quite as popular as Python in the CS community. Support lags behind Python in some areas (especially cloud computing).
+ Despite a very convenient web framework (shiny), not well suited for scalable web applications without heavy modifications. (Still a great start.)
+ R code runs on a single core by default; parallel computing requires extra packages and setup. So, speed can be an issue.
+ R keeps data in memory, so the size of the data you can work with is limited by available RAM.
Each disadvantage can be alleviated using a package or a solution. Its benefits far outweigh its disadvantages.
## Basic Features
+ R is a vector-based language. When you call a function or perform an operation, it is usually applied to every element of the vector. (It is a powerful feature which requires some time to learn.)
+ Main data types are `numeric`, `character` and `logical`. But `factor`, `integer`, date (`Date`), date-time (`POSIXct`, abbreviated `dttm` in tibbles) and some other types are also very common.
+ Main object types are `vector`, `matrix`, `data.frame` and `list`.
+ Assignment operators are "`<-`" and "`=`". Aside from rare exceptions, they behave the same (`x <- 5` is the same as `x = 5`). Please be consistent in their use.
```{r}
x <- 5
x
```
+ The R console is completely interactive. You can run anything line by line.
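To make the vector-based behaviour above concrete, here is a minimal sketch. The variable names and values (`heights`, `heights_in_m`) are made up for illustration and are not part of the original material.

```{r}
## Hypothetical example values; any numeric vector behaves the same way
heights <- c(150, 160, 170)
heights_in_m <- heights / 100   ## the division is applied to every element
heights_in_m
sqrt(c(4, 9, 16))               ## many functions are vectorized too
heights > 155                   ## comparisons return a logical vector
```

No explicit loop is needed; the operation is carried out element by element.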
## Data Types
+ Numeric (`double`): 1.33, 5422.22...
+ There is also `integer`: 3, 5, 6...
+ Character (`character`): "a", "course", "pizza"...
+ Boolean (`logical`): Either `TRUE` or `FALSE`.
+ Date (`Date`) and date-time (`POSIXct`, shown as `dttm` in tibbles): "`2020-07-28`", "`2020-07-29 14:00:05.12 UTC+3`"
+ This part is a bit complicated with POSIXct and POSIXt types.
+ Factor (`factor`): Numeric levels with labels of any kind.
+ Encountered rarely in this course.
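One way to check the types listed above is with `class()` and `typeof()`. This is a small sketch; the object names (`example_date`, `example_time`, `example_factor`) are assumptions chosen for illustration.

```{r}
class(1.33)                  ## "numeric" (stored as a double)
typeof(1.33)                 ## "double"
class(3L)                    ## "integer"; the L suffix creates an integer
class("pizza")               ## "character"
class(TRUE)                  ## "logical"
example_date <- as.Date("2020-07-28")
class(example_date)          ## "Date"
example_time <- as.POSIXct("2020-07-29 14:00:05", tz = "UTC")
class(example_time)          ## "POSIXct" "POSIXt"
example_factor <- factor(c("low", "high", "low"))
levels(example_factor)       ## labels, sorted alphabetically by default
as.integer(example_factor)   ## the underlying integer codes
```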
## Object Types
### Vector
The vector is the foundation stone of R object types. Even a variable with a single value is a vector of length one. Vectors whose elements all share one data type are called "atomic" vectors.
Vectors with multiple values can be defined using the `c()` ("combine") function.
```{r}
x <- c("a","b","c")
x
```
A vector can hold only a single data type. When types are mixed, R converts (coerces) all elements to the most flexible type present: logical, then numeric, then character.
```{r}
x <- c(1,"hi",FALSE) ## Vector of numeric, character and logical values
x ## converted to all character
```
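The conversion shown above follows a fixed order. The sketch below walks through two more combinations; the names `mixed_lgl_num`, `mixed_num_chr` and `converted` are hypothetical.

```{r}
mixed_lgl_num <- c(TRUE, 2.5)   ## logical + numeric -> numeric
mixed_lgl_num                   ## TRUE becomes 1
class(mixed_lgl_num)
mixed_num_chr <- c(1, "hi")     ## numeric + character -> character
class(mixed_num_chr)
## Explicit conversion in the other direction: values that cannot be
## converted become NA (R normally warns; the warning is silenced here)
converted <- suppressWarnings(as.numeric(c("1", "2", "hi")))
converted
```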
### Matrix
A matrix is simply a vector with two dimensions.
```{r}
mat1<-matrix(1:9, ncol=3, nrow=3)
mat1
```
We can get a value from a matrix by providing its location as row/column coordinates or simply by treating it as a vector (elements are counted column by column).
```{r}
mat1[2,2]
mat1[5]
```
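Both calls return the same value because R stores matrices column by column. A non-square matrix makes this easier to see; `mat_demo` below is a made-up example, not part of the original material.

```{r}
mat_demo <- matrix(1:6, nrow = 2, ncol = 3)  ## filled column by column
mat_demo
mat_demo[2, 1]   ## row 2, column 1
mat_demo[2]      ## second element in column order: the same cell
mat_demo[, 3]    ## an empty row index selects the whole third column
mat_demo[1, ]    ## an empty column index selects the whole first row
dim(mat_demo)    ## number of rows and columns
```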
### Data Frame
A data frame is also two dimensional, but each column can have a different data type.
```{r}
df1 <- data.frame(some_numbers = 1:3,
                  some_names = c("Blood","Sweat","Tears"),
                  some_logical = c(TRUE,FALSE,TRUE))
df1
```
Data frames are extremely powerful structures. Most of our work will be on data frames.
**Note:** In `dplyr` package we will see a special version of data frames: `tibble`.
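As a rough sketch of everyday data frame use, the chunk below pulls out columns and rows with base R indexing. `df_demo` and its column names are assumptions made for this example.

```{r}
## Hypothetical data frame used only for this illustration
df_demo <- data.frame(id = 1:3,
                      name = c("Blood", "Sweat", "Tears"),
                      flag = c(TRUE, FALSE, TRUE))
df_demo$name              ## one column as a vector
df_demo[2, ]              ## second row, all columns
df_demo[df_demo$flag, ]   ## only the rows where flag is TRUE
str(df_demo)              ## column types at a glance
```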
### List
Lists are like vectors but they can hold any object (including lists). You can also add names to lists.
```{r}
list1 <- list(data_frame = df1, matrix = mat1, vector = x)
list1
```
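Elements of a named list can be retrieved in a few ways. The following is a minimal sketch with a made-up list (`list_demo`).

```{r}
list_demo <- list(numbers = 1:3, label = "demo", nested = list(a = TRUE))
list_demo$numbers      ## access by name
list_demo[["label"]]   ## double brackets return the element itself
list_demo["label"]     ## single brackets return a list of length one
list_demo$nested$a     ## lists can be nested
names(list_demo)
```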
### Functions
Functions are very useful objects as they let you run reusable code with dynamic inputs. For example, let's write a function to calculate the area of a triangle.
```{r}
area_of_triangle <- function(height, base_length){
  area <- height*base_length/2
  return(area) ## Return value using return command
}
## You can assign the result of a function to a variable
x <- area_of_triangle(height = 3, base_length = 4)
x
```
+ A rule of thumb: "If you need to copy and paste the same code three times, write a function instead."
+ R has thousands of predefined functions to make life easier.
+ If you want to return multiple values, return a list.
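The last point can be sketched as follows. `rectangle_summary` is a hypothetical helper written for this illustration, not part of the original material.

```{r}
## Hypothetical helper: returns two results bundled in a named list
rectangle_summary <- function(width, height){
  area <- width * height
  perimeter <- 2 * (width + height)
  return(list(area = area, perimeter = perimeter))
}
result <- rectangle_summary(width = 3, height = 4)
result$area        ## pick out each value by name
result$perimeter
```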
## Exercises
Complete the base R document before attempting to solve these.
### Temperature Conversion
Write a function to convert Fahrenheit to Celsius and Celsius to Fahrenheit.
$$(X^\circ\text{C} \times 9/5) + 32 = Y^\circ\text{F}$$
```{r,echo=exercise_echo}
convert_temperature <- function(x, F_to_C = TRUE){
  if(F_to_C){
    return((x-32)*5/9)
  }else{
    return(x*9/5 + 32)
  }
}
```
```{r}
convert_temperature(30,F_to_C = FALSE)
convert_temperature(86,F_to_C = TRUE)
```
### Future Value
Write a function to calculate the future value of an investment given annually compounding interest over a number of years.
$$FV = X \times (1 + i)^T$$
```{r,echo=exercise_echo}
calculate_future_value <-
  function(investment, interest, duration_in_years){
    return(investment * ((1 + interest) ^ duration_in_years))
  }
```
```{r}
## 100 units of investment at a 7% interest rate over 5 years
calculate_future_value(
  investment = 100, interest = 0.07, duration_in_years = 5)
```
### Color Hex Code
Write a function to randomly generate n [color hex codes](https://www.color-hex.com/). You can use the predefined `letters` vector.
```{r,echo=exercise_echo}
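generate_hex_code <- function(n = 1){
  hex_vec <- c(0:9, letters[1:6]) ## The 16 hexadecimal digits
  colors <- character(n)          ## Preallocate the result vector
  for(i in seq_len(n)){           ## seq_len() is safe for n = 0, unlike 1:n
    colors[i] <- paste0("#",
                        paste0(sample(hex_vec, 6, replace = TRUE), collapse = ""))
  }
  return(colors)
}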
```
```{r}
generate_hex_code(n=3)
```
### Calculate Probability of Dice
Write a function which calculates the probability of getting **k** sixes in **n** throws of a die. Hint: Use binomial distribution.
```{r,echo=exercise_echo}
get_prob_dice <- function(k, n){
  ## choose(n, k) is the binomial coefficient; it avoids the overflow
  ## that factorial(n) runs into for large n
  combination <- choose(n, k)
  probability <- (1/6)^k * (5/6)^(n-k)
  return(combination*probability)
}
```
```{r}
get_prob_dice(3,5)
```
```{r,echo=exercise_echo,eval=exercise_echo}
dbinom(3,5,prob=1/6) ## or simply use dbinom
```
### Rock, Scissors, Paper
Write a rock, scissors, paper game in which the computer chooses randomly.
```{r,echo=exercise_echo}
rsp_game <- function(user, choices = c("rock","scissors","paper")){
  ## Validate the input first
  if(!(user %in% choices))
    return("Choose only rock, scissors or paper as input.")
  response <- sample(choices, 1) ## The computer picks at random
  if(user == response)
    return("I chose the same. Tie!")
  ## && and || are the scalar operators intended for if() conditions
  if((user == "rock" && response == "scissors") ||
     (user == "scissors" && response == "paper") ||
     (user == "paper" && response == "rock")){
    return(paste0("I chose ", response, ". You win!"))
  }else{
    return(paste0("I chose ", response, ". You lose!"))
  }
}
```
```{r}
rsp_game("rock")
```
Check the course webpage for more exercises!
## Other Links
+ [Base R Cheat Sheet](http://github.com/rstudio/cheatsheets/blob/main/base-r.pdf)