---
title: "1.1 Intro to R"
subtitle: |
  | Workshop: "Handling Uncertainty in your Data"
  |
  | Dr. Mario Reutter & Juli Nagel
  | (slides adapted from Dr. Lea Hildebrandt)
format:
  revealjs:
    smaller: true
    scrollable: true
    slide-number: true
    theme: serif
    chalkboard: true
    width: 1280
    height: 720
from: markdown+emoji
---
# R!
```{css}
code.sourceCode {
  font-size: 1.4em;
}
div.cell-output-stdout {
  font-size: 1.4em;
}
```
::: notes
start recording
:::
## General: Working with R in this course
- You should have RStudio open and your precision workshop project loaded (we will set up the project today).
- Have the slides open in the background - handy to copy `R` code (top right button if you hover over a code chunk) or click on links.
- If possible, use two screens, with the slides (Zoom) open on one and RStudio on the other.
. . .
\
```{r}
#| echo: true
print("Hello World")
```
\
**Note:** You can navigate through the slides quickly by clicking on the three dashes in the bottom left.
## Why write code?
::: {.incremental .smaller}
- Doing statistical calculations by hand? Tedious & error-prone! A computer is faster...
- Using spreadsheets? Limited options, and it is easy to change data accidentally...
- Using point-and-click software (e.g., SPSS)?
- proprietary software = expensive
- R = open, extensible (community)
- reproducible!
- Science/Academia is a marathon and not a sprint\
=\> it is worthwhile investing in skills with a slow learning curve that will pay off in the long run
:::
::: notes
Chat: What are advantages (or disadvantages!) of coding?
:::
## Why write code?
![](images/memes/automate.jpg){fig-align="center"}
## Managing Expectations
- You will learn a new (programming) language. Don't expect to "speak" it fluently right away.
- During the workshop, it is more important that you can roughly comprehend written code and "translate" it into natural language.
- The second step is to be able to make small adjustments to code that is given to you.
- Only then, the last step is to be able to produce code yourself (with the help of Google, Stack Overflow, templates of this course, etc. :) ).
- But: Use it or lose it! Don't wait to use `R` in your research projects until you're "good enough". It's more fun to use it on "actual" problems, and that makes it much easier to learn.
## Install R & RStudio
You should all have <a href="https://www.r-project.org/">installed R</a> & <a href="https://posit.co/download/rstudio-desktop/">RStudio</a> by now! Who had problems doing so?
## Overview RStudio
![RStudio Interface](images/rstudio.png){fig-alt="Screenshot of the RStudio Interface with different panes visible"}
::: notes
open R!
:::
## RStudio Panes
::: columns
::: {.column .smaller width="40%"}
1. Script pane: view, edit, & save your code
2. Console: here the commands are run and rudimentary output may be provided
3. Environment: which variables/data are available
4. Files, plots, help etc.
:::
::: {.column width="10%"}
:::
::: {.column width="50%"}
![RStudio Interface](images/rstudio2.png)
:::
:::
::: notes
Console vs. Script (Rmarkdown later)
:::
## Using the Console as a Calculator
```{r}
#| echo: true
100 + 1
2*3
sqrt(9)
```
![Console used as calculator](images/console.png){fig-alt="Calculated 100+1, 2*3, square root of 9 directly in the console" width=60%}
::: notes
try it out!
We can't really do much with these values, they will just be written in the console.
Also: Notice that spaces between the parts of a command are optional, i.e., `100 + 1` vs. `100+1`.
:::
## Saving the Results as a Variable/Object
```{r}
#| echo: true
a <- 100 + 1
multi <- 2*3
SqrtOfNine <- sqrt(9)
word <- "Hello"
```
<br>
::: {.incremental}
- `<-` is used to assign values to variables (`=` is also possible, but discouraged in `R`)
- `a`, `multi` etc. are the variable names <span style="font-size: 16px;">(some naming rules, e.g., no whitespace, must not start with a number, many special characters not allowed)</span>
- You can find those now in your Environment! (top right panel)
- No feedback in the console for saving variables (`2*3` outputs `6`, but `multi <- 2*3` doesn't)
- Variables can contain basically anything (words, numbers, entire tables of data ...)
- Variables contain the calculated value (i.e., 101) and not the calculation/formula (100+1)
:::
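The last point can be checked directly. In this small sketch (the names `base_value` and `doubled` are made up for illustration), changing `base_value` afterwards does not change `doubled`, because only the computed value was stored:

```{r}
#| echo: true
base_value <- 5
doubled <- base_value * 2   # stores the value 10, not the formula
base_value <- 100           # overwrite base_value ...
doubled                     # ... doubled is still 10
```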
::: notes
Type first command in console, what happens?
Why don't we see anything in the console?\
What happens if we type in `a` in the console?
Is there anything else that you find interesting?
What is sqrt()?
:::
## Working with variables
```{r}
#| echo: true
a + multi
a
multi
```
<br><br>
::: {.incremental}
- You can use those variables for further calculations, e.g., `a + multi`
- Note that neither `a` nor `multi` change their value.
:::
## Working with variables
```{r}
#| echo: true
a
a <- 42
a
```
<br><br>
- Variables can be overwritten (`R` won't warn you about this!)
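A common pattern that relies on overwriting is updating a variable based on its own current value. A minimal sketch, using the made-up name `counter`:

```{r}
#| echo: true
counter <- 1
counter <- counter + 1  # the right-hand side is evaluated first, then stored
counter
```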
## Functions
The code `sqrt(9)` may have looked unfamiliar. `sqrt()` is an R function that calculates the square root of a number. `9` is the *argument* that we hand over to the function.
If you want to know what a function does, which arguments it takes, or which output it generates, you can type into the console: `?functionname`
<br><br>
```{r}
#| echo: true
#| eval: false
?sqrt
```
This will open the help file in the Help Pane on the lower right of RStudio.
You can also click on a function in the script or console pane and press the *F1* key.
<br><br>
Sometimes, the help page can be a bit overwhelming (lots of technical details etc.). It might help you to scroll down to the examples at the bottom to see the function in action!
::: notes
Do this now! Anything unclear?
:::
## Functions
Functions often take more than one argument (which have names):
```{r}
#| echo: true
#| eval: false
rnorm(n = 6, mean = 3, sd = 1)
rnorm(6, 3, 1) # same call as above (the random numbers drawn will differ)
```
<br><br>
You can explicitly name your arguments (check the help file for the argument names!) or just state the values (but these have to be in the correct order then! See help file).
. . .
\
<br>
```{r}
#| echo: true
#| eval: false
rnorm(n = 6, mean = 3, sd = 1)
rnorm(6, 3, 1) # same call as above (the random numbers drawn will differ)
rnorm(sd = 1, n = 6, mean = 3) # still the same call - named arguments can be in any order
rnorm(1, 6, 3) # different call - R thinks n = 1, mean = 6, and sd = 3!
```
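Because `rnorm()` draws random numbers, two calls print different values even when their arguments are the same. To check that named and positional arguments really describe the same call, one option (not part of the original slides) is to fix the random seed with `set.seed()` before each call. The seed value `123` and the variable names are arbitrary choices:

```{r}
#| echo: true
set.seed(123)
named <- rnorm(n = 6, mean = 3, sd = 1)
set.seed(123)
positional <- rnorm(6, 3, 1)
set.seed(123)
reordered <- rnorm(sd = 1, n = 6, mean = 3)

identical(named, positional)  # TRUE
identical(named, reordered)   # TRUE
```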
## Comments
```{r}
#| echo: true
#| eval: false
rnorm(n = 6, mean = 3, sd = 1)
rnorm(6, 3, 1) # same call as above (the random numbers drawn will differ)
# By the way, # denotes a comment - very important for documentation!
# Anything after # will be ignored by R
# To (un)comment the line you are in/multiple lines you selected: Ctrl + Shift + C
```
## Packages
There are a number of functions already included with *Base R* (i.e., `R` after a new installation), but you can greatly extend the power of `R` by loading packages (and we will!). Packages can contain, e.g., collections of functions someone else wrote, or even data.
You should already have the `tidyverse` installed (if not, quickly run `install.packages("tidyverse")` :-) )
. . .
\
But installing a package is not enough to use its functions directly. Usually, you also want to load the package with the `library()` function. This is typically the first thing you do at the top of an `R` script:
```{r}
#| echo: true
#| eval: false
library("tidyverse") # or library(tidyverse)
```
. . .
\
(If you don't load a package, you have to call functions explicitly by `packagename::function`)
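As a small illustration, the `stats` package ships with every `R` installation, so its functions can be called with the `::` notation without any `library()` call (the numbers are just example values):

```{r}
#| echo: true
stats::median(c(1, 7, 12, 4, 2))  # explicit: median() from the stats package
median(c(1, 7, 12, 4, 2))         # same function; stats is loaded by default
```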
::: notes
Open Source! Anyone can write a package!
Base R = mobile phone, comes with some functions, packages = apps
possibly necessary to install Rtools!
:::
# Scripts & Projects
- If you type your code into the console (bottom left), it is not saved. Therefore, it is better practice to write scripts (top left) and save them as files.
- Scripts are basically text files that contain your code and can be run as needed.
- It makes sense to save all your scripts etc. in a folder specifically dedicated to this course.
- We will now create an `R` *project* together, which will help you to work with files that belong together.
## New Project
![](images/rstudioProject.png){width=200% fig-align="center"}
::: {.incremental}
- Create a new project by clicking on "*File*" on the top left and then "*New Project...*"
- Select "*New Directory*" <span style="font-size: 16px">(if you already have a folder for this course, you can choose "*Existing directory*" and select that folder)</span> and then choose "*New Project*" at the top of the list.
- Choose a project name, e.g., "*r_workshop*" (this will create a folder in which the project lives)
- Browse where you want to put your project folder (in my case, "*C:/r_stuff/*")
:::
. . .
<span style="font-size: 16px">PS: `R` can deal with folder and file names that contain spaces, but since some programs can't, it's best practice not to use whitespace in file/folder names.</span>
## Existing Projects
You will find the current project in the top right corner of RStudio.
If you click on the current project, you can open other projects by choosing "*Open Project*" and selecting the `.Rproj` file of the project.
You can also just double click on `.Rproj` files and RStudio will open with the project loaded.
![Existing projects](images/rstudioProject2.png)
## Why Projects
- Projects are not only convenient for us (e.g., scripts that we had opened before are re-opened when we open the project), they are also great for reproducibility.
- We won't cover the details here - see the "Further Reading" section of the course page!
## Using Scripts
To open a new script, click **File** $\to$ **New File** $\to$ **R Script** (`Ctrl + Shift + N`).
To run a line of the script, you can either click *Run* at the top right of the pane or press `Ctrl + Enter`. It will run the code that is highlighted/selected or automatically select the current line (or the complete multi-line command). \
To run the whole script/chunk, press `Ctrl + Shift + Enter` (with full console output) or `Ctrl + Shift + S` (limited output).
![Using scripts](images/rstudioScript.png)
```{=html}
<!-- ## Scripts 2
**Assignment**: Open a new file. In this file, write down some of the code (one command per line) that we have used so far and save the file.
Now run the code (either by pressing "run" at the top right of the script or `ctrl + enter`). -->
```
# Vectors
::: {.incremental}
- So far, we've worked with single values; *vectors* contain several elements.
:::
. . .
```{r}
#| echo: true
c(1, 7, 12, 4, 2)
c(2, 6.1, 9.234, 1.23)
c("hello", "cake", "biscuit")
```
::: {.incremental}
- All elements of a vector have the same data type (it's a bit tricky, but can you see why `c(10, "biscuit", 2.31)` does not work the way you might expect?).
- Vectors are created with the `c()` function ("combine").
:::
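One way to see what happens when types are mixed is to ask `R` for the type of the resulting vector with `class()` (a base `R` function that is not otherwise covered in these slides). In this sketch, the numbers are silently converted to text so that all elements share one type (the name `mixed` is made up for illustration):

```{r}
#| echo: true
mixed <- c(10, "biscuit", 2.31)
mixed                     # note the quotation marks around the numbers
class(mixed)              # "character"
class(c(1, 7, 12, 4, 2))  # "numeric"
```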
## Working with vectors
- Of course, vectors can be stored in variables.
<br>
```{r}
#| echo: true
my_vector <- c(1, 2, 10)
shopping_list <- c("flour", "eggs", "apples")
```
## Vector operations
- But the real fun is that `R` is "vectorized", which allows us to do some funny tricks.
- Note that this is different from usual "vector math".
<br>
```{r}
#| echo: true
c(1, 2, 5) + 1
c(2, 4, 6) + c(1, 0, 2)
# Can you spot what happens HERE?!
c(2, 4, 6, 5, 0, 0) + c(1, 10)
```
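What happens in the last line is called *recycling*: the shorter vector is repeated until it matches the length of the longer one. The sketch below spells this out with `rep()` and also contrasts element-wise multiplication with the dot product from "vector math" (the variable names are made up for illustration):

```{r}
#| echo: true
long <- c(2, 4, 6, 5, 0, 0)
short <- c(1, 10)

long + short                  # short is recycled: 1, 10, 1, 10, 1, 10
long + rep(short, times = 3)  # the same calculation written out

c(1, 2, 3) * c(4, 5, 6)       # element-wise: 4 10 18
sum(c(1, 2, 3) * c(4, 5, 6))  # dot product: 32
```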
# Working with real data
## Get the data
To read in data files, you need to know which format these files have, e.g., .txt or .csv files or some other (proprietary) format. There are packages that enable you to read in data of different formats like Excel (.xlsx).
We will use the files from [Fundamentals of Quantitative Analysis](https://psyteachr.github.io/quant-fun-v2/starting-with-data.html): `ahi-cesd.csv` and `participant-info.csv`. Save these directly in your project folder on your computer (do not open them!).
. . .
\
Did you find the files? Here are the direct links:
1. <https://psyteachr.github.io/quant-fun-v2/ahi-cesd.csv>
2. <https://psyteachr.github.io/quant-fun-v2/participant-info.csv>
## Read in the data
Create a new script with the following content:
```{r}
#| echo: true
#| eval: false
library(tidyverse) # we will use a function from the tidyverse to read in the data
dat <- read_csv("ahi-cesd.csv")
pinfo <- read_csv("participant-info.csv")
```
Run the code!
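If `read_csv()` reports that a file does not exist, the files are probably not in the folder that `R` is currently working in. A quick check with base `R` functions, assuming the two files were saved under exactly the names used above (the name `files_found` is made up for illustration):

```{r}
#| echo: true
#| eval: false
getwd()  # the folder R is currently working in (ideally your project folder)
files_found <- file.exists(c("ahi-cesd.csv", "participant-info.csv"))
files_found  # TRUE for each file that is found in that folder
```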
## Looking at the Data
::: incremental
There are several options to get a glimpse at the data:
- Click on `dat` and `pinfo` in your Environment.
- Type `View(dat)` into the console or into the script pane and run it.
- Run `str(dat)` or `str(pinfo)` to get an overview of the data.
- Run `summary(dat)`.
- Run `head(dat)`, `print(dat)`, or even just `dat`.
- What is the difference between these commands?
:::
## Looking at the Data 2
What is the difference between the objects/variables that you assigned/saved in your Environment earlier and these objects?
![RStudio's Environment panel](images/rstudioEnvironment.png)
. . .
The two objects we just read in are data frames, which are "tables" of data (they can contain entire data sets). The objects we assigned earlier were simpler (single values, or "one-dimensional" vectors).
Data frames usually have several rows and columns. The columns are the *variables* and the rows are the *observations* (more about that later).
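To make the row/column structure concrete, here is a tiny data frame built by hand with base `R`'s `data.frame()`. The name `toy`, the column names, and the values are invented for illustration and are not taken from the course data:

```{r}
#| echo: true
toy <- data.frame(
  id = c(1, 2, 3),                # one column = one variable
  mood = c("good", "ok", "good")  # each row = one observation
)
toy
nrow(toy)   # number of observations (rows): 3
ncol(toy)   # number of variables (columns): 2
names(toy)  # the variable names
```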
# Questions?
This was the first chapter of this workshop! Do you have any questions?
\
Next:
- R: Data Wrangling