---
title: "Data wrangling exercises with data.table"
date: "2022-11-16"
output:
  html_document:
    toc: TRUE
    df_print: paged
    number_sections: FALSE
    highlight: tango
    theme: lumen
    toc_depth: 3
    toc_float: true
    css: custom.css
    self_contained: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
#### What are we going to work on?
These exercises give you an opportunity to apply everything you have learned about data.table. They cover the major tasks you typically encounter when wrangling data. Some of the exercises are purposefully designed to be more advanced and challenging.

But don't worry about that last part: by now all of you qualify as R experts!

If you have difficulty solving the exercises, you will find an answer sheet in the repo... but try to solve them on your own first. There is often more than one solution, so be creative and connect data.table to your other R expertise (after all, a data.table is also a data.frame, so it works with most other R tools).

Enjoy!
***
#### 1a. Load the data.table and nycflights13 packages. Because nycflights13 contains several datasets, we assign only the ones we will use later to objects. Your first task is to convert each of these datasets to a data.table:
```{r}
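# Load both packages via pacman (installing them first if they are missing)
pacman::p_load(data.table, nycflights13)
# Keep only the datasets used in the exercises below
airlines_data <- airlines
flights_data <- flights
planes_data <- planes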
```
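As a hint, data.table offers two ways to convert a data.frame: `as.data.table()` returns a converted copy, while `setDT()` converts the object in place. The sketch below uses a made-up table (`toy_df` and `toy_df2` are hypothetical names, not part of the exercise data), so it shows the mechanics without solving the task.

```r
library(data.table)

# Hypothetical toy data.frame, used only for illustration
toy_df <- data.frame(id = 1:3, value = c(10, 20, 30))

# Option 1: as.data.table() returns a new data.table; toy_df stays a data.frame
toy_dt <- as.data.table(toy_df)

# Option 2: setDT() converts the object in place, without making a copy
toy_df2 <- data.frame(id = 1:3, value = c(10, 20, 30))
setDT(toy_df2)

is.data.table(toy_dt)   # TRUE
is.data.table(toy_df2)  # TRUE
is.data.table(toy_df)   # FALSE
```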
#### 1b. Select rows 40 to 95 from flights_data:
```{r}
##
```
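As a hint, row selection by position goes in the first argument of `[`. A minimal sketch on a hypothetical toy table (the row numbers are chosen arbitrarily):

```r
library(data.table)

# Hypothetical toy table with ten rows
toy_dt <- data.table(id = 1:10, value = letters[1:10])

# A bare index in the first argument selects rows; no trailing comma is needed
rows_3_to_5 <- toy_dt[3:5]

rows_3_to_5$id  # 3 4 5
```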
#### 2a. Select all rows from flights_data with an air time below 100 minutes, then sort them so that the flights with the longest air time come first (decreasing order):
```{r}
##
```
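As a hint, a filter and a sort can be chained with two sets of brackets. The sketch below assumes a hypothetical toy table with a made-up `minutes` column:

```r
library(data.table)

# Hypothetical toy table
toy_dt <- data.table(id = 1:6, minutes = c(50, 120, 80, 30, 200, 95))

# First bracket filters rows; second bracket sorts them.
# The minus sign inside order() gives decreasing order.
short_sorted <- toy_dt[minutes < 100][order(-minutes)]

short_sorted$minutes  # 95 80 50 30
```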
#### 2b. What is the most frequent arrival time? (You might want to exclude the NAs from now on :))
```{r}
##
```
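As a hint, `.N` counts rows per group, and sorting the counts puts the most frequent value on top. In the hypothetical toy table below (the column name `hhmm` is made up), NA is deliberately the most common entry, which is why it is filtered out first:

```r
library(data.table)

# Hypothetical toy table; NA occurs four times, 900 three times
toy_dt <- data.table(hhmm = c(900, 1230, 900, NA, 1230, 900, NA, NA, NA))

# Drop NAs, count rows per value with .N, then sort so the largest count comes first
counts <- toy_dt[!is.na(hhmm), .N, by = hhmm][order(-N)]

counts[1]  # hhmm = 900, N = 3
```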
#### 3a. Add a new column containing the average air time of the flights from each origin:
```{r}
##
```
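As a hint, `:=` adds a column by reference and `by` makes the calculation run within each group. A minimal sketch on a hypothetical toy table (column names are made up):

```r
library(data.table)

# Hypothetical toy table with one missing value
toy_dt <- data.table(group = c("a", "a", "b", "b"),
                     minutes = c(10, 30, 100, NA))

# := adds the column in place; by = group computes one mean per group.
# na.rm = TRUE keeps a single NA from turning the whole group mean into NA.
toy_dt[, avg_minutes := mean(minutes, na.rm = TRUE), by = group]

toy_dt$avg_minutes  # 20 20 100 100
```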
#### 3b. Using the %between% operator on planes_data, check which manufacturers produce planes with between 50 and 100 seats, then calculate the average number of seats across these planes:
```{r}
##
```
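As a hint, data.table's range operator is `%between%`, which takes a vector of two bounds and includes both ends by default. A minimal sketch on a hypothetical toy table (names and numbers are made up):

```r
library(data.table)

# Hypothetical toy table
toy_dt <- data.table(maker = c("A", "B", "C", "D"),
                     seats = c(20L, 50L, 75L, 150L))

# x %between% c(lower, upper) keeps values with lower <= x <= upper
mid_sized <- toy_dt[seats %between% c(50, 100)]

mid_sized$maker           # "B" "C"
mid_sized[, mean(seats)]  # 62.5
```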
#### 4a. Find the number of flights that arrived on time:
```{r}
##
```
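As a hint, putting a condition in the first argument and `.N` in the second returns the number of matching rows. The condition in this sketch is arbitrary and the toy table is hypothetical; deciding what counts as "on time" in flights_data is part of the exercise.

```r
library(data.table)

# Hypothetical toy table with one missing value
toy_dt <- data.table(id = 1:5, delay = c(0, 12, -3, NA, 0))

# .N returns the number of rows that pass the filter;
# rows where the condition evaluates to NA are not counted
n_zero <- toy_dt[delay == 0, .N]

n_zero  # 2
```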
#### 4b. Find the most frequent departure time for each carrier in flights_data (careful with NAs!):
```{r}
##
```
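As a hint, one possible approach (among several) is to count rows per group and value, sort by count, and keep the first row of each group. The sketch below assumes a hypothetical toy table and uses `unique()` with `by` to keep the first row per group:

```r
library(data.table)

# Hypothetical toy table with one missing value
toy_dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b", "b"),
  hhmm  = c(600, 600, 700, 800, NA, 900, 900)
)

# Count rows for every (group, value) pair, ignoring NAs
counts <- toy_dt[!is.na(hhmm), .N, by = .(group, hhmm)]

# Sort by count (largest first), then keep the first row seen for each group
top_per_group <- unique(counts[order(-N)], by = "group")

top_per_group  # a: 600 (N = 2), b: 900 (N = 2)
```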
#### 4c. Using .SD, find which carrier had the longest air time in flights_data:
```{r}
##
```
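As a hint, `.SD` stands for the subset of data belonging to each group, and `.SDcols` limits it to the columns you name. A minimal sketch on a hypothetical toy table (column names are made up), which is only one of several ways to use `.SD`:

```r
library(data.table)

# Hypothetical toy table
toy_dt <- data.table(group   = c("a", "a", "b", "b"),
                     minutes = c(10, 30, 100, 60),
                     km      = c(5, 9, 40, 22))

# Apply max() to every column in .SD, separately for each group
max_per_group <- toy_dt[, lapply(.SD, max), by = group,
                        .SDcols = c("minutes", "km")]

# Pick the group whose maximum is largest
top_group <- max_per_group[which.max(minutes), group]

top_group  # "b"
```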