-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcm009.Rmd
83 lines (55 loc) · 2.06 KB
/
cm009.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
---
title: "cm009 Notes and Exercises: Table Joins"
date: '2017-10-03'
output: github_document
---
```{r}
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(gapminder))
```
After going through the `dplyr` [vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html) on "two-table verbs", we'll work on the following exercises.
Consider the following areas of countries, in hectares:
```{r}
(areas <- data.frame(country=c("Canada", "United States", "India", "Vatican City"),
area=c(998.5*10^6, 983.4*10^6, 328.7*10^6, 44)) %>%
as.tbl)
```
1. To the `gapminder` dataset, add an `area` variable using the `areas` tibble. Be sure to preserve all the rows of the original `gapminder` dataset.
```{r}
left_join(gapminder, areas)
```
2. To the `gapminder` dataset, add an `area` variable using the `areas` tibble, but only keeping obervations for which areas are available.
```{r}
inner_join(gapminder, areas)
```
3. Use a `_join` function to output the rows in `areas` corresponding to countries that _are not_ found in the `gapminder` dataset.
```{r}
anti_join(areas, gapminder)
```
4. Use a `_join` function to output the rows in `areas` corresponding to countries that _are_ found in the `gapminder` dataset.
```{r}
semi_join(areas, gapminder, by="country")
```
5. Construct a tibble that joins `gapminder` and `areas`, so that all rows found in each original tibble are also found in the final tibble.
```{r}
full_join(areas, gapminder)
full_join(gapminder, areas)
```
Here are the rows containing the Vatican City:
```{r}
full_join(areas, gapminder) %>%
filter(country=="Vatican City")
```
6. Subset the `gapminder` dataset to have countries that are only found in the `areas` data frame.
```{r}
semi_join(gapminder, areas)
```
7. Subset the `gapminder` dataset to have countries that are _not_ found in the `areas` data frame.
```{r}
anti_join(gapminder, areas)
```
Let's check.... Canada should not be in there:
```{r}
anti_join(gapminder, areas) %>%
filter(country=="Canada")
```