-
Notifications
You must be signed in to change notification settings - Fork 1
/
1-post.Rmd
85 lines (58 loc) · 6.4 KB
/
1-post.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
---
title: "Learning R With Education Datasets"
author: "Ryan Estrellado"
date: "5/21/2020"
output: github_document
---
Timothy Gallwey wrote in *The Inner Game of Tennis*:
> ...There is a natural learning process which operates within everyone, if it is allowed to. This process is waiting to be discovered by all those who do not know of its existence ... It can be discovered for yourself, if it hasn't been already. If it has been experienced, trust it.
Discovering a new R concept like a function or package is exciting. You never know if you're about to learn something that fundamentally changes the way you code or solve data science problems. But I get even more excited when I see somebody *use* new R concepts. For example, I learned about random forest models when I read about them in [An Introduction to Statistical Learning (ISL)](https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370). Then I imagined myself using them when I watched [Julia Silge fit a random forest model](https://youtu.be/LPptRkGoYMg) to predict attendance at NFL games. I need the reading to give me language for what I see data scientists do. Then I need to see what data scientists do for me to imagine myself doing what I’ve read.
Still, for most people using R in their jobs, there’s another step. They have to imagine how to apply what they’ve read and seen to the problems they're solving at work. But what if we used education datasets to help them imagine using R on the job, just as the authors of ISL use words and code to teach about models and Julia Silge uses video to inspire coding?
We learned from writing [*Data Science in Education Using R (DSIEUR)*](https://datascienceineducation.com) that we can combine words, code, and professional context. Professional context includes scenarios, language, and data that readers will recognize in their education jobs. We wanted readers to feel motivated and engaged by seeing words and data that reminds them of their everyday work tasks. This connection to their professional lives is a hook for readers as they engage R syntax which is, if you’ve never used it, literally a foreign language.
Let’s use `pivot_longer()` as an example. We'll describe this process in three steps: discovering the concept, seeing how the concept is used, and seeing how the concept is used *in education*.
**Step 1: See the concept**
When I read something like “Use `pivot_longer()` to transform a dataset from wide to long”, I can imagine the shape of a dataset changing. But it's harder to imagine what happens with the variables and their contents as the dataset's shape changes. I’ve been using R for over five years and I still struggle to visualize the contents of many columns rearranging themselves into one.
**Step 2: See how the concept is used**
The concept gets much clearer when you add an example---even one with little context---to the explanation. Here’s one from the `pivot_longer()` vignette, which you can view with `vignette("pivot")`:
```{r set options, echo=FALSE}
library(knitr)
opts_chunk$set(comment = "#>")
```
```{r packages, message=FALSE}
library(tidyverse)
```
```{r pivot vignette}
# Simplest case where column names are character data
relig_income
relig_income %>%
pivot_longer(-religion, names_to = "income", values_to = "count")
```
Sharing an idea by pairing an abstract programming concept with a reproducible example is a common practice for experienced R programmers. [Community guidelines for Stack Overflow posts](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and the [{reprex}](https://www.tidyverse.org/help/) package are two artifacts of a popular R community norm: help folks understand an idea by using words *and* code.
**Step 3: See how the concept is used in education**
Combining the explanation with a reproducible example makes `pivot_longer()` more concrete by showing how it works. What happens when we connect the explanation and reproducible example to the everyday work of a data scientist in education?
In [chapter seven](https://datascienceineducation.com/c07.html) of *DSIEUR*, we use `pivot_longer()` to transform a dataset of coursework survey responses from wide to long. Before using `pivot_longer()`, the dataset had a column for each survey question. When we use `pivot_longer()`, the name of each survey question moves to a new column called "question". Another new column is added, "response", which contains the corresponding response to each survey question.
To run this code, you'll need the *DSIEUR* companion R package, [{dataedu}](https://github.com/data-edu/dataedu):
```{r dataedu}
# Install the {dataedu} package if you don't have it
# devtools::install_github("data-edu/dataedu")
library(dataedu)
```
Here's the survey data in its original, wide format:
```{r DSIEUR wide}
# Wide format
pre_survey
```
The third through eighth columns are named after each survey question---"Q1MaincellgroupRow1", "Q1MaincellgroupRow2", "Q1MaincellgroupRow3", etc. These are the column names we'll be moving to a single column called "question" when the dataset transforms from wide to long.
Here's the new dataset, where a column called "question" contains the question names and a column called "response" contains the corresponding responses:
```{r DSIEUR long}
# Pivot the dataset from wide to long format
pre_survey %>%
pivot_longer(cols = Q1MaincellgroupRow1:Q1MaincellgroupRow10,
names_to = "question",
values_to = "response")
```
When you put it all together, the learning thought process is something like this:
- There’s a function called `pivot_longer()`, which turns a wide dataset into a long dataset
- `pivot_longer()` does this by putting multiple column names into its own column, then creating a new column that pairs each column name with a value
- I can use `pivot_longer()` to change an education survey dataset that has question names for columns into one that has a "question" column and a "response" column
We’ll be back with the next post in about two weeks. Until then, do share with us about the people and tools that inspire you to work on collaborative projects. You can reach us on Twitter: Emily [@ebovee09](https://twitter.com/ebovee09), Jesse [@kierisi](https://twitter.com/kierisi), Joshua [@jrosenberg6432](https://twitter.com/jrosenberg6432), Isabella [@ivelasq3](https://twitter.com/ivelasq3), and me [@RyanEs](https://twitter.com/RyanEs).