-
Notifications
You must be signed in to change notification settings - Fork 0
/
janitor_R_package.Rmd
97 lines (74 loc) · 2.4 KB
/
janitor_R_package.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
---
title: "The {janitor} R package"
subtitle: "Simple Tools for Examining and Cleaning Dirty Data"
author: "R Workshop"
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
require(prettydoc)
require(tidyverse)
require(janitor)
```
### Janitor
Janitor are all package contains many functions that extend and enhance the functionality of ***tidy verse***. We will have a look at some of the commands in this section
### Installation
<pre><code>
install.packages("janitor")
library(janitor)
</code></pre>
#### Examples
```{r}
mtcars %>%
tabyl(am, cyl) %>%
adorn_percentages("col")
# calculates correctly even with totals column and/or row:
```
```{r}
mtcars %>%
tabyl(am, cyl) %>%
adorn_totals("row") %>%
adorn_percentages()
```
## Clean data.frame names with ``clean_names()``
Call this function every time you read data.
It works in a %>% pipeline, and handles problematic variable names, especially those that are so well-preserved
by ``readxl::read_excel()`` and ``readr::read_csv()``.
* Parses letter cases and separators to a consistent format.
* .... Default is to snake_case, but other cases like camelCase are available
* Handles special characters and spaces, including transliterating characters like œ to oe.
* Appends numbers to duplicated names
* Converts "%" to "percent" and "#" to "number" to retain meaning
* Spacing (or lack thereof) around numbers is preserved
```{r}
# Create a data.frame with dirty names
test_df <- as.data.frame(matrix(ncol = 6))
names(test_df) <- c("firstName", "ábc@!*", "% successful (2009)",
"REPEAT VALUE", "REPEAT VALUE", "")
```
Clean the variable names, returning a data.frame:
```{r}
test_df %>%
clean_names()
```
Compare to what base R produces:
```{r}
make.names(names(test_df))
```
------------------------------------------------
## ``remove_empty`` Remove empty rows and/or columns from a data.frame.
#### Description
Removes all rows and/or columns from a data.frame that are composed entirely of NA values.
#### Usage
<pre><code>
remove_empty(dat, which = c("rows", "cols"))
</code></pre>
```{r echo = FALSE}
# janitor
# https://search.r-project.org/CRAN/refmans/RcmdrMisc/html/colPercents.html
# https://cran.r-project.org/web/packages/janitor/vignettes/tabyls.html
# https://cran.r-project.org/web/packages/splitstackshape/splitstackshape.pdf
```