janitor_R_package.Rmd

---
title: "The {janitor} R package"
subtitle: "Simple Tools for Examining and Cleaning Dirty Data"
author: "R Workshop"
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
require(prettydoc)
require(tidyverse)
require(janitor)
```

### Janitor

Janitor are all package contains many functions that extend and enhance the functionality of ***tidy verse***. We will have a look at some of the commands in this section

### Installation

<pre><code>
install.packages("janitor")
library(janitor)
</code></pre>


#### Examples
```{r}
mtcars %>%
tabyl(am, cyl) %>%
adorn_percentages("col")
# calculates correctly even with totals column and/or row:
```
```{r}
mtcars %>%
tabyl(am, cyl) %>%
adorn_totals("row") %>%
adorn_percentages()
```


## Clean data.frame names with ``clean_names()``

Call this function every time you read data.

It works in a %>% pipeline, and handles problematic variable names, especially those that are so well-preserved 
by ``readxl::read_excel()`` and ``readr::read_csv()``.

*    Parses letter cases and separators to a consistent format.
*    .... Default is to snake_case, but other cases like camelCase are available
*    Handles special characters and spaces, including transliterating characters like œ to oe.
*    Appends numbers to duplicated names
*    Converts "%" to "percent" and "#" to "number" to retain meaning
*    Spacing (or lack thereof) around numbers is preserved

```{r}
# Create a data.frame with dirty names
test_df <- as.data.frame(matrix(ncol = 6))
names(test_df) <- c("firstName", "ábc@!*", "% successful (2009)",
                    "REPEAT VALUE", "REPEAT VALUE", "")
```
Clean the variable names, returning a data.frame:
```{r}
test_df %>%
  clean_names()
```
Compare to what base R produces:
```{r}
make.names(names(test_df))
```

------------------------------------------------


## ``remove_empty`` Remove empty rows and/or columns from a data.frame.

#### Description
Removes all rows and/or columns from a data.frame that are composed entirely of NA values.

#### Usage 

<pre><code>
remove_empty(dat, which = c("rows", "cols"))
</code></pre> 


```{r echo = FALSE}

# janitor

# https://search.r-project.org/CRAN/refmans/RcmdrMisc/html/colPercents.html
# https://cran.r-project.org/web/packages/janitor/vignettes/tabyls.html
# https://cran.r-project.org/web/packages/splitstackshape/splitstackshape.pdf
```