xafty

The package is designed to help with creating automation for tasks that are based on a spreadsheet document (like Excel). After reading in data from a spreadsheet, values and columns usually do not align with the assumptions of the code, especially if the document is frequently edited by hand.

Xafty offers a lightweight solution to check whether the spreadsheet data adheres to the specified rules stored in a validation table (referred to as “validity table”). The package aims to achieve the following goals:

Facilitate communication between the developer and user by agreeing on rules detailing how the data should be structured.
Enable users to repair data on their own, streamlining the maintenance of the automation solution.
Leverage the business expertise of users to set sensible rules, allowing developers to make plausible assumptions about the data.

Add Columns with Regular Expressions

New Update: xafty now supports the rule: ##!!regexcolumns which matches columns in the check table via a regular expression and applies the rules of that column to every matched column.

## Lets imagine, we have different car designs as columns in our check table. It would be tedious to add 
## them all by hand and with time the car design team may decide to add or delete design columns from the table. 
## Given that car design names follow some logic that can be matched with a regular expression, 
## the rule ##!!regexcolumns comes in very handy to apply the same rules to all matched columns:


check_table <- data.frame(T501 = c(0, 1, 0),
                          T4051 = c(1, 1, 1),
                          T301 = c(0.1, 0.9, 0),
                          LKW = c(1, 0, 0))
validity_table <- data.frame(LKW = c("##!!number", NA , NA),
                  Car_Desing = c("##!!number", "##!!regexcolumns", "^T[1-9]"))

new_validity_table <- add_regex_columns_to_validity(check_table = check_table, validity_table = validity_table)
                    
# |      LKW      |       T501        |        T4051      |       T301        |
# |---------------------------------------------------------------------------|
# |   ##!!number  |  ##!!number       |  ##!!number       | ##!!number        |
# |     <NA>      |  ##!!regexcolumns |  ##!!regexcolumns | ##!!regexcolumns  |
# |     <NA>      |  ^T[1-9]          |  ^T[1-9]          | ^T[1-9]           |

# The validity table can then be used with all check functions:
check_column_number(check_table = check_table, validity_table = new_validity_table, 
                    check_type = "equal", simply = TRUE)

Rule syntax: ##!!regexcolumns

Installation

You can install the development version of xafty like so:

remotes::install_github("davidcrone/xafty")

xafty Rules

The Check Table

The check table contains the data that will be checked. Usually, it is data read in from a spreadsheet.

In order to introduce all possible rules and their variants, we will use a reduced ‘mtcars’ data set as an example.

data(mtcars)
mtcars$name <- rownames(mtcars)
rownames(mtcars) <- NULL
mtcars <- mtcars[1:5, c("name", "cyl", "wt")]
mtcars$cyl <- factor(mtcars$cyl)

mtcars

The Validity Table

A validity table consists of column names and column rules that will then be compared to the corresponding columns in the check table.

To start of building the validity table we set the column names without any rules:

# NA values will always be ignored in a validity table
validity_table <- data.frame("name" = NA, "cyl" = NA, "wt" = NA)

Check Number of Columns

Checks if the number (count) of columns of the check table is equal to that of the validity table. Use this check if the number of columns is important for downstream code.

Rule syntax: The number of columns in the validity table

check_column_number(check_table = mtcars, validity_table = validity_table, check_type = "equal")

Check Column Names

Checks if the columns in the check table are all correctly named and present. There exists a stricter variant of the check by specifying the check type as “order” which additionally checks if the columns in the check table are in the correct order as specified in the validity table.

Rule syntax: The name and order of the columns in the validity table

check_column_names(check_table = mtcars, validity_table = validity_table, check_type = "presence")

Check Column Data Types

Checks whether all columns in the check table have the correct data type (class) associated with them. To perform the check the validity table must have data type rules under each column.

Rule syntax:

Rule	Data Type
##!!number	integer
##!!number	double
##!!text	character
##!!factor	factor
##!!date	Date
##!!datetime	POSIXct

validity_table <- data.frame("name" = "##!!text", "cyl" = "##!!factor", "wt" = "##!!number")
check_column_types(check_table = mtcars, validity_table = validity_table)

Check for Empty Values

Checks for empty (NA) values or values provided below the rule that stand for missing values. Use this check when a column should not have any empty or missing values. Consequently, NA values must be correctly identified before being able to use this check correctly.

Rule syntax: ##!!notempty

validity_table <- data.frame("name" = c("##!!text", "##!!notempty"), 
                  "cyl" = c("##!!factor", "##!!notempty"), "wt" = c("##!!number", "##!!notempty"))
check_column_notempty(check_table = mtcars, validity_table = validity_table)

## Alternatively, we can set the notempty-rule to look for NA values and values that stand for NA values. 
## Setting the validity table as shown below, will interprete both NA or 999 as missing values within 
## the specific columns of the check table.
validity_table <- data.frame("name" = c("##!!text", "##!!notempty", "999"), 
                  "cyl" = c("##!!factor", "##!!notempty", "999"), "wt" = c("##!!number", "##!!notempty", "999"))

Check for Exact Values

Checks whether the provided values in the validity table match exactly the data in the check table. To provide values to match for in the validity, we simply add the desired rule followed by the values the algorithm should look for.

Rule syntax:

Rule	Explanation
##!!strictexact	All values in the check table must match the provided values in the validity table
##!!anyexact	At least one value in the check table must match any of the provided values in the validity table
##!!eachexact	All provided values in the validity table must be matched at least once in the check table

validity_table <- data.frame("name" = c("##!!text", "##!!notempty", "##!!anyexact", "Mazda RX4"), 
                             "cyl" = c("##!!factor", "##!!notempty", NA, NA), 
                             "wt" = c("##!!number", "##!!notempty", NA, NA))
check_column_exactinput(check_table = mtcars, validity_table = validity_table)

Check for Pattern Values

Checks whether the provided patterns in a column of the validity table matches to the corresponding column in the check table. Several variant rules exist for the pattern check.

Rule syntax:

Rule	Explanation
##!!strictpattern	All values in the column must match to every pattern provided in the validity table
##!!rowpattern	All values of the column must match at least to one pattern in the validity table
##!!anypattern	At least one value in the column must match to any pattern provided in the validity table
##!!eachpattern	Every pattern provided in the validity table must match at least once to any value in the check table

validity_table <- data.frame("name" = c("##!!text", "##!!notempty", "##!!rowpattern", "Mazda", "Datsun", "Hornet"), 
                             "cyl" = c("##!!factor", "##!!notempty", NA, NA, NA, NA), 
                             "wt" = c("##!!number", "##!!notempty", NA, NA, NA, NA))
check_column_patterninput(check_table = mtcars, validity_table = validity_table)

Example

Basic workflow to check a table for its ‘validity’ before using it in downstream code:

library(xafty)

# Load example data
data(Indometh)
check_table <- Indometh
validity_table <- data.frame("Subject" = c("##!!factor", "##!!notempty", "##!!strictexact", 1:6),
                             "time" = c("##!!number", "##!!notempty", "##!!anypattern", "8", rep(NA, 5)),
                             "conc" = c("##!!number", "##!!notempty", rep(NA, 7)))

# Aligns the data type of the check table as specified data types in the validity table
check_table <- align_column_types(check_table = check_table, validity_table = validity_table)

# Check if the check table is valid
check_result <- check_validity(check_table = check_table, validity_table = validity_table)

if (all(check_result$Check_Result)) {

    print("Everythin is in order. 'Proceed with automation task.'")
    
   } else {
   
    print("Check table does not adhere to specifications. 'Proceed with repairing actions.'")
   
   }

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github/workflows		.github/workflows
R		R
inst/extdata		inst/extdata
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
xafty.Rproj		xafty.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

xafty

Add Columns with Regular Expressions

Installation

xafty Rules

The Check Table

The Validity Table

Check Number of Columns

Check Column Names

Check Column Data Types

Check for Empty Values

Check for Exact Values

Check for Pattern Values

Example

About

Licenses found

Releases

Packages

Languages

License

Licenses found

davidcrone/xafty

Folders and files

Latest commit

History

Repository files navigation

xafty

Add Columns with Regular Expressions

Installation

xafty Rules

The Check Table

The Validity Table

Check Number of Columns

Check Column Names

Check Column Data Types

Check for Empty Values

Check for Exact Values

Check for Pattern Values

Example

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages