DTrsiv is an R package containing a collection of R data.table functions to quickly and easily clean your data.
Everyone who wants to is welcome to contribute!
Author: PAGEAUD Y.¹
Contributors: Everyone who wants to is welcome to contribute!
1- DKFZ - Division of Applied Bioinformatics, Germany.
Install the devtools and data.table packages, then install DTrsiv from GitHub:
install.packages(pkgs = c("devtools", "data.table"))
devtools::install_github("YoannPa/DTrsiv")
dt_fun.R
script contains functions related to R data.table formatting:
dt.sub()
for pattern matching and substitution applied column-wise on a data.table object. It first identifies the columns containing any occurrence matching the pattern, then applies the substitution only to the columns where the pattern matched, which shortens execution time on data.tables with many columns. It supports columns of type list.
dt.ls2c()
converts data.table columns of type list to type vector.
dt.rm.dup()
removes duplicated columns based on their content (not on their names).
dt.rm.allNA()
removes columns exclusively containing NAs from a data.table.
dt.int64tochar()
converts columns of 'double.integer64' type into 'character' type.
dt.combine()
combines values of partially duplicated columns from a data.table into new columns.
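The two-step idea described for dt.sub() (find the columns that match first, then substitute only there) can be sketched in plain data.table code. This is only an illustration and not the DTrsiv implementation: the helper name `sub_matching_cols` and its arguments are hypothetical, and the sketch does not handle columns of type list.

```r
library(data.table)

# Hypothetical helper (NOT part of DTrsiv): illustrates the two-step idea only.
sub_matching_cols <- function(dt, pattern, replacement) {
  # Step 1: find the columns in which at least one value matches the pattern.
  has_match <- vapply(dt, function(col) any(grepl(pattern, col)), logical(1))
  hits <- names(dt)[has_match]
  # Step 2: apply the substitution to those columns only, by reference.
  # Note: gsub() returns character, so a matching column becomes character.
  for (j in hits) {
    set(dt, j = j, value = gsub(pattern, replacement, dt[[j]]))
  }
  invisible(dt)
}

dt <- data.table(id = c("a_1", "b_2"), score = c(1.5, 2.5), label = c("x", "y"))
sub_matching_cols(dt, pattern = "_", replacement = "-")
```

Only the `id` column contains the pattern here, so `score` and `label` are never passed to `gsub()` and keep their original types.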
dt_chk.R
script contains functions related to checking the content of an R data.table:
allNA.col()
checks whether any columns contain exclusively NAs and, if so, returns their names with a warning.
best.merged.dt()
looks for the best merging operation(s) between two data.tables, trying a set of columns from the second one.
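As a rough illustration of the kind of check allNA.col() describes, the following plain data.table sketch lists the columns that contain only NAs and warns about them. It is not the DTrsiv code: the helper name `all_na_cols` is hypothetical, and the warning text is made up for the example.

```r
library(data.table)

# Hypothetical helper (NOT part of DTrsiv): returns the names of all-NA columns.
all_na_cols <- function(dt) {
  is_all_na <- vapply(dt, function(col) all(is.na(col)), logical(1))
  na_cols <- names(dt)[is_all_na]
  if (length(na_cols) > 0) {
    warning("Columns containing only NAs: ", paste(na_cols, collapse = ", "))
  }
  na_cols
}

dt <- data.table(a = c(1, NA), b = c(NA, NA), c = c("x", "y"))
na_cols <- all_na_cols(dt)

# Optionally drop those columns by reference.
if (length(na_cols) > 0) dt[, (na_cols) := NULL]
```

Column `a` has one NA but is kept, because only columns made up entirely of NAs are reported.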
For any questions not related to bugs or development, you can write to me at [email protected].
If you encounter issues, or if a feature you expect is missing from the available DTrsiv functions, please go to the DTrsiv GitHub repository, click on the Issues tab, and create an issue.
- Introduction to data.table: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
- Official R data.table GitHub repository: https://github.com/Rdatatable/data.table
- By-Group Processing, the R data.table and the Power of Open Source (22.02.2011) - Steve Miller