adding primary key option to read all excel
collinschwantes committed Jun 17, 2024
1 parent cff3dc4 commit 903d2fd
Showing 4 changed files with 73 additions and 6 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: ohcleandat
Type: Package
Title: One Health Data Cleaning and Quality Checking Package
-Version: 0.2.3
+Version: 0.2.4
Authors@R: c(
person("Collin", "Schwantes", email = "[email protected]", role = c("cre", "aut"), comment = c(ORCID = "0000-0003-4014-4896")),
person("Johana", "Teigen", email = "[email protected]", role = "aut", comment = c(ORCID = "0000-0002-6209-2321")),
55 changes: 52 additions & 3 deletions R/read_excel_all_sheets.R
@@ -4,11 +4,60 @@
#' For a given excel file, this will detect all sheets, and iteratively read
#' all sheets and place them in a list.
#'
-#' @param file character File path to an excel file
#' If primary keys are added, the primary key is the triplet of the file,
#' sheet name, and row number, e.g. "file_xlsx_sheet1_1". Row numbering is based
#' on the data ingested into R. `readxl` automatically skips empty rows at the
#' beginning of the spreadsheet and, by default, reads the first non-empty row
#' as column names, so id 1 in the primary key belongs to the first row of data
#' after the header.
#'
#' @note The primary key method is possible because Excel forces sheet names
#' to be unique.
#'
#' @param add_primary_key_field Logical. Should a primary key field be added?
#' @param primary_key character. The column name for the unique identifier to be added to the data.
#' @param file character. File path to an excel file
#'
#' @return list
#' @export
-read_excel_all_sheets <- function(file){
#'
#' @examples
#' \dontrun{
#' # Adding primary key field
#' read_excel_all_sheets(file = "test_pk.xlsx", add_primary_key_field = TRUE)
#'
#' # Don't add primary key field
#' read_excel_all_sheets(file = "test_pk.xlsx")
#'
#' }
#'
read_excel_all_sheets <- function(file, add_primary_key_field = FALSE, primary_key = "primary_key"){
  sheets <- readxl::excel_sheets(file)
-  purrr::map(sheets, ~readxl::read_excel(file, sheet = .x))

  if(!add_primary_key_field){
    out <- purrr::map(sheets, ~readxl::read_excel(file, sheet = .x))
    return(out)
  }

  if(add_primary_key_field){
    purrr::map2(sheets, file, function(sheet, file){
      df <- readxl::read_excel(file, sheet = sheet)

      # Stop early if the requested key column would overwrite existing data
      if(primary_key %in% names(df)){
        msg <- sprintf("primary_key - %s - is already a column in the dataframe.\nPlease choose a column name that isn't present in the data.", primary_key)
        rlang::abort(msg)
      }

      file_name <- gsub("\\.", "_", basename(file))
      # seq_len() handles sheets with zero data rows, where 1:nrow(df) gives c(1, 0)
      row_ids <- paste(file_name, sheet, seq_len(nrow(df)), sep = "_")

      out <- df %>%
        dplyr::mutate(!!primary_key := row_ids)

      return(out)
    })
  }

}
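
The key construction described in the roxygen text above can be sketched in base R without an Excel file. This is only an illustration under stated assumptions: `add_row_ids()` and the `sheet_data` values are hypothetical and not part of ohcleandat, and the sketch uses `stop()` and `[[<-` where the commit uses `rlang::abort()` and `dplyr::mutate()`.

```r
# Hypothetical helper (not part of ohcleandat): builds the file/sheet/row key
# described in the roxygen text and attaches it to a data frame with base R.
add_row_ids <- function(df, file, sheet, primary_key = "primary_key") {
  # Refuse to overwrite an existing column, mirroring the check in the commit.
  if (primary_key %in% names(df)) {
    stop(sprintf("primary_key - %s - is already a column in the data frame.",
                 primary_key))
  }

  # "test_pk.xlsx" becomes "test_pk_xlsx" so the key contains no dots.
  file_name <- gsub("\\.", "_", basename(file))

  # Return an empty key vector for a sheet with no data rows.
  row_ids <- if (nrow(df) == 0) {
    character(0)
  } else {
    paste(file_name, sheet, seq_len(nrow(df)), sep = "_")
  }

  df[[primary_key]] <- row_ids
  df
}

# Stand-in for one ingested sheet; the values are made up for illustration.
sheet_data <- data.frame(site = c("A", "B"), count = c(3, 7))
keyed <- add_row_ids(sheet_data, file = "inst/test_pk.xlsx", sheet = "Sheet1")
print(keyed$primary_key)
```

Here the two rows receive the keys "test_pk_xlsx_Sheet1_1" and "test_pk_xlsx_Sheet1_2". Calling `add_row_ids()` again on `keyed` raises an error, because the `primary_key` column already exists.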

Binary file added inst/test_pk.xlsx
Binary file not shown.
22 changes: 20 additions & 2 deletions man/read_excel_all_sheets.Rd

Some generated files are not rendered by default.
