add function document_missing_values #118

Merged
merged 11 commits into from
Jul 16, 2024
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -49,7 +49,6 @@ Imports:
stringr,
base,
readr,
- lifecycle,
huxtable,
crayon,
data.table,
@@ -61,7 +60,8 @@ Imports:
sp,
withr,
cli,
- purrr
+ purrr,
+ lifecycle
RoxygenNote: 7.3.1
Suggests:
knitr,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -7,6 +7,7 @@ export(convert_datetime_format)
export(convert_long_to_utm)
export(convert_utm_to_ll)
export(create_datastore_script)
export(document_missing_values)
export(fix_utc_offset)
export(fuzz_location)
export(generate_ll_from_utm)
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
# QCkit v0.1.8 (not yet released)

2024-07-16
* Added experimental function `document_missing_values()`, which searches a file for multiple missing value codes, replaces them all with NA, and generates a new column containing the original missing value codes so that they can be properly documented in EML. This is a workaround for the fact that there is currently no good way to document multiple missing value codes for a single column via EMLassemblyline. This function is still under development; expect substantial changes and improvements, up to and including removal of the function entirely.

2024-07-09
* Added function `get_user_email()`, which accesses NPS Active Directory via a PowerShell function to return the user's email address. Probably won't work for non-NPS users and probably won't work for non-Windows users.
* Updated REST API from legacy v6 to current v7.
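
The behaviour described in the `document_missing_values()` entry can be sketched on an in-memory data frame. This is a minimal base R illustration, not the package code: the data frame, the `depth` column, and the specific codes are hypothetical, and the file read/write step is left out.

```r
# Hypothetical example data: one column holding two custom missing value
# codes plus a true NA.
df <- data.frame(
  site = c("A", "B", "C", "D"),
  depth = c("1.5", "missing", "no data", NA),
  stringsAsFactors = FALSE
)
codes <- c("missing", "no data", NA)

# Flag every cell that holds one of the codes (NA matches NA under %in%).
is_missing <- df$depth %in% codes

# Companion column records which code was present in each cell.
df$custom_depth_MissingValues <- ifelse(is_missing, df$depth, "not_missing")

# All missing value codes collapse to NA in the original column.
df$depth <- ifelse(is_missing, NA, df$depth)

print(df)
```

After this runs, `depth` holds only real values and NA, while `custom_depth_MissingValues` keeps the original codes so each one can be described separately in the metadata.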
83 changes: 83 additions & 0 deletions R/replace_blanks.R
@@ -93,3 +93,86 @@ replace_blanks <- function(directory = here::here(), missing_val_code = NA) {
}
return(invisible())
}


#' Handles multiple missing values
#'
#' @description
#' `r lifecycle::badge("experimental")`
#' `r lifecycle::badge("questioning")`
#' Given a file name (.csv only) and path, the function will search the
#' columns for any that contain multiple user-specified missing value codes.
#' For any column with multiple missing value codes, all the missing values
#' (including blanks) will be replaced with `replace_value` (NA by default).
#' A new column named `custom_<column>_MissingValues` will be generated and
#' populated with the given missing value code from the original column.
#' Values that were not missing will be populated with "not_missing". The
#' newly generated column of categorical variables can be used to describe
#' the various reasons why data are absent in the original column.
#'
#' The function will then write the new dataframe to a file, overwriting the
#' original file. If it is important to keep a copy of the original file, make
#' a copy prior to running the function.
#'
#' WARNING: this function will replace any blank cells in your data with NA!
#'
#' @details Blank cells will be treated as NA.
#'
#' @param file_name String. The name of the .csv file to inspect.
#' @param directory String. Location of the file to read and write. Defaults to the project root returned by `here::here()`.
#' @param colname `r lifecycle::badge("experimental")` String. The columns to inspect. CURRENTLY ONLY WORKS WHEN LEFT AT THE DEFAULT, `NA`.
#' @param missing_val_codes Character vector. The missing value code or codes to search for.
#' @param replace_value String. The single value that replaces every missing value code. Defaults to NA.
#'
#' @return Writes the modified dataframe back to the file. Returns invisibly.
#' @export
#'
#' @examples
#' \dontrun{
#' document_missing_values(file_name = "mydata.csv",
#'                         directory = here::here(),
#'                         colname = NA, #do not change during function development
#'                         missing_val_codes = c("missing", "blank", "no data"),
#'                         replace_value = NA)
#' }
document_missing_values <- function(file_name,
                                    directory = here::here(),
                                    colname = NA,
                                    missing_val_codes = NA,
                                    replace_value = NA) {

  # read in a dataframe:
  file_path <- file.path(directory, file_name)
  df <- readr::read_csv(file_path, show_col_types = FALSE)
  # generate list of missing values (NA is always included):
  missing_val_codes <- unique(append(missing_val_codes, NA))

  data_names <- colnames(df)

  if (is.na(colname)) {
    for (i in seq_along(data_names)) {
      is_missing <- df[[data_names[i]]] %in% missing_val_codes
      # if the column contains missing value codes other than NA:
      if (sum(is_missing) > sum(is.na(df[[data_names[i]]]))) {
        # generate new column of data under its final name, so that an
        # existing column called "x" is never overwritten:
        new_name <- paste0("custom_", data_names[i], "_MissingValues")
        df[[new_name]] <- ifelse(is_missing,
                                 df[[data_names[i]]],
                                 "not_missing")
        # replace old missing values with replacement value:
        df[[data_names[i]]] <- ifelse(is_missing,
                                      replace_value,
                                      df[[data_names[i]]])
      }
    }
  }
  # write the file back out:
  readr::write_csv(df, file_path)

  # call invisible() so NULL is returned invisibly, as in replace_blanks():
  return(invisible())

}
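
The per-column test used in the loop can be examined in isolation. The sketch below wraps it in a helper, `has_custom_codes()`, which is a hypothetical name and not part of QCkit; the example data frame is also made up. It uses base R only.

```r
# Hypothetical helper (not part of QCkit): TRUE when a column contains at
# least one missing value code other than NA.
has_custom_codes <- function(x, missing_val_codes) {
  # NA is always appended, mirroring the function above.
  codes <- unique(append(missing_val_codes, NA))
  # Cells matching any code (NA included) outnumber plain NA cells only
  # when a non-NA code is present.
  sum(x %in% codes) > sum(is.na(x))
}

# Hypothetical example data.
df <- data.frame(
  a = c("1", "missing", NA),  # one custom code plus an NA
  b = c("2", NA, NA),         # only NA
  c = c("x", "y", "z"),       # nothing missing
  stringsAsFactors = FALSE
)

flagged <- vapply(df, has_custom_codes, logical(1),
                  missing_val_codes = c("missing", "no data"))
print(flagged)
```

Only column `a` is flagged here. Note that the comparison is satisfied by a single non-NA code, so a column does not need several distinct codes to receive a companion column.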
2 changes: 1 addition & 1 deletion docs/index.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions docs/news/index.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/pkgdown.yml
@@ -5,5 +5,5 @@ articles:
DRR_Purpose_and_Scope: DRR_Purpose_and_Scope.html
Starting-a-DRR: Starting-a-DRR.html
Using-the-DRR-Template: Using-the-DRR-Template.html
- last_built: 2024-07-09T14:49Z
+ last_built: 2024-07-16T15:01Z

174 changes: 174 additions & 0 deletions docs/reference/document_missing_values.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
