assign_datetime algorithm #47

Merged
merged 84 commits into from
May 14, 2024
Commits
1c4501f
First mockup of `hardcode_no_ct()`
ramiromagno Feb 8, 2024
faef0b1
Update `hardcode_no_ct()`
ramiromagno Feb 17, 2024
fd63b37
Align `hardcode_no_ct()` code style with Ramm's expectations
ramiromagno Feb 21, 2024
80d3943
Add `hardcode_*()` and `assign_*()` functions
ramiromagno Feb 22, 2024
ec5a9e4
hardcode_no_ct algorithm code changes (#45)
rammprasad Mar 13, 2024
0333d95
Add `oak_id_vars()`
ramiromagno Mar 14, 2024
7fd7716
Fix typo in `recode()`
ramiromagno Mar 14, 2024
802aacc
Simplify `oak_id_vars()` docs
ramiromagno Mar 14, 2024
7fd07b4
Update `assign_*` and `hardcode_*` implementations
ramiromagno Mar 14, 2024
9cc26d1
Introduce memoisation of `ct_mappings()`
ramiromagno Mar 14, 2024
329feaa
Update of README introductory paragraph
ramiromagno Mar 14, 2024
29be830
Merge from main
ramiromagno Mar 14, 2024
7720c05
Update hardcode_* functions' interface
ramiromagno Mar 24, 2024
e87ca66
Add `contains_oak_id_vars()` function
ramiromagno Mar 24, 2024
a5e61f0
Update `contains_oak_id_vars()` doc examples
ramiromagno Mar 24, 2024
a60ccd6
Update `sdtm_harcode()` and dependant functions
ramiromagno Mar 24, 2024
cd89804
Update `assign_*` and `hardcore_*` related functions
ramiromagno Mar 25, 2024
ae2da80
Automatic renv profile update.
ramiromagno Mar 25, 2024
30857e3
Automatic renv profile update.
ramiromagno Mar 25, 2024
73ebe2d
Make `ct` and `cl` parameters mandatory for `assign_ct()`
ramiromagno Mar 27, 2024
0eb4677
Add functions ct importing
ramiromagno Mar 27, 2024
dfd7710
Bring `hardcode*()` and `assign*()` related assertions closer to user…
ramiromagno Mar 27, 2024
6652aae
Add lagging behind Rd for `ct_example()`
ramiromagno Mar 27, 2024
59bcc71
Add `assert_ct()`
ramiromagno Mar 27, 2024
7f9f388
Add ct assertions
ramiromagno Mar 27, 2024
4c81ae1
Merge branch '0040_hardcode_no_ct' of github.com:pharmaverse/sdtm.oak…
ramiromagno Mar 27, 2024
4ed5c41
Remove R/.gitkeep
ramiromagno Apr 1, 2024
ca26d22
Add unit tests for `ct_vars()`
ramiromagno Apr 1, 2024
0456d55
Update dependencies
ramiromagno Apr 1, 2024
0e1eab4
Export `ct_vars()`
ramiromagno Apr 1, 2024
84a4f7d
Update `assert_ct()` docs
ramiromagno Apr 1, 2024
7cf1072
Clarify `assign_ct()`/`assign_no_ct()` doc
ramiromagno Apr 1, 2024
7dff0aa
Improve grammar in doc
ramiromagno Apr 1, 2024
cb2f2e8
Remove last empty line from ct example file
ramiromagno Apr 1, 2024
454b7d8
Add documentation to `sdtm_assign()` and ct-related unit tests
ramiromagno Apr 1, 2024
fafe01b
Update hardcode-related fns
ramiromagno Apr 1, 2024
3a4b355
Changes to meet linter issues
ramiromagno Apr 1, 2024
37575b2
Code reformatting
ramiromagno Apr 1, 2024
c176654
Code reflow
ramiromagno Apr 1, 2024
dafcfef
Improve `assert_cl()` docs
ramiromagno Apr 1, 2024
e128779
Update `read_ct()` docs
ramiromagno Apr 1, 2024
0895764
Automatic renv profile update.
ramiromagno Apr 1, 2024
339039e
Automatic renv profile update.
ramiromagno Apr 1, 2024
ab9db14
Add units tests for `recode()`
ramiromagno Apr 1, 2024
52c52fa
Remove `are_to_recode()` function
ramiromagno Apr 1, 2024
229c0bd
Add units tests for `assert_ct()`
ramiromagno Apr 1, 2024
c83bfdf
Add one more test for `assert_ct()`
ramiromagno Apr 1, 2024
a362578
Add a basic unit test for `ct_mappings()`
ramiromagno Apr 1, 2024
934a15c
Fill in some doc details of ct-related functions
ramiromagno Apr 2, 2024
0dcf0fc
Remove leftover doc text in `assign`
ramiromagno Apr 2, 2024
a44c865
Update website's reference
ramiromagno Apr 2, 2024
efb423f
Styling update
ramiromagno Apr 2, 2024
365fa09
Bump version and update NEWS
ramiromagno Apr 2, 2024
b267610
Fix a few lintr issues
ramiromagno Apr 2, 2024
cbd38eb
Merge branch '0040_hardcode_no_ct' of github.com:pharmaverse/sdtm.oak…
ramiromagno Apr 2, 2024
9cb23f5
Add examples to `ct_map()` doc
ramiromagno Apr 2, 2024
1bebdd8
Fix typo in `problems()` doc
ramiromagno Apr 2, 2024
a8f1bf5
Fix typo
ramiromagno Apr 2, 2024
92e490c
Initial mockup of `assign_datetime()`
ramiromagno Apr 2, 2024
d9031fd
Add `.warn` parameter to `create_iso8601()` internals
ramiromagno Apr 3, 2024
5987684
Remove lint issues
ramiromagno Apr 3, 2024
2791ef0
Replace `.data` usage in tidyselect expressions
ramiromagno Apr 3, 2024
2a8dbf5
Variable renaming
ramiromagno Apr 4, 2024
a718207
Finish pending renaming of variables
ramiromagno Apr 4, 2024
8cc8dcb
Rename code-list to codelist
ramiromagno Apr 4, 2024
609b60e
Fix style
ramiromagno Apr 4, 2024
e8beefc
Fix style
ramiromagno Apr 4, 2024
e4e7c99
Merge branch '0040_hardcode_no_ct' into 0046_assign_datetime
ramiromagno Apr 4, 2024
b854a10
Add assertions to `assign_datetime()`
ramiromagno Apr 4, 2024
c737310
Add merge example to `assign_datetime()` doc
ramiromagno Apr 4, 2024
bbbadd3
Style changes
ramiromagno Apr 4, 2024
54a6460
Style changes (.Rd)
ramiromagno Apr 4, 2024
79e79da
Bump version and update news
ramiromagno Apr 4, 2024
42d4d5a
Update `ct_map()` doc example
ramiromagno Apr 10, 2024
66644eb
Make tibbles more readable in doc examples
ramiromagno Apr 10, 2024
bb2e0d2
Rename `ct_cltc` to `ct_clst`
ramiromagno Apr 10, 2024
97439f6
Merge branch '0040_hardcode_no_ct' into 0046_assign_datetime
ramiromagno Apr 10, 2024
73fe395
Make tibble more readable in `assign_datetime()` doc examples
ramiromagno Apr 10, 2024
b17161c
Fix bug in `assign_datetime`
ramiromagno May 4, 2024
3dba6d9
Linting
ramiromagno May 4, 2024
04254ac
Update styling
ramiromagno May 4, 2024
391df0a
Add example with date and time to `assign_datetime()` docs
ramiromagno May 14, 2024
372f147
Avoid backslash hell (մերսի, "thanks")
ramiromagno May 14, 2024
6801222
Update `ct_spec_vars()` docs' examples
ramiromagno May 14, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: sdtm.oak
Type: Package
Title: SDTM Data Transformation Engine
-Version: 0.0.0.9002
+Version: 0.0.0.9003
Authors@R: c(
person("Rammprasad", "Ganapathy", role = c("aut", "cre"),
email = "[email protected]"),
1 change: 1 addition & 0 deletions NAMESPACE
@@ -2,6 +2,7 @@

S3method(print,iso8601)
export(assign_ct)
+export(assign_datetime)
export(assign_no_ct)
export(clear_cache)
export(create_iso8601)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
+# sdtm.oak 0.0.0.9003 (development version)
+
+## New Features
+
+* New function: `assign_datetime()` for deriving an ISO8601 date-time variable.
+
# sdtm.oak 0.0.0.9002 (development version)

## New Features
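Note for readers of this diff: the NEWS entry refers to an "ISO8601 date-time variable". The sketch below illustrates, in base R only, what such a value looks like when a raw date such as `"20-Jan-20"` and a raw time such as `"10:00:00"` are combined. The helper `dmy_to_iso()` is hypothetical and is not part of sdtm.oak; in particular, mapping two-digit years to the 2000s is an assumption made for this example, not a statement about how `create_iso8601()` behaves.

```r
# Hypothetical helper, not part of sdtm.oak: parse "d-m-y" text such as
# "20-Jan-20" and return an ISO 8601 calendar date such as "2020-01-20".
dmy_to_iso <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]
  day <- as.integer(parts[1])
  # Match the month abbreviation against base R's `month.abb`, ignoring case,
  # so that the result does not depend on the session locale.
  month <- match(tolower(parts[2]), tolower(month.abb))
  year <- as.integer(parts[3])
  # Assumption for this sketch only: two-digit years belong to the 2000s.
  if (year < 100L) year <- year + 2000L
  sprintf("%04d-%02d-%02d", year, month, day)
}

# Date part, then date and time joined by the ISO 8601 "T" separator.
iso_date <- dmy_to_iso("20-Jan-20")
iso_datetime <- paste0(iso_date, "T", "10:00:00")
```

The sketch handles complete dates only; unknown components such as `"UN"` or `"UNK"` are what the `raw_unk` argument in the PR is for and are deliberately left out here.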
196 changes: 196 additions & 0 deletions R/assign_datetime.R
@@ -0,0 +1,196 @@
#' Derive an ISO8601 date-time variable
#'
#' [assign_datetime()] maps one or more variables with date/time components in a
#' raw dataset to a target SDTM variable following the ISO8601 format.
#'
#' @param raw_dat The raw dataset (dataframe); must include the
#' variables passed in `id_vars` and `raw_var`.
#' @param raw_var The raw variable(s): a character vector indicating the name(s)
#' of the raw variable(s) in `raw_dat` with date or time components to be
#' parsed into an ISO8601 format variable in `tgt_var`.
#' @param raw_fmt A date/time parsing format. Either a character vector or a
#' list of character vectors. If a character vector is passed then each
#' element is taken as parsing format for each variable indicated in
#' `raw_var`. If a list is provided, then each element must be a character
#' vector of formats. The first vector of formats is used for parsing the
#' first variable in `raw_var`, and so on.
#' @param tgt_var The target SDTM variable: a single string indicating the name
#' of the variable to be derived.
#' @param raw_unk A character vector of string literals to be regarded as
#' missing values during parsing.
#' @param tgt_dat Target dataset: a data frame to be merged against `raw_dat` by
#' the variables indicated in `id_vars`. This parameter is optional, see
#' section Value for how the output changes depending on this argument value.
#' @param id_vars Key variables to be used in the join between the raw dataset
#' (`raw_dat`) and the target data set (`tgt_dat`).
#' @param .warn Whether to warn about parsing failures.
#'
#' @returns The returned data set depends on the value of `tgt_dat`:
#' - If no target dataset is supplied, meaning that `tgt_dat` defaults to
#' `NULL`, then the returned data set is `raw_dat`, selected for the variables
#' indicated in `id_vars`, and a new extra column: the derived variable, as
#' indicated in `tgt_var`.
#' - If the target dataset is provided, then it is merged with the raw data set
#' `raw_dat` by the variables indicated in `id_vars`, with a new column: the
#' derived variable, as indicated in `tgt_var`.
#'
#' @examples
#' # `md1`: an example raw data set.
#' md1 <-
#' tibble::tribble(
#' ~oak_id, ~raw_source, ~patient_number, ~MDBDR, ~MDEDR, ~MDETM,
#' 1L, "MD1", 375, NA, NA, NA,
#' 2L, "MD1", 375, "15-Sep-20", NA, NA,
#' 3L, "MD1", 376, "17-Feb-21", "17-Feb-21", NA,
#' 4L, "MD1", 377, "4-Oct-20", NA, NA,
#' 5L, "MD1", 377, "20-Jan-20", "20-Jan-20", "10:00:00",
#' 6L, "MD1", 377, "UN-UNK-2019", "UN-UNK-2019", NA,
#' 7L, "MD1", 377, "20-UNK-2019", "20-UNK-2019", NA,
#' 8L, "MD1", 378, "UN-UNK-2020", "UN-UNK-2020", NA,
#' 9L, "MD1", 378, "26-Jan-20", "26-Jan-20", "07:00:00",
#' 10L, "MD1", 378, "28-Jan-20", "1-Feb-20", NA,
#' 11L, "MD1", 378, "12-Feb-20", "18-Feb-20", NA,
#' 12L, "MD1", 379, "10-UNK-2020", "20-UNK-2020", NA,
#' 13L, "MD1", 379, NA, NA, NA,
#' 14L, "MD1", 379, NA, "17-Feb-20", NA
#' )
#'
#' # Using the raw data set `md1`, derive the variable CMSTDTC from MDBDR using
#' # the parsing format (`raw_fmt`) `"d-m-y"` (day-month-year), while allowing
#' # for the presence of special date component values (e.g. `"UN"` or `"UNK"`),
#' # indicating that these values are missing/unknown (unk).
#' cm1 <-
#' assign_datetime(
#' raw_dat = md1,
#' raw_var = "MDBDR",
#' raw_fmt = "d-m-y",
#' raw_unk = c("UN", "UNK"),
#' tgt_var = "CMSTDTC"
#' )
#'
#' cm1
#'
#' # Inspect parsing failures associated with derivation of CMSTDTC.
#' problems(cm1$CMSTDTC)
#'
#' # `cm_inter`: an example target data set.
#' cm_inter <-
#' tibble::tibble(
#' oak_id = 1L:14L,
#' raw_source = "MD1",
#' patient_number = c(
#' 375, 375, 376, 377, 377, 377, 377, 378,
#' 378, 378, 378, 379, 379, 379
#' ),
#' CMTRT = c(
#' "BABY ASPIRIN",
#' "CORTISPORIN",
#' "ASPIRIN",
#' "DIPHENHYDRAMINE HCL",
#' "PARCETEMOL",
#' "VOMIKIND",
#' "ZENFLOX OZ",
#' "AMITRYPTYLINE",
#' "BENADRYL",
#' "DIPHENHYDRAMINE HYDROCHLORIDE",
#' "TETRACYCLINE",
#' "BENADRYL",
#' "SOMINEX",
#' "ZQUILL"
#' ),
#' CMINDC = c(
#' "NA",
#' "NAUSEA",
#' "ANEMIA",
#' "NAUSEA",
#' "PYREXIA",
#' "VOMITINGS",
#' "DIARHHEA",
#' "COLD",
#' "FEVER",
#' "LEG PAIN",
#' "FEVER",
#' "COLD",
#' "COLD",
#' "PAIN"
#' )
#' )
#'
#' # Same derivation as above but now involving the merging with the target
#' # data set `cm_inter`.
#' cm2 <-
#' assign_datetime(
#' raw_dat = md1,
#' raw_var = "MDBDR",
#' raw_fmt = "d-m-y",
#' tgt_var = "CMSTDTC",
#' tgt_dat = cm_inter
#' )
#'
#' cm2
#'
#' # Inspect parsing failures associated with derivation of CMSTDTC.
#' problems(cm2$CMSTDTC)
#'
#' # Derive CMSTDTC using both MDEDR and MDETM variables.
#' # Note that the format `"d-m-y"` is used for parsing MDEDR and `"H:M:S"` for
#' # MDETM (correspondence is by positional matching).
#' cm3 <-
#' assign_datetime(
#' raw_dat = md1,
#' raw_var = c("MDEDR", "MDETM"),
#' raw_fmt = c("d-m-y", "H:M:S"),
#' raw_unk = c("UN", "UNK"),
#' tgt_var = "CMSTDTC"
#' )
#'
#' cm3
#'
#' # Inspect parsing failures associated with derivation of CMSTDTC.
#' problems(cm3$CMSTDTC)
#'
#' @export
assign_datetime <-
  function(raw_dat,
           raw_var,
           raw_fmt,
           tgt_var,
           raw_unk = c("UN", "UNK"),
           tgt_dat = NULL,
           id_vars = oak_id_vars(),
           .warn = TRUE) {
    admiraldev::assert_character_vector(raw_var)
    admiraldev::assert_character_scalar(tgt_var)
    admiraldev::assert_character_vector(id_vars)
    assertthat::assert_that(contains_oak_id_vars(id_vars),
      msg = "`id_vars` must include the oak id vars."
    )
    admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var)))
    admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE)
    admiraldev::assert_character_vector(raw_unk)
    admiraldev::assert_logical_scalar(.warn)

    # Parse the raw date/time variable(s) into a single ISO8601 vector.
    tgt_val <-
      create_iso8601(!!!raw_dat[raw_var],
        .format = raw_fmt,
        .na = raw_unk,
        .warn = .warn
      )

    # Keep the id variables and add the derived variable. `all_of()` makes it
    # explicit that the names come from character vectors, not from columns.
    der_dat <-
      raw_dat |>
      dplyr::select(dplyr::all_of(c(id_vars, raw_var))) |>
      dplyr::mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter()
      dplyr::select(-dplyr::all_of(raw_var))

    # If a target data set is supplied, merge onto it and move the derived
    # variable to the last column.
    der_dat <-
      if (!is.null(tgt_dat)) {
        der_dat |>
          dplyr::right_join(y = tgt_dat, by = id_vars) |>
          dplyr::relocate(dplyr::all_of(tgt_var), .after = dplyr::last_col())
      } else {
        der_dat
      }

    der_dat
  }
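Note for readers of this diff: the `@returns` section above describes two output shapes, depending on whether `tgt_dat` is supplied. The following base R sketch mimics both cases without calling sdtm.oak. The small data frames and the pre-computed `tgt_val` vector are made up for illustration, and `merge(..., all.y = TRUE)` stands in for the `dplyr::right_join()` used in the PR.

```r
# Hypothetical stand-ins for the raw and target data sets (not the PR's data).
id_vars <- c("oak_id", "raw_source", "patient_number")

raw_dat <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = c(375, 375, 376),
  MDBDR = c(NA, "15-Sep-20", "17-Feb-21")
)

# Pretend these are the already-derived ISO 8601 values for MDBDR.
tgt_val <- c(NA, "2020-09-15", "2021-02-17")

# Case 1: no target data set. The result holds the id variables plus the
# derived column only.
der_dat <- raw_dat[id_vars]
der_dat[["CMSTDTC"]] <- tgt_val

# Case 2: a target data set is supplied. Every target row is kept
# (a right join on the id variables).
tgt_dat <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = c(375, 375, 376),
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN")
)
merged <- merge(der_dat, tgt_dat, by = id_vars, all.y = TRUE)

# Move the derived column to the last position, mirroring the relocate() step.
merged <- merged[c(setdiff(names(merged), "CMSTDTC"), "CMSTDTC")]
```

In case 1 the raw variable itself (`MDBDR`) is not carried over; in case 2 the target's own columns (`CMTRT` here) come first and the derived variable is appended at the end.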
10 changes: 5 additions & 5 deletions R/ct.R
@@ -16,17 +16,17 @@
#' @examples
#' # These two calls are equivalent and return all required variables in a
#' # controlled terminology data set.
-#' sdtm.oak:::ct_spec_vars()
-#' sdtm.oak:::ct_spec_vars("all")
+#' ct_spec_vars()
+#' ct_spec_vars("all")
#'
#' # "Codelist code" variable name.
-#' sdtm.oak:::ct_spec_vars("ct_clst")
+#' ct_spec_vars("ct_clst")
#'
#' # "From" variables
-#' sdtm.oak:::ct_spec_vars("from")
+#' ct_spec_vars("from")
#'
#' # The "to" variable.
-#' sdtm.oak:::ct_spec_vars("to")
+#' ct_spec_vars("to")
#'
#' @keywords internal
#' @export
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -14,6 +14,7 @@ reference:
- assign
- harcode
- derive_study_day
+- assign_datetime

- title: Controlled terminology
contents: