Feature Request: raw_filter and tgt_filter parameters #54
Comments
As per discussion with @rammprasad, I will instead implement a separate function to mark records in tibbles for filtering. The new function will be:
Shall we name the function add_cond() or add_condition()?
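To make the idea of *marking* records (rather than dropping them) concrete, here is a minimal base-R sketch. It assumes the mark is stored as one logical value per record in an attribute; the helper name `toy_add_cond()` and the attribute name `cnd` are hypothetical and are not the sdtm.oak API. It also assumes that a missing (`NA`) condition result counts as "not met", which is only one possible choice.

```r
# Hypothetical helper (not the sdtm.oak API): evaluate `cond` inside `dat` and
# store the result as a logical attribute instead of dropping rows.
toy_add_cond <- function(dat, cond) {
  res <- eval(substitute(cond), dat, parent.frame())
  if (!is.logical(res) || length(res) != nrow(dat)) {
    stop("`cond` must evaluate to one logical value per record.")
  }
  # Assumption: a missing (NA) result counts as "condition not met".
  attr(dat, "cnd") <- res %in% TRUE
  dat
}

aesos <- data.frame(
  AESO   = c(1, 1, 0, NA),
  AESOSP = c(NA, "headache", NA, NA)
)

marked <- toy_add_cond(aesos, AESO == 1 & is.na(AESOSP))
nrow(marked)         # 4: no record is removed
attr(marked, "cnd")  # TRUE FALSE FALSE FALSE
```

Keeping every record and carrying the mark alongside them lets a later mapping step decide what to do with non-matching records (for example, leave the target value missing) instead of losing them.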
The examples will look like the ones below.

Example If conditions

Involving raw_dat

If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null, then hardcode OE.OEORRES = 'Y'.
AESOS is the raw dataset, and AESO and AESOSP are variables in the raw dataset. OE is the target domain, and OEORRES is the target variable.

```r
hardcode_no_ct(
  raw_dat = add_cond(AESOS, AESO == 1 & is.na(AESOSP)),
  raw_var = "AESO",
  tgt_var = "OEORRES",
  tgt_val = "Y",
  tgt_dat = OE_INTER,
  id_vars = oak_id_vars()
)
```

If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null, then hardcode OE.OETESTCD = 'IOISYMPO'.

```r
hardcode_ct(
  raw_dat = add_cond(AESOS, AESO == 1 & is.na(AESOSP)),
  raw_var = "AETERM",
  tgt_var = "OETESTCD",
  tgt_val = "IOISYMPO",
  ct_spec = study_ct,
  ct_clst = "C123456",
  id_vars = oak_id_vars()
)
```

Involving tgt_dat

If VS.VSTESTCD = 'TEMP', assign the value collected in VTLS1.TEMPLOC to VS.VSLOC. VTLS1 is the raw dataset name, and TEMPLOC is a variable in the raw dataset. VS is the target domain, and VSLOC is derived.

```r
# when used inside a pipe (preceding steps omitted)
|>
  assign_ct(
    raw_dat = VTLS1,
    raw_var = "TEMPLOC",
    tgt_var = "VSLOC",
    ct_spec = study_ct,
    ct_clst = "C12123431",
    tgt_dat = add_cond(.data, VSTESTCD == "TEMP"),
    id_vars = oak_id_vars()
  )
```

Involving raw_dat and tgt_dat but separate conditions

If [AECOV19.SPECTYP] is not null, FA.FATESTCD = 'STATUS', and FA.FAOBJ = 'Severe Acute Resp Syndrome Coronavirus 2', then assign the value collected in SPCNM to FA.FASPEC. In this example, AECOV19 is the raw dataset name, and SPECTYP is a variable in the raw dataset. The condition also involves the target domain FA: FAOBJ and FATESTCD are previously derived SDTM variables, and FASPEC is the SDTM variable currently being derived.

```r
# when used inside a pipe (preceding steps omitted)
|>
  assign_ct(
    raw_dat = add_cond(AECOV19, !is.na(SPECTYP)),
    raw_var = "SPCNM",
    tgt_var = "FASPEC",
    ct_spec = study_ct,
    ct_clst = "C1212121",
    tgt_dat = add_cond(.data, FATESTCD == "STATUS" & FAOBJ == "Severe Acute Resp Syndrome Coronavirus 2"),
    id_vars = oak_id_vars()
  )
```

Involving raw_dat and tgt_dat in the same condition

We may not be able to support this. Take a look and let me know, @ramiromagno.

MH.MHLOC when MH.MHTERM = [GCAHX.NCITERM] or [GCAHX.NCITERMO]: map the value collected in the GCAHX location raw variable to MH.MHLOC when this condition is met.

```r
# when used inside a pipe (preceding steps omitted)
|>
  assign_ct(
    raw_dat = GCAHX,
    raw_var = "SPCNM",
    tgt_var = "MHLOC",
    ct_spec = study_ct,
    ct_clst = "C1212121",
    tgt_dat = add_cond(.data, MHTERM %in% GCAHX$NCITERM | MHTERM %in% GCAHX$NCITERMO),
    id_vars = oak_id_vars()
  )
```
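Two R details matter in conditions like these. Element-wise conditions over columns need the vectorized `&` and `|` operators: `&&` and `||` only accept length-one operands, and in R 4.3.0 and later they raise an error when given a longer vector. A missing value in a column is detected with `is.na()`, not `is.null()`, because a column that exists is never `NULL`. The toy data below is illustrative only.

```r
aesos <- data.frame(
  AESO   = c(1, 1, 0),
  AESOSP = c(NA, "headache", NA)
)

# is.null() tests whether the whole object is NULL: a single FALSE here
is.null(aesos$AESOSP)                   # FALSE
# is.na() tests each value: one result per record
is.na(aesos$AESOSP)                     # TRUE FALSE TRUE

# Vectorized AND: one logical per record
aesos$AESO == 1 & is.na(aesos$AESOSP)   # TRUE FALSE FALSE
# Vectorized OR: one logical per record
aesos$AESO == 0 | !is.na(aesos$AESOSP)  # FALSE TRUE TRUE
```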
To help understand that use case involving variables of
What should happen if the condition results in
{roak} processes this very differently: it is driven by metadata. The main branch has an example; please refer to the example mapping CMMODIFY with the annotation text. A mock of automating this in {roak} will look like:

```r
# Derive qualifier CMMODIFY
# Annotation text: If different to CM.CMTRT then CM.CMMODIFY
if_then_else(
  raw_dat = cm_raw,
  raw_var = CMMODIFY,
  condition_left_raw_dataset = cm_raw,
  condition_left_raw_variable = CMMODIFY,
  condition_operator = "different_to",
  condition_right_sdtm_variable_domain = CM,
  condition_right_sdtm_variable = CMTRT,
  sub_algorithm = assign_no_ct,
  tgt_var = CMMODIFY,
  id_vars = oak_id_vars()
)
```

Can we do something like this in {sdtm.oak}, where filtering needs to happen based on a condition involving both raw_dat and tgt_dat?

```r
# Derive qualifier CMMODIFY
# Annotation text: If the collected value in CMMODIFY in cm_raw is different to CM.CMTRT, then
# assign the collected value to CMMODIFY in the CM domain (CM.CMMODIFY)
assign_no_ct(
  raw_dat = cm_raw,
  raw_var = "CMMODIFY",
  add_cond = (cm_raw$CMMODIFY != .data$CMTRT),
  tgt_var = "CMMODIFY",
  id_vars = oak_id_vars()
)
```
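A condition that compares a raw variable with a target variable (here, raw `CMMODIFY` different from `CM.CMTRT`) only makes sense record by record, so the raw and target records have to be aligned first. A minimal base-R sketch with toy data, assuming the record identifiers are the columns `oak_id`, `raw_source` and `patient_number` (an assumption about what `oak_id_vars()` returns); this is not how sdtm.oak implements it.

```r
id_vars <- c("oak_id", "raw_source", "patient_number")  # assumed identifiers

cm_raw <- data.frame(
  oak_id = 1:3, raw_source = "cm_raw", patient_number = c(101, 101, 102),
  CMMODIFY = c("ASPIRIN", NA, "PARACETAMOL")
)
cm <- data.frame(
  oak_id = 1:3, raw_source = "cm_raw", patient_number = c(101, 101, 102),
  CMTRT = c("ASPIRIN", "IBUPROFEN", "ACETAMINOPHEN")
)

# Align raw and target records on the identifier variables (sorted by them)
joined <- merge(cm, cm_raw, by = id_vars, all.x = TRUE)

# Row-wise condition; NA (nothing collected) counts as "condition not met"
cond <- (joined$CMMODIFY != joined$CMTRT) %in% TRUE

# Keep the collected value only where it differs from CMTRT, NA elsewhere
joined$CMMODIFY <- ifelse(cond, joined$CMMODIFY, NA_character_)
joined$CMMODIFY  # NA NA "PARACETAMOL"
```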
If no records match the criteria, we create the tgt_var as an empty column.
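A minimal sketch of that behaviour, using a hypothetical helper `toy_hardcode()` (not the sdtm.oak function) and assuming "empty" means every value is `NA`: the target column is always created with one value per record, and it stays entirely `NA` when no record meets the condition.

```r
# Hypothetical helper: set `tgt_var` to `tgt_val` where `cond` is TRUE and to NA
# elsewhere, so the column exists even when no record matches.
toy_hardcode <- function(tgt_dat, tgt_var, tgt_val, cond) {
  stopifnot(is.logical(cond), length(cond) == nrow(tgt_dat))
  tgt_dat[[tgt_var]] <- ifelse(cond %in% TRUE, tgt_val, NA_character_)
  tgt_dat
}

oe <- data.frame(oak_id = 1:3)

# No record meets the condition: OEORRES is created, but every value is NA
res <- toy_hardcode(oe, "OEORRES", "Y", cond = c(FALSE, FALSE, NA))
res$OEORRES  # NA NA NA

# One record meets the condition
res2 <- toy_hardcode(oe, "OEORRES", "Y", cond = c(TRUE, FALSE, NA))
res2$OEORRES  # "Y" NA NA
```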
Preferred option to handle complex if conditions. #when using in-pipe
* Basic support for "conditioned" data frames
  - Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about which records should be used for derivations
  - Adds support for basic pretty printing of cnd_df objects
  - Adds a user-facing function for creating such cnd_df objects: `condition_by`
  - Adds an experimental "mutate"-version function for these conditioned data frames: `derive_by_condition()`
* Basic support for conditioned data sets
* Extensive support for conditioned tibbles
  - Joins by raw and target data sets are now aware of conditioned tibbles
  - Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*`, are also conditioned-tibble aware
  - Unit test coverage for most cases indicated at #54

  I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation are needed.
* Ramm's feedback integration
  - Move `tgt_dat` to the first position in the argument list for cleaner command pipes.
  - Rename `condition_by()` to `condition_add()`.
  - Export `oak_id_vars()` for direct user access.
  - Update tidyselections to align with the latest practices.
* Update on conditioned data frames
  - Documentation
  - Examples
  - New article about cnd_df (WIP)
* Styling fixes
* Update linting and styling
* Tidying up
  - No need for S3 methods to be exported
  - `condition_add()` now links to the appropriate article about conditioned data frames
  - Documentation tweaks
  - Version bump, NEWS update and pkgdown reference list update
* Last tweaks
  - Add example for `condition_add()`
  - Re-export S3 methods for `cnd_df`
  - Update pkgdown reference list
* Remove blank line
* Tweaks to `%.>%` docs
* Automatic renv profile update.

---------

Authored-by: ramiromagno
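The S3 design described in that commit can be pictured with a small base-R sketch: a data frame subclass that carries one logical per record, plus a `print()` method that reports how many records meet the condition. The class name `toy_cnd_df`, its constructor and its `cnd` attribute are hypothetical; this is not the sdtm.oak implementation of `cnd_df`.

```r
# Hypothetical constructor (not sdtm.oak's cnd_df): a data frame subclass that
# carries one logical value per record in the "cnd" attribute.
new_toy_cnd_df <- function(dat, cnd) {
  stopifnot(is.data.frame(dat), is.logical(cnd), length(cnd) == nrow(dat))
  attr(dat, "cnd") <- cnd
  class(dat) <- c("toy_cnd_df", class(dat))
  dat
}

# print() method: one summary line, then the records as a plain data frame
print.toy_cnd_df <- function(x, ...) {
  cnd <- attr(x, "cnd")
  cat(sprintf("# Conditioned data frame: %d of %d records meet the condition\n",
              sum(cnd %in% TRUE), length(cnd)))
  plain <- x
  attr(plain, "cnd") <- NULL
  class(plain) <- setdiff(class(x), "toy_cnd_df")
  print(plain, ...)
  invisible(x)
}

vs <- data.frame(VSTESTCD = c("TEMP", "PULSE", "TEMP"))
cvs <- new_toy_cnd_df(vs, vs$VSTESTCD == "TEMP")
print(cvs)
```

Because the subclass still inherits from `data.frame`, ordinary data frame code keeps working on it. A fuller implementation would also need to keep the condition aligned with the records when rows are subset, reordered or joined.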
* New function to derive oak_id_vars. More work needed.
* cm template updates. In progress.
* almost completed cm template
* Basic support for "conditioned" data frames
  - Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about which records should be used for derivations
  - Adds support for basic pretty printing of cnd_df objects
  - Adds a user-facing function for creating such cnd_df objects: `condition_by`
  - Adds an experimental "mutate"-version function for these conditioned data frames: `derive_by_condition()`
* Basic support for conditioned data sets
* Extensive support for conditioned tibbles
  - Joins by raw and target data sets are now aware of conditioned tibbles
  - Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*`, are also conditioned-tibble aware
  - Unit test coverage for most cases indicated at #54

  I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation are needed.
* Ramm's feedback integration
  - Move `tgt_dat` to the first position in the argument list for cleaner command pipes.
  - Rename `condition_by()` to `condition_add()`.
  - Export `oak_id_vars()` for direct user access.
  - Update tidyselections to align with the latest practices.
* A fix to derive study day
* Algorithms Vignette update
* cm template code update
* A function to help display of dataset in Vignette
* Template update
* Raw data change
* DM domain csv
* Events domain article
* update controlled terminology
* Updated CM template
* VS domain template and Vignette
* CM domain Vignette update
* Update on conditioned data frames
  - Documentation
  - Examples
  - New article about cnd_df (WIP)
* Fix Vignette
* Updates to code
* Remove white spaces
* remove white spaces
* clean up
* pipeline failures
* Fix pipeline failures
* Automatic renv profile update.
* Automatic renv profile update.
* Fix pipeline failures
* Fix pipeline failure
* Moving DT from suggests to imports
* Update WORDLIST
* fix spelling

---------

Co-authored-by: Rammprasad Ganapathy
Co-authored-by: Ramiro Magno
Co-authored-by: rammprasad
Feature Idea
Introduce functionality to filter the raw and target datasets while performing a mapping.
Example If conditions
Involving raw_dat
If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null, then hardcode OE.OEORRES = 'Y'.
AESOS is the raw dataset, and AESO and AESOSP are variables in the raw dataset. OE is the target domain, and OEORRES is the target variable.
If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null, then hardcode OE.OETESTCD = 'IOISYMPO'.
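For a condition on the raw dataset only, one way to picture the mapping is: evaluate the condition once per raw record, carry the result to the corresponding target records through the record identifiers, and hardcode the value only there. A base-R sketch with toy data, assuming the identifier columns `oak_id`, `raw_source` and `patient_number`; the column name `cnd_flag` is a hypothetical temporary helper.

```r
id_vars <- c("oak_id", "raw_source", "patient_number")  # assumed identifiers

aesos <- data.frame(
  oak_id = c(1, 2, 3), raw_source = "AESOS", patient_number = c(101, 102, 103),
  AESO = c(1, 1, 0), AESOSP = c(NA, "headache", NA)
)
oe <- data.frame(
  oak_id = c(1, 2, 3), raw_source = "AESOS", patient_number = c(101, 102, 103)
)

# Raw-side condition, one value per raw record (NA counts as not met)
flags <- aesos[id_vars]
flags$cnd_flag <- (aesos$AESO == 1 & is.na(aesos$AESOSP)) %in% TRUE

# Carry the flag to the matching target records, then hardcode only there
oe <- merge(oe, flags, by = id_vars, all.x = TRUE)
oe$OEORRES <- ifelse(oe$cnd_flag %in% TRUE, "Y", NA_character_)
oe$cnd_flag <- NULL
oe$OEORRES  # "Y" NA NA
```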
Involving tgt_dat
If VS.VSTESTCD = 'TEMP', assign the value collected in VTLS1.TEMPLOC to VS.VSLOC.
VTLS1 is the raw dataset name, and TEMPLOC is a variable in the raw dataset. VS is the target domain, and VSLOC is derived.
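When the condition is on the target domain, every target record still receives the collected raw value through the identifiers, and the condition decides which ones keep it. A base-R sketch with toy data, assuming identifier columns `oak_id`, `raw_source` and `patient_number`, and assuming each raw VTLS1 record maps to several VS records (one per test):

```r
id_vars <- c("oak_id", "raw_source", "patient_number")  # assumed identifiers

vtls1 <- data.frame(
  oak_id = c(1, 2), raw_source = "VTLS1", patient_number = c(101, 102),
  TEMPLOC = c("ORAL", "AXILLA")
)
# Toy VS target: one record per test, so each raw record maps to two VS records
vs <- data.frame(
  oak_id = c(1, 1, 2, 2), raw_source = "VTLS1",
  patient_number = c(101, 101, 102, 102),
  VSTESTCD = c("TEMP", "PULSE", "TEMP", "PULSE")
)

# Bring the collected raw value next to each target record
vs <- merge(vs, vtls1, by = id_vars, all.x = TRUE)

# Target-side condition: only temperature records receive a location
vs$VSLOC <- ifelse(vs$VSTESTCD == "TEMP", vs$TEMPLOC, NA_character_)
vs$TEMPLOC <- NULL
vs[c("patient_number", "VSTESTCD", "VSLOC")]
```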
Involving raw_dat and tgt_dat but separate conditions
If [AECOV19.SPECTYP] is not null, FA.FATESTCD = 'STATUS', and FA.FAOBJ = 'Severe Acute Resp Syndrome Coronavirus 2', then assign the value collected in SPCNM to FA.FASPEC.
In this example, AECOV19 is the raw dataset name and SPECTYP is a variable in the raw dataset. The condition also involves the target domain FA: FAOBJ and FATESTCD are previously derived SDTM variables, and FASPEC is the SDTM variable currently being derived.
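With separate conditions, each one can be evaluated on its own dataset first; after aligning the records, the value is assigned only where both hold. A base-R sketch with toy data (the SPECTYP and SPCNM values are made up), assuming identifier columns `oak_id`, `raw_source` and `patient_number`; the temporary columns `raw_cnd` and `tgt_cnd` are hypothetical helpers.

```r
id_vars <- c("oak_id", "raw_source", "patient_number")  # assumed identifiers
sars <- "Severe Acute Resp Syndrome Coronavirus 2"

aecov19 <- data.frame(
  oak_id = c(1, 2, 3), raw_source = "AECOV19", patient_number = c(101, 102, 103),
  SPECTYP = c("SWAB", NA, "SWAB"),
  SPCNM = c("NASOPHARYNGEAL SWAB", "SALIVA", "THROAT SWAB")
)
fa <- data.frame(
  oak_id = c(1, 2, 3), raw_source = "AECOV19", patient_number = c(101, 102, 103),
  FATESTCD = c("STATUS", "STATUS", "OCCUR"), FAOBJ = sars
)

# Each condition is evaluated on its own dataset
aecov19$raw_cnd <- !is.na(aecov19$SPECTYP)
fa$tgt_cnd <- fa$FATESTCD == "STATUS" & fa$FAOBJ == sars

# Align the records, then assign only where both conditions hold
fa <- merge(fa, aecov19[c(id_vars, "SPCNM", "raw_cnd")], by = id_vars, all.x = TRUE)
both <- (fa$raw_cnd & fa$tgt_cnd) %in% TRUE
fa$FASPEC <- ifelse(both, fa$SPCNM, NA_character_)
fa <- fa[setdiff(names(fa), c("SPCNM", "raw_cnd", "tgt_cnd"))]
fa$FASPEC  # "NASOPHARYNGEAL SWAB" NA NA
```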
Involving raw_dat and tgt_dat in the same condition
We may not be able to support this.
MH.MHLOC when MH.MHTERM = [GCAHX.NCITERM] or [GCAHX.NCITERMO]
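One reason this case is harder: written as `MHTERM %in% GCAHX$NCITERM`, the comparison is made against every raw record, not against the raw record that belongs to the same target record. The sketch below contrasts that with a record-by-record comparison after aligning raw and target data. It uses toy data, a hypothetical raw location variable `LOC` (the source does not name it), and assumed identifier columns `oak_id`, `raw_source` and `patient_number`.

```r
id_vars <- c("oak_id", "raw_source", "patient_number")  # assumed identifiers

gcahx <- data.frame(
  oak_id = c(1, 2), raw_source = "GCAHX", patient_number = c(101, 102),
  NCITERM = c("Asthma", "Migraine"), NCITERMO = c(NA, NA),
  LOC = c("LUNG", "HEAD")  # hypothetical location variable
)
mh <- data.frame(
  oak_id = c(1, 2), raw_source = "GCAHX", patient_number = c(101, 102),
  MHTERM = c("Asthma", "Asthma")
)

# Pitfall: %in% against the whole raw column matches across subjects, so
# patient 102's "Asthma" matches patient 101's NCITERM
naive <- mh$MHTERM %in% c(gcahx$NCITERM, gcahx$NCITERMO)  # TRUE TRUE

# Row-wise version: align records first, then compare within each record
mh <- merge(mh, gcahx, by = id_vars, all.x = TRUE)
same <- (mh$MHTERM == mh$NCITERM | mh$MHTERM == mh$NCITERMO) %in% TRUE
mh$MHLOC <- ifelse(same, mh$LOC, NA_character_)
mh$MHLOC  # "LUNG" NA
```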
Relevant Input
No response
Relevant Output
No response
Reproducible Example/Pseudo Code
No response