-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
if_then_else support via "conditioned" data frames #55
Conversation
- Adds a new S3 class (cnd_df) for represented conditioned data frames, i.e. data frames that carry metadata about what records should be used for derivations - Adds support for basic pretty printing of cnd_df objects - Adds a user-facing function for creating such cnd_df objects: `condition_by` - Adds experimental "mutate"-version function for these conditioned data frames: `derive_by_condition()`
Still work in progress. |
- Joins by raw and target data sets are now aware of conditioned tibbles - Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*` are also conditioned-tibble aware - Unit test coverage for most cases indicated at #54 I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation is needed.
A couple of items
Here is the sample code. study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv",
package = "sdtm.oak"))
# Read in raw data
cm_raw <- read.csv(system.file("cm_domain/cm_raw_data.csv",
package = "sdtm.oak")) %>%
#derive oak_id as the row number
dplyr::mutate(oak_id = structure(seq_len(nrow(.))),
patient_number = PATNUM,
raw_source = "ConMed") %>%
dplyr::select(oak_id_vars(), dplyr::everything())
# Create CM domain. The first step in creating CM domain is to create the topic variable
cm <-
# Derive topic variable
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT"
) |>
# Derive CMGRPID when CMTRT == "BABY ASPIRIN"
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
#use condition_by function and fiter when CMTRT is "BABY ASPIRIN"
### This condition is not working. It results in an error
tgt_dat = condition_by(.,CMTRT == "BABY ASPIRIN"),
id_vars = oak_id_vars()
)
**Error message**
Error in assign_no_ct(assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW", :
unused argument (assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW", tgt_var = "CMTRT"))
Can you give give an example on how to program this mapping?
# Derive qualifier CMMODIFY - If collected value in CMMODIFY
# in cm_raw is different to CM domain CMTRT target variable then
# assign the collected value to CMMODIFY in CM domain (CM.CMMODIFY)
|
Yeah, there are a couple of things at play here:
So best approach might be that if we want to condition on the target data set, the one that is being passed along, we should perhaps move the Here is a set of examples that hopefully illustrate the different variations: library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)
study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <- read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
#derive oak_id as the row number
dplyr::mutate(
oak_id = structure(seq_len(nrow(.))),
patient_number = PATNUM,
raw_source = "ConMed"
) %>%
dplyr::select(sdtm.oak:::oak_id_vars(), dplyr::everything())
# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>%
# Derive CMGRPID when CMTRT == "BABY ASPIRIN"
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
)
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:189:3:
#> ! `id_vars` must be a character vector but is a data frame
# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") |>
condition_by(CMTRT == "BABY ASPIRIN") |>
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
tgt_dat = _
)
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA
# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") |>
condition_by(CMTRT == "BABY ASPIRIN") %>%
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
tgt_dat = .
)
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA
# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>%
{
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN")
)
}
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA
# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>%
{
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
)
}
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA Question 2Regarding question 2 see: tests/testthat/test-assign.R towards the end. |
Thank you, @ramiromagno. I got it to work. Having it inline will make sense for the users. We will just put both options out there. cm <-
# Derive topic variable
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT"
) %>%
# Derive CMGRPID
{
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID",
#use condition_by function and fiter when CMTRT is "BABY ASPIRIN"
tgt_dat = condition_by(dat =., CMTRT == "BABY ASPIRIN"),
id_vars = oak_id_vars()
)
} %>%
{
assign_no_ct(
raw_dat = cm_raw,
raw_var = "CMMODIFY",
tgt_var = "CMMODIFY",
tgt_dat = condition_by(. , CMMODIFY != CMTRT, .env = cm_raw),
id_vars = oak_id_vars()
)
} A couple of follow-up questions
{
assign_no_ct(
raw_dat = cm_raw,
raw_var = "CMMODIFY",
tgt_var = "CMMODIFY",
tgt_dat = condition_by( dat = . , CMMODIFY != CMTRT, dat2 = cm_raw),
id_vars = oak_id_vars()
)
}
#option 2
{
assign_no_ct(
raw_dat = condition_by(dat = cm_raw , CMMODIFY != CMTRT, dat2= .),
raw_var = "CMMODIFY",
tgt_var = "CMMODIFY",
tgt_dat = .,
id_vars = oak_id_vars()
)
} |
Hi @rammprasad:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greate Job, @ramiromagno . I have added my comments. We can additional test cases and examples.
Also, it will be good to change the way we compare two datasets, by adding an extra argument as suggested.
Thanks @rammprasad. I am not sure what was your take on relocating |
I thought we could circumvent the need for braces when using magrittr's pipe placeholder in nested calls if we moved So I think we are left only with three options:
In my opinion, the simplest and easiest for the user is option 1, i.e. moving library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)
study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <- read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
#derive oak_id as the row number
dplyr::mutate(
oak_id = structure(seq_len(nrow(.))),
patient_number = PATNUM,
raw_source = "ConMed"
) %>%
dplyr::select(oak_id_vars(), dplyr::everything())
# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
# raw_var = "MDRAW",
# tgt_var = "CMTRT") |>
# assign_no_ct(
# tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
# tgt_var = "CMGRPID",
# raw_dat = cm_raw,
# raw_var = "MDNUM"
# )
# DOES NOT WORK EITHER (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>%
assign_no_ct(
tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
tgt_var = "CMGRPID",
raw_dat = cm_raw,
raw_var = "MDNUM"
)
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:191:3:
#> ! `id_vars` must be a character vector but is a data frame
# WORKS (native pipe)
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") |>
condition_add(CMTRT == "BABY ASPIRIN") |>
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID"
)
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA
# WORKS (maggritr's pipe)
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>%
condition_add(CMTRT == "BABY ASPIRIN") |>
assign_no_ct(
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID"
)
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA
# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
# raw_var = "MDRAW",
# tgt_var = "CMTRT") |>
# {
# assign_no_ct(
# tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
# raw_dat = cm_raw,
# raw_var = "MDNUM",
# tgt_var = "CMGRPID"
# )
# }
# WORKS (maggritr's pipe)
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>%
{
assign_no_ct(
tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
raw_dat = cm_raw,
raw_var = "MDNUM",
tgt_var = "CMGRPID"
)
}
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA |
- Move `tgt_dat` to the first position in the argument list for cleaner command pipes. - Rename `condition_by()` to `condition_add()`. - Export `oak_id_vars()` for direct user access. - Update tidyselections to align with the latest practices.
Thank you, @ramiromagno. Regarding the condition_add options, let's stick with options 1 and 2. We will add examples for both scenarios.
|
Let me know once it is ready for the final review. |
I know we decided to go with options 1 and 2. But given that Lionel sympathetically answered promptly, I am leaving it here for future record: library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)
`%>>%` <- function(data, expr) {
eval(substitute(expr), list(. = data), parent.frame())
}
study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <- read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
#derive oak_id as the row number
dplyr::mutate(
oak_id = structure(seq_len(nrow(.))),
patient_number = PATNUM,
raw_source = "ConMed"
) %>%
dplyr::select(oak_id_vars(), dplyr::everything())
assign_no_ct(raw_dat = cm_raw,
raw_var = "MDRAW",
tgt_var = "CMTRT") %>>%
assign_no_ct(
tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
tgt_var = "CMGRPID",
raw_dat = cm_raw,
raw_var = "MDNUM"
)
#> # A tibble: 14 × 5
#> oak_id raw_source patient_number CMTRT CMGRPID
#> <int> <chr> <int> <chr> <int>
#> 1 1 ConMed 375 BABY ASPIRIN 1
#> 2 2 ConMed 375 CORTISPORIN NA
#> 3 3 ConMed 376 ASPIRIN NA
#> 4 4 ConMed 377 DIPHENHYDRAMINE HCL NA
#> 5 5 ConMed 377 PARCETEMOL NA
#> 6 6 ConMed 377 VOMIKIND NA
#> 7 7 ConMed 377 ZENFLOX OZ NA
#> 8 8 ConMed 378 AMITRYPTYLINE NA
#> 9 9 ConMed 378 BENADRYL NA
#> 10 10 ConMed 378 DIPHENHYDRAMINE HYDROCHLORIDE NA
#> 11 11 ConMed 378 TETRACYCLINE NA
#> 12 12 ConMed 379 BENADRYL NA
#> 13 13 ConMed 379 SOMINEX NA
#> 14 14 ConMed 379 ZQUILL NA Created on 2024-06-12 with reprex v2.1.0 |
Shall we add this as the third option? This looks sleeker than using {} |
- Documentation - Examples - New article about cnd_df (WIP)
@rammprasad and @edgar-manukyan: I think we are pretty close to having the code for conditioned data frames near completion.
Take a look and give me your feedback! |
Thanks so much @ramiromagno 🙏 I will start the review shortly since I believe @rammprasad is happy with the MR and will approve it shortly. Let's refrain from adding any new features in this MR and open a new issue/MR instead. |
Simply brilliant @ramiromagno, thank you so much 🙏 🙏 🙏 for all your time and effort. I am sure the SDTM community is going to appreciate this. I also feel that admiral might grab your idea of conditioned data frames data as well 😉 Huge thanks for the tests 💯 💯 💯 |
It looks good to me. Lets merge this to main, and I can take care of the documentation updates. |
@rammprasad and @edgar-manukyan : please do not merge yet as I am now doing styling and linting fixes. |
- No need for S3 methods to be exported - `condition_add()` now links to the appropriate article about conditioned data frames - Documentation tweaks - Version bump, NEWS update and pkgdown reference list update
- Add example for `condition_add()` - Re-export S3 methods for `cnd_df` - Update pkgdown reference list
Merge branch 'main' into 0054-condition-by # Conflicts: # NAMESPACE # NEWS.md # _pkgdown.yml # inst/WORDLIST # renv/profiles/4.4/renv.lock
Adds a new S3 class (cnd_df) for represented conditioned data frames, i.e. data frames that carry metadata about what records should be used for derivations
Adds support for basic pretty printing of cnd_df objects
Adds a user-facing function for creating such cnd_df objects:
condition_by
Adds experimental "mutate"-version function for these conditioned data frames:
derive_by_condition()
Thank you for your Pull Request! We have developed this task checklist from the
Development Process
Guide
to help with the final steps of the process. Completing the below tasks helps to
ensure our reviewers can maximize their time on your code as well as making sure
the oak codebase remains robust and consistent.
Please check off each taskbox as an acknowledgment that you completed the task
or check off that it is not relevant to your Pull Request. This checklist is
part of the Github Action workflows and the Pull Request will not be merged into
the
devel
branch until you have checked off each task.Request Title (Use Edit button in top-right if you need to update)
tidyverse style guide. Run
styler::style_file()
to style R and Rmd filesconsider realistic data scenarios and edge cases, e.g. empty datasets, errors,
boundary cases etc. - See
Unit Test Guide
fully follow the
deprecation guidance?
and families. Refer to the
categorization of functions to tag appropriate keyword/family.
devtools::document()
so all.Rd
files in theman
folder and theNAMESPACE
file in the project root are updated appropriatelyNEWS.md
if the changes pertain to a user-facing function (i.e. ithas an
@export
tag) or documentation aimed at users (rather than developers)pkgdown::build_site()
and check that all affectedexamples are displayed correctly and that all new functions occur on the "Reference" page.
lintr::lint_package()
R CMD check
locally and address all errors and warnings -devtools::check()