Feature Request: raw_filter and tgt_filter parameters #54

Closed
rammprasad opened this issue May 16, 2024 · 8 comments · Fixed by #55
Assignees
Labels
enhancement New feature or request

Comments

@rammprasad
Collaborator

Feature Idea

Introduce functionality to filter the raw and target datasets while performing a mapping.

Example If conditions

  1. Involving raw_dat

If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null then hardcode OE.OEORRES = 'Y'

AESOS is the raw dataset and AESO, AESOSP are variables in the raw dataset. OE is the target domain and OEORRES is the target variable.

hardcode_no_ct(
  raw_dat = AESOS,
  raw_var = AESO,
  tgt_var = OEORRES,
  tgt_val = "Y",
  tgt_dat = OE_INTER,
  raw_filter = (AESO == 1 & !is.na(AESOSP)),
  tgt_filter = NULL,
  id_vars = oak_id_vars()
)

If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null then hardcode OE.OETESTCD = 'IOISYMPO'

hardcode_ct(
  raw_dat = AESOS,
  raw_var = AETERM,
  tgt_var = OETESTCD,
  tgt_val = 'IOISYMPO',
  ct_spec = study_ct,
  ct_clst = "C123456",
  tgt_dat = NULL,
  raw_filter = (AESO == 1 & is.na(AESOSP)),
  tgt_filter = NULL,
  id_vars = oak_id_vars()
)

  2. Involving tgt_dat

If VS.VSTESTCD = 'TEMP', assign the value collected in VTLS1.TEMPLOC to VS.VSLOC.

VTLS1 is the raw dataset name and TEMPLOC is a variable in the raw dataset. VS is the target domain and VSLOC is derived.

assign_ct(
  raw_dat = VTLS1,
  raw_var = "TEMPLOC",
  tgt_var = "VSLOC",
  ct_spec = study_ct,
  ct_clst = "C12123431",
  tgt_dat = vs_inter,
  raw_filter = NULL,
  tgt_filter = (VSTESTCD == "TEMP"),
  id_vars = oak_id_vars()
)

Involving raw_dat and tgt_dat but separate conditions

If [AECOV19.SPECTYP] is not null, and FA.FATESTCD = 'STATUS' and FA.FAOBJ = 'Severe Acute Resp Syndrome Coronavirus 2', assign the value collected in SPCNM to FA.FASPEC.

In this example, AECOV19 is the raw dataset name and SPECTYP is a variable in the raw dataset. The condition also involves the target domain FA: FAOBJ and FATESTCD are previously derived SDTM variables, and FASPEC is the SDTM variable currently being derived.

assign_ct(
  raw_dat = AECOV19,
  raw_var = "SPCNM",
  tgt_var = "FASPEC",
  ct_spec = study_ct,
  ct_clst = "C1212121",
  tgt_dat = fa_inter,
  raw_filter = (!is.na(SPECTYP)),
  tgt_filter = (FATESTCD == "STATUS" & FAOBJ == "Severe Acute Resp Syndrome Coronavirus 2"),
  id_vars = oak_id_vars()
)

Involving raw_dat and tgt_dat in the same condition
We may not be able to support this.

MH.MHLOC when MH.MHTERM = [GCAHX.NCITERM] or [GCAHX.NCITERMO]

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

@rammprasad rammprasad added the enhancement New feature or request label May 16, 2024
@github-project-automation github-project-automation bot moved this to Product Backlog in sdtm.oak R package May 16, 2024
@ramiromagno
Collaborator

As per discussion with @rammprasad I will implement instead a separate function to mark records in tibbles for filtering. The new function will be: condition_by().
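
To make the idea of "marking records" concrete, here is a minimal base R sketch. The helper name `mark_records()`, the attribute name `cnd`, and the example data are hypothetical and chosen for illustration only; this is not the sdtm.oak implementation. It shows a condition being evaluated against the columns of a data frame and stored alongside it, without dropping any rows.

```r
# Hypothetical helper: evaluate a condition using the columns of `dat` and
# attach the resulting logical vector as an attribute. No rows are removed.
mark_records <- function(dat, cond) {
  # Capture the unevaluated condition and evaluate it within `dat`.
  flag <- eval(substitute(cond), envir = dat, enclos = parent.frame())
  stopifnot(is.logical(flag), length(flag) == nrow(dat))
  attr(dat, "cnd") <- flag
  dat
}

# Illustrative raw data (made up for this sketch).
aesos <- data.frame(
  oak_id = 1:4,
  AESO   = c(1, 1, 0, 1),
  AESOSP = c("Pain", NA, "Itching", "Redness"),
  stringsAsFactors = FALSE
)

# Note the vectorised `&`: `&&` only accepts length-one operands.
marked <- mark_records(aesos, AESO == 1 & !is.na(AESOSP))

attr(marked, "cnd")  # one logical flag per record
nrow(marked)         # all records are kept; they are only marked
```

Keeping every row and carrying the flags as metadata means a later mapping step can decide which records to derive from while the data set keeps its original shape.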

@rammprasad
Collaborator Author

As per discussion with @rammprasad I will implement instead a separate function to mark records in tibbles for filtering. The new function will be: condition_by().

Shall we name the function add_cond() or add_condition()?

@rammprasad
Collaborator Author

rammprasad commented May 17, 2024

The example will look like below.

Example If conditions

Involving raw_dat

If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null then hardcode OE.OEORRES = 'Y'

AESOS is the raw dataset and AESO, AESOSP are variables in the raw dataset. OE is the target domain and OEORRES is the target variable.

hardcode_no_ct(
  raw_dat = add_cond(AESOS, AESO == 1 & !is.na(AESOSP)),
  raw_var = AESO,
  tgt_var = OEORRES,
  tgt_val = "Y",
  tgt_dat = OE_INTER,
  id_vars = oak_id_vars()
)

If [AESOS.AESO] == 1 and [AESOS.AESOSP] is null then hardcode OE.OETESTCD = 'IOISYMPO'

hardcode_ct(
  raw_dat = add_cond(AESOS, AESO == 1 & is.na(AESOSP)),
  raw_var = AETERM,
  tgt_var = OETESTCD,
  tgt_val = 'IOISYMPO',
  ct_spec = study_ct,
  ct_clst = "C123456",
  id_vars = oak_id_vars()
)

Involving tgt_dat

If VS.VSTESTCD = 'TEMP', assign the value collected in VTLS1.TEMPLOC to VS.VSLOC.

VTLS1 is the raw dataset name and TEMPLOC is a variable in the raw dataset. VS is the target domain and VSLOC is derived.

#when using in-pipe
|>
assign_ct(
  raw_dat = VTLS1,
  raw_var = "TEMPLOC",
  tgt_var = "VSLOC",
  ct_spec = study_ct,
  ct_clst = "C12123431",
  tgt_dat = add_cond(.data, VSTESTCD == "TEMP"),
  id_vars = oak_id_vars()
)

Involving raw_dat and tgt_dat but separate conditions

If [AECOV19.SPECTYP] is not null, and FA.FATESTCD = 'STATUS' and FA.FAOBJ = 'Severe Acute Resp Syndrome Coronavirus 2', assign the value collected in SPCNM to FA.FASPEC.

In this example, AECOV19 is the raw dataset name, and SPECTYP is a variable in the raw dataset. The condition also involves the target domain FA. FAOBJ and FATESTCD are previously derived SDTM variables, and FASPEC is the SDTM variable currently being derived.

#when using in-pipe
|>
assign_ct(
  raw_dat = add_cond(AECOV19, !is.na(SPECTYP)),
  raw_var = "SPCNM",
  tgt_var = "FASPEC",
  ct_spec = study_ct,
  ct_clst = "C1212121",
  tgt_dat = add_cond(.data, FATESTCD == "STATUS" & FAOBJ == "Severe Acute Resp Syndrome Coronavirus 2"),
  id_vars = oak_id_vars()
)

Involving raw_dat and tgt_dat in the same condition

We may not be able to support this. Take a look and let me know @ramiromagno

MH.MHLOC when MH.MHTERM = [GCAHX.NCITERM] or [GCAHX.NCITERMO]

Map the value collected in the raw variable locat of the GCAHX raw dataset to MH.MHLOC when this condition is met: MH.MHTERM = [GCAHX.NCITERM] or [GCAHX.NCITERMO]

#when using in-pipe
|>
assign_ct(
  raw_dat = GCAHX,
  raw_var = "SPCNM",
  tgt_var = "FASPEC",
  ct_spec = study_ct,
  ct_clst = "C1212121",
  tgt_dat = add_cond(.data, MHTERM %in% GCAHX$NCITERM | MHTERM %in% GCAHX$NCITERMO),
  id_vars = oak_id_vars()
)

@ramiromagno
Collaborator

ramiromagno commented May 17, 2024

To help understand that use case involving variables of raw_dat and tgt_dat in the same condition, could you share how you currently do it with roak's if_then_else() interface?

@ramiromagno
Collaborator

What should happen if the condition results in NA?

@ramiromagno ramiromagno self-assigned this May 18, 2024
@rammprasad
Collaborator Author

rammprasad commented May 21, 2024

To help understand that use case involving variables of raw_dat and tgt_dat in the same condition, could you share how you currently do it with roak's if_then_else() interface?

{roak} handles this very differently, because it is driven by metadata. The main branch has an example: see the mapping of CMMODIFY with the annotation text "If different to CM.CMTRT, then CM.CMMODIFY", which means the mapping happens only if the value in the collected column CMMODIFY differs from CMTRT. It is carried out using the spec parameters condition_left, condition_right, and condition_operator, which {roak} reads to evaluate the logical condition. It is a bit confusing at the moment because the variable is named CMMODIFY both in the raw dataset and in the target domain CM. I will change the name in the raw dataset.

A mock of automation of this in {roak} will look like

 # Derive qualifier CMMODIFY  Annotation text = If different to CM.CMTRT then CM.CMMODIFY
  if_then_else(
    raw_dat = cm_raw,
    raw_var = CMMODIFY,
    condition_left_raw_dataset = cm_raw,
    condition_left_raw_variable = CMMODIFY,
    condition_operator = "different_to",
    condition_right_sdtm_variable_domain = CM,
    condition_right_sdtm_variable = CMTRT,
    sub_algorithm = assign_no_ct,
    tgt_var = CMDOSETXT,
    id_vars = oak_id_vars()
  ) |>

Can we do something like this in {sdtm.oak}, where filtering needs to happen based on a condition involving both raw_dat and tgt_dat?

 # Derive qualifier CMMODIFY  Annotation text  If collected value in CMMODIFY in cm_raw is different to CM.CMTRT then
  # assign the collected value to CMMODIFY in the CM domain (CM.CMMODIFY)
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "CMMODIFY",
    add_cond = (cm_raw$CMMODIFY != .data$CMTRT),
    tgt_var = "CMMODIFY",
    id_vars = oak_id_vars()
  )
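
For reference, the cross-dataset comparison described above can be sketched in base R by first aligning raw and target records on a shared identifier and then comparing the two columns. Everything here is an assumption made for illustration: the data values, the use of a single `oak_id` key, and the intermediate column name `raw_value`. It is not how {sdtm.oak} or {roak} implement the mapping.

```r
# Illustrative data: raw and target share a record identifier `oak_id`.
cm_raw <- data.frame(
  oak_id   = 1:3,
  CMMODIFY = c("ASPIRIN", "PARACETAMOL 500MG", NA),
  stringsAsFactors = FALSE
)
cm <- data.frame(
  oak_id = 1:3,
  CMTRT  = c("ASPIRIN", "PARACETAMOL", "IBUPROFEN"),
  stringsAsFactors = FALSE
)

# Bring the raw value next to the target value, one row per target record.
joined <- merge(cm, cm_raw, by = "oak_id", all.x = TRUE)
names(joined)[names(joined) == "CMMODIFY"] <- "raw_value"

# "Different to CM.CMTRT": a missing raw value is treated as not different.
differs <- !is.na(joined$raw_value) & joined$raw_value != joined$CMTRT

# Assign the collected value only where the condition holds.
joined$CMMODIFY <- ifelse(differs, joined$raw_value, NA_character_)
cm_out <- joined[, c("oak_id", "CMTRT", "CMMODIFY")]
cm_out
```

The join is what makes the row-wise comparison well defined: without it, `cm_raw$CMMODIFY` and the target column are only comparable if both data sets happen to have the same records in the same order.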

@rammprasad
Collaborator Author

What should happen if the condition results in NA?

If no records match the criteria, we create the tgt_var as an empty column.
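
The two situations touched on here, a condition that evaluates to NA and a condition that matches no records, can be illustrated with a small base R sketch. The data are made up, and treating NA as "not matched" is an assumption of this example rather than a confirmed design decision.

```r
spectyp <- c("SWAB", NA, "BLOOD")

# A comparison against a missing value yields NA, not TRUE or FALSE.
cond <- spectyp == "SWAB"
cond

# One possible convention (an assumption here): treat NA as "not matched".
matched <- !is.na(cond) & cond
matched

# If no record matches, the target variable is still created, filled with NA.
none <- rep(FALSE, length(spectyp))
tgt <- data.frame(oak_id = 1:3)
tgt$FASPEC <- ifelse(none, spectyp, NA_character_)
tgt
```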

@rammprasad
Collaborator Author

Preferred option to handle complex if condition.

#when using in-pipe
|>
assign_ct(
raw_dat = GCAHX,
raw_var = "SPCNM",
tgt_var = "FASPEC",
ct_spec = study_ct,
ct_clst = "C1212121",
tgt_dat = add_cond(.data$MHTERM %in% GCAHX$NCITERM | .data$MHTERM %in% GCAHX$NCITERMO),
id_vars = oak_id_vars()
)

@ramiromagno ramiromagno linked a pull request May 24, 2024 that will close this issue
14 tasks
ramiromagno added a commit that referenced this issue May 29, 2024
- Joins by raw and target data sets are now aware of conditioned tibbles
- Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*` are also conditioned-tibble aware
- Unit test coverage for most cases indicated at #54

I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation is needed.
@ramiromagno ramiromagno moved this from Product Backlog to In Progress in sdtm.oak R package May 29, 2024
@ramiromagno ramiromagno moved this from In Progress to In review in sdtm.oak R package Jun 17, 2024
ramiromagno added a commit that referenced this issue Jun 18, 2024
* Basic support for "conditioned" data frames

- Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about what records should be used for derivations

- Adds support for basic pretty printing of cnd_df objects

- Adds a user-facing function for creating such cnd_df objects: `condition_by`

- Adds experimental "mutate"-version function for these conditioned data frames: `derive_by_condition()`

* Basic support for conditioned data sets

* Extensive support for conditioned tibbles

- Joins by raw and target data sets are now aware of conditioned tibbles
- Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*` are also conditioned-tibble aware
- Unit test coverage for most cases indicated at #54

I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation is needed.

* Ramm's feedback integration

- Move `tgt_dat` to the first position in the argument list for cleaner command pipes.

- Rename `condition_by()` to `condition_add()`.

- Export `oak_id_vars()` for direct user access.

- Update tidyselections to align with the latest practices.

* Update on conditioned data frames

- Documentation
- Examples
- New article about cnd_df (WIP)

* Styling fixes

* Update linting and styling

* Tidying up

- No need for S3 methods to be exported
- `condition_add()` now links to the appropriate article about conditioned data frames
- Documentation tweaks
- Version bump, NEWS update and pkgdown reference list update

* Last tweaks

- Add example for `condition_add()`
- Re-export S3 methods for `cnd_df`
- Update pkgdown reference list

* Remove blank line

* Tweaks to `%.>%` docs

* Automatic renv profile update.

---------

Authored-by: ramiromagno <[email protected]>
rammprasad added a commit that referenced this issue Jun 20, 2024
* New function to derive oak_id_vars. More work needed.

* cm template updates. In progress.

* almost completed cm template

* Basic support for "conditioned" data frames

- Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about what records should be used for derivations

- Adds support for basic pretty printing of cnd_df objects

- Adds a user-facing function for creating such cnd_df objects: `condition_by`

- Adds experimental "mutate"-version function for these conditioned data frames: `derive_by_condition()`

* Basic support for conditioned data sets

* Extensive support for conditioned tibbles

- Joins by raw and target data sets are now aware of conditioned tibbles
- Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*` are also conditioned-tibble aware
- Unit test coverage for most cases indicated at #54

I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation is needed.

* Ramm's feedback integration

- Move `tgt_dat` to the first position in the argument list for cleaner command pipes.

- Rename `condition_by()` to `condition_add()`.

- Export `oak_id_vars()` for direct user access.

- Update tidyselections to align with the latest practices.

* A fix to derive study day

* Algorithms Vignette update

* cm template code update

* A function to help display of dataset in Vignette

* Template update

* Raw data change

* DM domain csv

* Events domain article

* update controlled terminology

* Updated CM template

* VS domain template and Vignette

* CM domain Vignette update

* Update on conditioned data frames

- Documentation
- Examples
- New article about cnd_df (WIP)

* Fix Vignette

* Updates to code

* Remove white spaces

* remove white spaces

* clean up

* pipeline failures

* Fix pipeline failures

* Automatic renv profile update.

* Automatic renv profile update.

* Fix pipeline failures

* Fix pipeline failure

* Moving DT from suggests to imports

* Update WORDLIST

* fix spelling

---------

Co-authored-by: Rammprasad Ganapathy <[email protected]>
Co-authored-by: Ramiro Magno <[email protected]>
Co-authored-by: rammprasad <[email protected]>
@ramiromagno ramiromagno moved this from In review to Done in sdtm.oak R package Jul 30, 2024

2 participants