data_filter: Add support for loop indices within functions? #309

Open
rempsyc opened this issue Nov 6, 2022 · 3 comments

@rempsyc
Member

rempsyc commented Nov 6, 2022

Still within #301, I wonder if it would make sense to add support for loop indices within functions for data_filter, @etiennebacher?

library(datawizard)

df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Attempt 1
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    x <- data_filter(data, item3 == min.index[i])
  }
  x
}
fun(df1, id = "id")
#> Error: Filtering did not work. Please check the syntax of your `filter`
#>   argument.

# Attempt 2, using quotes
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    x <- data_filter(data, "item3 == min.index[i]")
  }
  x
}
fun(df1, id = "id")
#> Error: Filtering did not work. Please check the syntax of your `filter`
#>   argument.

# Attempt 3, using curly brackets
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    x <- data_filter(data, item3 == min.index[{i}])
  }
  x
}
fun(df1, id = "id")
#> Error: Filtering did not work. Please check the syntax of your `filter`
#>   argument.

# Workaround is to create the index manually first
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    index <- which(data$item3 == min.index[i])
    x <- data_filter(data, index)
  }
  x
}
fun(df1, id = "id")
#>   id item1 item2 item3
#> 4  1     2     2     2

Created on 2022-11-05 with reprex v2.0.2

rempsyc changed the title from "Add support for loop indices within functions?" to "data_filter: Add support for loop indices within functions?" on Nov 6, 2022
@strengejacke
Member

Yeah, .select_nse() works fine, but it looks somewhat "unmaintainable" because of its confusing complexity...

For this particular case, data_filter(data, "item3 == min.index[i]"), I wonder whether the problem is that we evaluate the string in the wrong environment. If so, there could be an "easy" solution, but this environment stuff, especially in combination with NSE, is still somewhat opaque to me.
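
To make the environment question concrete, here is a minimal base R sketch. It assumes the string fails only because the loop variables live in the calling function's frame and are not visible where the string is evaluated. `filter_string()` is a hypothetical helper written for this illustration; it is not part of datawizard and does not reflect how data_filter() is implemented.

```r
df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Hypothetical helper (not a datawizard function): parse the string, then
# evaluate it with the data frame as the first lookup scope and the caller's
# frame as the enclosing environment.
filter_string <- function(data, condition, env = parent.frame()) {
  expr <- parse(text = condition)[[1]]
  keep <- eval(expr, envir = data, enclos = env)
  # treat NA as "do not keep"
  data[!is.na(keep) & keep, , drop = FALSE]
}

fun <- function(data) {
  min.index <- c(2, 2, 2)
  i <- 1
  # min.index and i exist only inside fun(); they are found through `enclos`
  filter_string(data, "item3 == min.index[i]")
}

res <- fun(df1)
print(res)
```

Here `item3` is looked up in the data frame first, and anything not found there (`min.index`, `i`) is looked up in the frame of `fun()`. With the example data this keeps only row 4.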

@etiennebacher
Member

The problem is that data_filter() tries to evaluate the condition directly, whereas here we would like to first evaluate min.index[i] to get its value, and then filter based on this value.

Currently, if the evaluation fails in data_filter(), we check whether the expression contains curly brackets, and if it doesn't, we throw an error. Supporting this kind of situation means we would also need to evaluate the RHS of the condition before evaluating the condition itself. There could be a solution, but I think we could end up with very messy code, as in .select_nse().
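
The idea of evaluating the RHS first can be sketched in base R with `bquote()`, which replaces `.(min.index[i])` with its current value before the condition is evaluated against the data. This is only an illustration at the call site under that assumption, not a proposal for how data_filter() should do it internally; the filtering step uses plain base R subsetting rather than any datawizard function.

```r
df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index[i] <- 2
    # Step 1: evaluate the RHS now, so the condition becomes `item3 == 2`
    cond <- bquote(item3 == .(min.index[i]))
    # Step 2: evaluate the finished condition using only the data columns
    keep <- eval(cond, envir = data)
    x <- data[!is.na(keep) & keep, , drop = FALSE]
  }
  x
}

res2 <- fun(df1, id = "id")
print(res2)
```

After the substitution, the condition no longer refers to `min.index` or `i`, so it no longer matters which environment it is evaluated in.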

@strengejacke what do you think?

@strengejacke
Member

strengejacke commented Jun 16, 2023

I tried to debug this issue. In this line of code:

eval_symbol <- .dynEval(symbol, ifnotfound = NULL)

.dynEval() returns NULL for the expression item3 == min.index[i].

When it comes to subsetting:

datawizard/R/data_match.R

Lines 228 to 233 in 9b2e2b5

# filter data
out <- tryCatch(
  subset(out, subset = eval(symbol, envir = new.env())),
  warning = function(e) e,
  error = function(e) e
)

symbol is item3 == min.index[i], and subset() errors at this point. Simpler variants of the example function do not work either, for instance:

library(datawizard)

df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Attempt 1
fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index <- 2
    x <- data_filter(data, item3 == min.index)
  }
  x
}
fun(df1, id = "id")
#> Error: Variable "min.index" was not found in the dataset.
#>   Possibly misspelled?

Created on 2023-06-16 with reprex v2.0.2

Not sure how/if we can solve this?
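
One possible direction, sketched in base R under the assumption that the bare expression can be captured with `substitute()` and evaluated with the caller's frame as the enclosing environment. `filter_nse()` is a hypothetical stand-in written for this example; it is not datawizard code and it ignores everything else data_filter() has to handle (strings, curly brackets, select helpers).

```r
df1 <- data.frame(
  id = c(1, 2, 3, 1, 3),
  item1 = c(NA, 1, 1, 2, 3),
  item2 = c(NA, 1, 1, 2, 3),
  item3 = c(NA, 1, 1, 2, 3)
)

# Hypothetical stand-in (not datawizard code): capture the unevaluated
# condition, then evaluate it in the data, falling back to the caller's frame.
filter_nse <- function(data, filter, env = parent.frame()) {
  expr <- substitute(filter)
  keep <- eval(expr, envir = data, enclos = env)
  data[!is.na(keep) & keep, , drop = FALSE]
}

fun <- function(data, id) {
  min.index <- NULL
  for (i in unique(data[[id]])) {
    min.index <- 2
    x <- filter_nse(data, item3 == min.index)
  }
  x
}

res3 <- fun(df1, id = "id")
print(res3)
```

A caveat of this lookup order is that column names take precedence: if the data had a column called `min.index`, it would be used instead of the local variable.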
