new feature: data_apply() #351

Open
DominiqueMakowski opened this issue Jan 15, 2023 · 13 comments

@DominiqueMakowski
Member

Would be nice to have a small wrapper for sapply() to run on dataframes to do things like:

mtcars |>
  data_apply(as.factor, exclude = "mpg")
@etiennebacher
Member

Since it has a data_ prefix, I suppose it should only accept functions that return the same number of rows, like as.factor().
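A minimal sketch of how that constraint could be checked, assuming a hypothetical helper named `keeps_rows()` (not part of datawizard) that tests whether a function returns one value per row for every column:

```r
# Hypothetical helper, not part of datawizard: TRUE if `fun` returns
# exactly one value per row for every column of `data`.
keeps_rows <- function(data, fun) {
  all(vapply(
    data,
    function(col) length(fun(col)) == nrow(data),
    logical(1)
  ))
}

keeps_rows(mtcars, as.factor) # as.factor() keeps one value per row
keeps_rows(mtcars, mean)      # mean() collapses each column to one value
```

A `data_apply()` along these lines could run such a check up front and stop with an informative error for summarising functions like `mean()`.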

@strengejacke
Member

What would be the use cases? Don't we have a lot of data functions that cover most situations? like to_factor()?

@strengejacke
Member

Else, we still could do:

data[] <- data |>
  data_select() |>
  lapply(<fun>)

@DominiqueMakowski
Member Author

> Else, we still could do:

yes but that looks quite hacky

> What would be the use cases?

mostly custom functions. basically same as sapply, but with a dataframe output + some selection helpers

@DominiqueMakowski
Member Author

> Else, we still could do:

Unless you mean that this would be the code of data_apply(), in that case yes

@strengejacke
Member

> yes but that looks quite hacky

Normal pipe workflow? 🤷

@etiennebacher
Member

> Else, we still could do:
>
> data[] <- data |>
>   data_select() |>
>   lapply(<fun>)

Doesn't work (mpg shouldn't be factor):

library(datawizard)

foo <- mtcars[, 1:5]

foo[] <- foo |> 
  data_select(exclude = "mpg") |>  
  lapply(as.factor)

lapply(foo, class)
#> $mpg
#> [1] "factor"
#> 
#> $cyl
#> [1] "factor"
#> 
#> $disp
#> [1] "factor"
#> 
#> $hp
#> [1] "factor"
#> 
#> $drat
#> [1] "factor"

Can be done quite easily though:

library(datawizard)

foo <- mtcars[, 1:5]

to_convert <- foo |> 
  data_find(exclude = "mpg")

foo[to_convert] <- lapply(foo[to_convert], as.factor)

lapply(foo, class)
#> $mpg
#> [1] "numeric"
#> 
#> $cyl
#> [1] "factor"
#> 
#> $disp
#> [1] "factor"
#> 
#> $hp
#> [1] "factor"
#> 
#> $drat
#> [1] "factor"

@DominiqueMakowski
Member Author

DominiqueMakowski commented Jan 16, 2023

data_apply <- function(data, fun, ...) {
  to_convert <- data_find(data, ...)
  data[to_convert] <- as.data.frame(lapply(data[to_convert], fun))
  data
}

It's a small addition to the codebase but quite convenient imo
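For anyone who wants to try the idea without datawizard, here is a rough base-R sketch. The name `apply_to_columns()` and its `exclude` argument are hypothetical stand-ins for the proposed function and for datawizard's selection helpers. It also assumes that assigning the plain list returned by `lapply()` is enough, so the `as.data.frame()` step is dropped.

```r
# Hypothetical base-R variant of the proposed helper; `exclude` is a
# simplified stand-in for datawizard's selection helpers.
apply_to_columns <- function(data, fun, exclude = NULL) {
  # Columns to transform: everything not listed in `exclude`
  to_convert <- setdiff(names(data), exclude)
  # Assigning by name leaves the excluded columns untouched
  data[to_convert] <- lapply(data[to_convert], fun)
  data
}

foo <- apply_to_columns(mtcars[, 1:5], as.factor, exclude = "mpg")

# First class of each column: mpg stays numeric, the rest become factors
vapply(foo, function(col) class(col)[1], character(1))
```

Because the assignment targets only the selected columns by name, it avoids the problem with `foo[] <- ...` shown earlier in the thread, where `mpg` ended up as a factor.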

@etiennebacher
Member

Should this be an internal function in the package where you want to use it? Personally I'm a bit reluctant because it's getting quite close to dplyr::mutate() and dplyr::summarise(), and the alternative takes two lines, so I don't really see the point of making another function.

That said, @easystats/core-team what do you think?

@DominiqueMakowski
Member Author

I was thinking of an external one as a combination of base's apply + selection

@rempsyc
Member

rempsyc commented Jan 17, 2023

I still rely pretty heavily on dplyr personally so I don't really mind. Though, if the code is there already, I see little harm in adding a new function (except associated work with maintenance, testing, documentation).

@bwiernik
Contributor

I'm not really understanding the use case here? There's pretty well-known and reasonably convenient base syntax like Daniel showed for developers, and map(), mutate(), and summarize() exist for end users

@bwiernik
Contributor

As far as base goes, I don't see why one would use this (or sapply() in general) over apply(., 2) ?
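One difference that may matter for this comparison: `apply()` converts a data frame to a matrix before calling the function, while `sapply()` and `lapply()` pass each column with its original type. A small sketch on a made-up two-column data frame (the names `df`, `x`, and `g` are illustrative only):

```r
# Illustrative data frame with one numeric and one character column
df <- data.frame(x = c(1.5, 2.5), g = c("a", "b"), stringsAsFactors = FALSE)

# apply() goes through as.matrix(), so a mixed-type data frame becomes a
# character matrix and every column reaches the function as character
classes_apply <- apply(df, 2, class)

# sapply() iterates over the columns as they are stored
classes_sapply <- sapply(df, class)

# apply() also returns a matrix rather than a data frame
via_apply <- apply(mtcars[, 1:3], 2, as.factor)
is.data.frame(via_apply)
```

So for column-wise type conversions such as `as.factor()`, `apply(., 2, ...)` is not a drop-in replacement for the `lapply()` approach shown earlier in the thread.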
