Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow n() in data_modify() #535

Merged
merged 16 commits into from
Nov 21, 2024
Merged

Allow n() in data_modify() #535

merged 16 commits into from
Nov 21, 2024

Conversation

strengejacke
Copy link
Member

@strengejacke strengejacke commented Aug 27, 2024

@etiennebacher Maybe, if checks pass, you can merge this PR for the planned release? Just a minor code change.

Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think there are cases where this implementation would fail, e.g. if someone passes 1:fun(). Can you check for this?

You have more usecases than me with datawizard but it seems that we gradually try to replace all functions from dplyr, which I thought was not the main objective of datawizard. I don't know if this needed in another package internally but if not, do we actually need to implement this instead of using dplyr?


Also, if you want to add more bug fixes/features, I don't plan on releasing before the end of the week.

NEWS.md Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
tests/testthat/test-data_modify.R Outdated Show resolved Hide resolved
strengejacke and others added 3 commits August 27, 2024 20:22
Co-authored-by: Etienne Bacher <[email protected]>
Co-authored-by: Etienne Bacher <[email protected]>
@strengejacke
Copy link
Member Author

but if not, do we actually need to implement this

You're right, but we had this already for data_summary(), because it's an more often used function, and I thought it should then also work for data_modify(). That's the main reason why I added this. And actually, I don't use dplyr and tidyr anymore, since 99.5% of my tasks can be done with datawizard. :-)

Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I have no good idea for a use-case for fun()?

I think you can just check that the error message of data_modify(mtcars, b = fun()) is comprehensible.

But before that, please check my other comment.

R/data_modify.R Show resolved Hide resolved
@etiennebacher
Copy link
Member

You're right, but we had this already for data_summary(), because it's an more often used function, and I thought it should then also work for data_modify().

I'm also not a fan of having this in data_summary(). I think it adds confusion for a tidyverse user that could think "why is n() supported but not first() or last()?" and basically we don't have a good answer to that. To me the tidyverse and datawizard are more complements than substitutes: datawizard provides nice statistical functions that can be included in a pipeline, and it also provides dependency-free functions that we need internally in other packages anyway. Adding support for n() doesn't seem to meet any of those cases.

@strengejacke
Copy link
Member Author

But I have no good idea for a use-case for fun()?

I think you can just check that the error message of data_modify(mtcars, b = fun()) is comprehensible.

But before that, please check my other comment.

But does fun() necessarily result in an error? I'm not sure whether there are use cases of user-defined functions?

@strengejacke
Copy link
Member Author

strengejacke commented Aug 28, 2024

datawizard provides nice statistical functions that can be included in a pipeline, and it also provides dependency-free functions that we need internally in other packages anyway. Adding support for n() doesn't seem to meet any of those cases.

Yes, that was certainly the initial idea of the package, but a) there are probably other people who use datawizard instead of dplyr/tidyr (e.g., @DominiqueMakowski - at least in the past - never really liked the NSE of tidyverse and prefers working with strings to identify variables etc., same like me) and b) maybe there will be other packages that prefer using datawizard for its minimal dependencies in their packages, and then it could be nice to have some features that are also supported by other packages. I personally don't mind implementing "common" features, but what do the other @easystats/maintainers think? Should we have a "stopping rule" of implementing features?

@etiennebacher
Copy link
Member

But does fun() necessarily result in an error? I'm not sure whether there are use cases of user-defined functions?

I just meant that we should have a test for the error below:

library(datawizard)

head(mtcars) |> 
  data_modify(x = n())
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb x
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 6
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 6
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 6
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 6
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 6
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 6

head(mtcars) |> 
  data_modify(x = fun())
#> Error: There was an error in the first expression. Attempt to apply
#>   non-function. Possibly misspelled or not yet defined?

@strengejacke
Copy link
Member Author

@easystats/core-team Any opinions to
#535 (comment)
#535 (comment)

?

@rempsyc
Copy link
Member

rempsyc commented Sep 11, 2024

I personally don't mind implementing "common" features, but what do the other easystats/maintainers think?

Personally, I think the harm in adding extra features is limited (although I can also see Etienne's point about creating confusion). I think the potential benefit of new features ouweight the cons (although it also adds to future maintenance and development), so as long as you're motivated to do the associated work Daniel (and I think you are), I'd say go ahead 🥸

Sometimes in my packages I might want to internally rely on datawizard/easystats and I need dplyr just for one or two functions like n(), then it might be a plus that I have the option to rely on datawizard instead of adding an additional dependency (dplyr) or of rewriting the code myself.

@etiennebacher
Copy link
Member

So do we put this in 0.13.0?

@strengejacke
Copy link
Member Author

Yes, unless you feel the desperate need to merge it now.

@strengejacke
Copy link
Member Author

@easystats/core-team Any opinions to #535 (comment) #535 (comment)

?

@easystats/maintainers any further comments on this? :-)

@rempsyc
Copy link
Member

rempsyc commented Nov 20, 2024

Add it!

Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like I'm the only one blocking this but I'm not a big datawizard user compared to you so feel free to merge this once the bullet point in NEWS has been put in the good section

@strengejacke
Copy link
Member Author

A package maintainer not using the packages (s)he maintains, that's also some kind of punk-rock! 😄
beavis-and-butthead-headbanging

@etiennebacher
Copy link
Member

I had some usecases for it when I started contributing but I'm full in tidyverse code for all my projects now and rarely rely on datawizard (mostly using the summary / codebook functions) 😅

@etiennebacher etiennebacher merged commit 9baa22b into main Nov 21, 2024
22 checks passed
@etiennebacher etiennebacher deleted the n_in_data_modify branch November 21, 2024 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants