Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Review #5

Open
jpiaskowski opened this issue Jan 23, 2022 · 1 comment
Open

Code Review #5

jpiaskowski opened this issue Jan 23, 2022 · 1 comment

Comments

@jpiaskowski
Copy link
Contributor

jpiaskowski commented Jan 23, 2022

Strengths:

  • Great use of roxygen functionality
  • the vignettes are very helpful
  • the functions well organized into individual files
  • pkgdown site very useful
  • all those wrappers for backwards compatibility of the functions will be useful
  • whole package is of great use to the program and worth the time invested to build it
  • code itself is largely clean, simple, easy to read and interpret. No massive cascades of nested functions, phew!
  • great package name - short, descriptive, funny

Things to change

  • For the function get_variety_db() in access_controlled_vocab.R, the default argument select_before = "2021-01-01" should probably changed to the date the user is accessing the file: select_before = Sys.Date() since they should grab all the data.
  • Specify in the appropriate place that %<% pipe should be imported from magrittr.

Areas for improvement:

  • Use fewer dependencies when possible. This reduces overall maintenance required to keep the package working. factcuratoR clearly requires readxl, fuzzyjoin, validate, and dm as part of its core functionality, so no need to consider removing those. However, some packages have slim usages and we may be able to consider refactoring the code so they can be removed: tibble, lubridate, readr, and stringr. We could also consider refactoring the code to remove dplyr, but that package is used extensively and perhaps not worth the time to refactor since this would be a large undertaking. I am unsure about tidyr and purrr. Overall though, this is only my best guess and I welcome input on this topic.

Moving forward, here some common replacements:

tidyverse function base R equivalent
._join() merge()
pivot_.() reshape()
case_when() ifelse()
group_by() aggregate()
map() lapply()
distinct() unique()
count() table()

These functions take more effort to use than tidyverse functions, but they are less prone to break the package since at this point they are rarely if every changed.

*Note: for lubridate::parse_date_time(), you can probably run As.Date() and try several formats you think are likely to work. In some cases (e.g., access_controlled_vocab.R), it might simplify the code.

  • An FYI on the term 'recursive': this is more than a function that loops or repeats itself. It is a function that calls itself -- a crazy concept to wrap your mind around. Recursive functions behave very oddly and can easily get caught in an infinite loop, which is why their usage is strongly discouraged across all programming languages.
  • In the file create_dm.R, there are a few for loops. It is absolutely not worth our time to fix this, but in the future, consider running apply() commands instead of for loops. These are much faster overall (but in our case, the time spent refactoring the code will vastly outstrip time saved in code execution). for loops are valuable when you really need the output from the previous loop to move forward (e.g. in MCMC), but if you don't, apply() will run the command in parallel.
  • Functions can use more error checking (that the user has supplied expected object types).
@justjacqueline
Copy link
Contributor

Thanks for the feedback, @jpiaskowski !

The "Things to change" have been fixed!

Other replies:
for create_dm.R, I am pretty sure I had to use a for loop because I was "modifying in place." I was having trouble with lapply() so I used a for loop based on what was in the old version Advanced R by Wickham: http://adv-r.had.co.nz/Functionals.html#functionals-not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants