-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor into proper package and use furrr package #121
base: develop
Are you sure you want to change the base?
Conversation
…AidSwitzerland/a4d into refactor/#110-make-proper-package
OK, I can do this on Friday, I'm on a short vacation now and without a “work” laptop |
No worries, such requests are never meant to be done right now, take your time ;) |
Run Styler on helper Main and script 3 Export function to convert csv to parquet for static clinic data
I Updated the namespace to make the third script work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me except for the plan
switches, which right now could lead to memory stray processes in case the scripts throw an error for whatever reason, before the plans are switched back to sequential.
devtools::uninstall() | ||
devtools::install() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this relate to pre-commit, in particular if it runs last and cannot affect tests etc. Does this do anything at all wrt. to a git commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I had a very hard time with the install command so maybe I am doing something wrong. I can successfully install the package exactly once at the beginning of a new session, but whenever I change something in one of the function and run it again, even with uninstall before it, I get a strange database error that some state or so is not consistent. So I think I added it here but I don't exactly remember why because yeah it is not needed for styler and also not for the tests or documentation....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I remember I wanted to make sure I have always the latest correct package installed. I first had this in the Rprofile file, but the start of Rstudio just takes too long with this so I switched back to only call devtools::load_all() at the start of a new session
|
||
clearLoggers() | ||
|
||
plan("sequential") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems dangerous in case the script aborts due to an error. Its something that I would expect to be in the finally
part of a tryCatch
statement or equivalently hidden away in a with_*
function.
|
||
clearLoggers() | ||
|
||
plan("sequential") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern as for the other script 1.
scripts/run_script_3_create_tables.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is not really related to the PR so maybe we should address it separately, but generally I would suggest to move functionality into the package itself and keep the script shorter, similar to how script 1 and 2 changed. This would also make statements about test coverage inside the package more meaningful wrt. to the overall project. So the situation that tests pass, coverage is good, and scripts still fail is less likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, I just wanted to leave this for later because there was nothing I could parallelize in script three so I saw it as unrelated to the current PR, as you were just saying. So maybe we can open another issue to streamline the third script and align it closer to the first two
readd sysdata.rda to R afet failing to use lazyload in data.
R/utils-pipe.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we export the magrittr pipe? Can we not use the native R pipe |>
or is it a global/refactoring issue and we try to avoid having to replace all existing magrittr pipes with native pipes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly I support this I read this up and there are only very special cases where there is a difference so we should be fine with a global replace
Closes #110
This PR makes use of
devtools::install()
to makea4d
available as a package. With this, we can finally usefurrr::future_map()
without having to resource all R scripts because the functions are not known in the created sessions.This code was tested under Mac and Windows (thanks @lboel ) and it seems to work fine. By my own measurements, we are at least 2x faster now and definitely using multiple threads.
Further optimizations and performance increases are possible, this is only the first draft that uses
future_map
for the main loops around tracker files