Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor into proper package and use furrr package #121

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from

Conversation

pmayd
Copy link
Collaborator

@pmayd pmayd commented Jan 2, 2024

Closes #110

This PR makes use of devtools::install() to make a4d available as a package. With this, we can finally use furrr::future_map() without having to resource all R scripts because the functions are not known in the created sessions.

This code was tested under Mac and Windows (thanks @lboel ) and it seems to work fine. By my own measurements, we are at least 2x faster now and definitely using multiple threads.

Further optimizations and performance increases are possible, this is only the first draft that uses future_map for the main loops around tracker files

@pmayd pmayd requested review from KonradUdoHannes and lboel January 2, 2024 20:39
@pmayd
Copy link
Collaborator Author

pmayd commented Jan 2, 2024

@lboel feel free to add any comments/issues you see with the code or you noticed during your tests

@tvolegtv can you check out the branch and run it on your side as well just to make sure it works on your system, too?

@tvolegtv
Copy link
Collaborator

tvolegtv commented Jan 2, 2024

OK, I can do this on Friday, I'm on a short vacation now and without a “work” laptop

@pmayd
Copy link
Collaborator Author

pmayd commented Jan 2, 2024

No worries, such requests are never meant to be done right now, take your time ;)

lboel added 3 commits January 3, 2024 10:59
Run Styler on helper Main and script 3

Export function to convert csv to parquet for static clinic data
@lboel
Copy link
Collaborator

lboel commented Jan 3, 2024

I Updated the namespace to make the third script work.
sorry for the wrong styler commit (my default styler script that runs has not 4 indents but 2.

Copy link
Collaborator

@KonradUdoHannes KonradUdoHannes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me except for the plan switches, which right now could lead to memory stray processes in case the scripts throw an error for whatever reason, before the plans are switched back to sequential.

Comment on lines +14 to +15
devtools::uninstall()
devtools::install()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this relate to pre-commit, in particular if it runs last and cannot affect tests etc. Does this do anything at all wrt. to a git commit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I had a very hard time with the install command so maybe I am doing something wrong. I can successfully install the package exactly once at the beginning of a new session, but whenever I change something in one of the function and run it again, even with uninstall before it, I get a strange database error that some state or so is not consistent. So I think I added it here but I don't exactly remember why because yeah it is not needed for styler and also not for the tests or documentation....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I remember I wanted to make sure I have always the latest correct package installed. I first had this in the Rprofile file, but the start of Rstudio just takes too long with this so I switched back to only call devtools::load_all() at the start of a new session


clearLoggers()

plan("sequential")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems dangerous in case the script aborts due to an error. Its something that I would expect to be in the finally part of a tryCatch statement or equivalently hidden away in a with_* function.


clearLoggers()

plan("sequential")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as for the other script 1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is not really related to the PR so maybe we should address it separately, but generally I would suggest to move functionality into the package itself and keep the script shorter, similar to how script 1 and 2 changed. This would also make statements about test coverage inside the package more meaningful wrt. to the overall project. So the situation that tests pass, coverage is good, and scripts still fail is less likely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, I just wanted to leave this for later because there was nothing I could parallelize in script three so I saw it as unrelated to the current PR, as you were just saying. So maybe we can open another issue to streamline the third script and align it closer to the first two

readd sysdata.rda to R afet failing to use lazyload in data.
R/utils-pipe.R Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we export the magrittr pipe? Can we not use the native R pipe |> or is it a global/refactoring issue and we try to avoid having to replace all existing magrittr pipes with native pipes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly I support this I read this up and there are only very special cases where there is a difference so we should be fine with a global replace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restructure everything to a package
4 participants