I have a project with two kinds of R files: "scripts" (which start from the CLI and run the analysis) and "definitions" where I only define functions.
I would like each function to log to an appropriate logger: for example, train_model would log to the model/train logger, forecast_model would log to model/forecast, and so on.
These methods run thousands of times and I'm a bit curious about the performance. Would it be appropriate to call get_logger at the beginning of the method, storing the logger reference to a local variable, and using that throughout the function?
Or should I have a global object in the environment instead? This seems to be a bit inferior in that name clashes etc. would need to be handled globally ("no lg object for entire program"). Since the definition files are loaded in unspecified order, I feel like the global definitions could easily shadow each other and create a mess (as far as I know there is no such thing as "file local" variable scope in R).
Maybe this is a case of premature optimization, but having no experience with the Python logging module that inspired lgr, I'm unsure about the best practices.
It would also be good to have some discussion of this in the README, e.g. "how to use this package efficiently in a mid/large size project".
Thanks!
Python logging (on which lgr is based) recommends having a get_logger() call in each function (logging.getLogger() in Python), even within the same Python module. I personally always define a single logger object for my R packages, though that is more out of convenience. If you are worried about namespace clashes from sourcing several definition files, using lgr::get_logger() in each function seems appropriate.
I just ran a small benchmark of lgr::get_logger(), and the overhead from get_logger() is in the range of nanoseconds, so it can be safely ignored. If you are worried about performance, I would recommend benchmarking it yourself using the bench package. It doesn't take long to set up and gives you a definite answer.
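A minimal sketch of the per-function pattern, using the logger names from the question. The function bodies and their arguments are placeholders made up for illustration, not real modelling code:

```r
library(lgr)

# Hypothetical definitions: each function looks up its own logger by name.
train_model <- function(x) {
  lg <- get_logger("model/train")
  lg$info("training on %s observations", length(x))
  mean(x)  # stand-in for the real training step
}

forecast_model <- function(fit, h = 3) {
  lg <- get_logger("model/forecast")
  lg$info("forecasting %s steps ahead", h)
  rep(fit, h)  # stand-in for the real forecast
}

fit <- train_model(c(1, 2, 3))
fc <- forecast_model(fit, h = 2)
```

Because the logger is a local variable, no definitions file has to reserve a global name such as lg.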
Example benchmark:
library(lgr)
# install.packages("bench")

# Create a logger that logs to memory and does not
# propagate to the root logger
lgr::get_logger("foo/bar")$
  add_appender(AppenderBuffer$new(buffer_size = 1e5))$
  set_propagate(FALSE)

# Logger looked up once and kept as a global object
lg_global <- get_logger("foo/bar")

# Looks the logger up on every call
local_logger <- function(){
  lg <- get_logger("foo/bar")
  lg$info("blah")
}

# Reuses the global logger object
global_logger <- function(){
  lg_global$info("blah")
}

# Benchmark the function calls (not the bare function objects)
r <- bench::mark(
  local_logger(),
  global_logger(),
  iterations = 100000,
  check = FALSE
)
print(r)
I'm leaving this open because I think this is a good question to add to the FAQ section of the documentation.
P.S.: You are right, there is no such thing as file-local variables in base R, but if you want a cleaner separation of namespaces for your definitions.R files without creating a dedicated R package, check out the modules package.
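If you would rather stay in base R for now, one possible workaround (a different technique from the modules package, sketched here under assumptions) is to source each definitions file into its own environment with sys.source(). The temporary file and the names model_env and train_model are made up for illustration:

```r
# Write a stand-in definitions file so the example is self-contained.
def_file <- tempfile(fileext = ".R")
writeLines(c(
  'lg <- lgr::get_logger("model/train")',
  'train_model <- function(x) {',
  '  lg$info("training")',
  '  mean(x)',
  '}'
), def_file)

# Source the file into its own environment instead of the global one,
# so its lg object cannot clash with lg objects from other files.
model_env <- new.env(parent = globalenv())
sys.source(def_file, envir = model_env)

# train_model() finds lg in model_env, the environment it was defined in.
result <- model_env$train_model(c(1, 2, 3))
```

The cost is that functions have to be called through the environment (or copied out of it explicitly), which is roughly the bookkeeping that a package or the modules package would handle for you.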
Great, thanks for the very informative answer. I figured it would be fast, but nanoseconds is definitely good to hear! :D
I think at some point I will have to dive into packages, as our project is growing out of proportion. I'll check out the modules package to see if it can be used in the meantime.