Releases: ropensci/drake
Down with drake_config()!
Version 7.10.0
Unavoidable but minor breaking changes
These changes invalidate some targets in some workflows, but they are necessary bug fixes.
- Remove spurious local variables detected in $<-() and @<-() (#1144).
- Avoid target names with trailing dots (#1147, @plebejer).
Bug fixes
- Handle unequal list columns in bind_plans() (#1136, @jennysjaarda).
- Handle non-vector sub-targets in dynamic branching (#1138).
- Handle calls in analyze_assign() (#1119, @jennysjaarda).
- Restore correct environment locking (#1143, @kuriwaki).
- Log "running" progress of dynamic targets.
- Log dynamic targets as failed if a sub-target fails (#1158).
New features
- Add a new "fst_tbl" format for large tibble targets (#1154, @kendonB).
- Add a new format argument to make(), an optional custom storage format for targets without an explicit target(format = ...) in the plan (#1124).
- Add a new lock_cache argument to make() to optionally suppress cache locking (#1129). (It can be annoying to interrupt make() repeatedly and unlock the cache manually every time.)
- Add new functions cancel() and cancel_if() to cancel targets mid-build (#1131).
- Add a new subtarget_list argument to loadd() and readd() to optionally load a dynamic target as a list of sub-targets (#1139, @MilesMcBain).
- Prohibit dynamic file_out() (#1141).
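A minimal sketch of how a few of these features might fit together. The plan, the target names (x, y, skipped), and the throwaway cache are hypothetical and chosen only for illustration; this is one plausible usage, not an official example.

```r
library(drake)

# Hypothetical plan: target names and commands are for illustration only.
plan <- drake_plan(
  x = seq_len(3),
  y = target(x * 10, dynamic = map(x)),  # one sub-target per element of x
  skipped = {
    cancel_if(TRUE)  # cancel this target mid-build
    "never stored"
  }
)

cache <- new_cache(tempfile())  # throwaway cache in a temporary directory

# lock_cache = FALSE suppresses cache locking for this call.
make(plan, cache = cache, lock_cache = FALSE)

# Load the dynamic target as a list with one element per sub-target.
parts <- readd(y, cache = cache, subtarget_list = TRUE)
length(parts)
```

Without subtarget_list = TRUE, readd(y) would return the sub-targets combined into a single object instead of a list.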
Enhancements
- Check for illegal formats early on at the drake_config() level (#1156, @MilesMcBain).
- Smoothly deprecate the config argument in all user-side functions (#1118, @vkehayas). Users can now supply the plan and other make() arguments directly, without bothering with drake_config(). Now, you only need to call drake_config() in the _drake.R file for r_make() and friends. Old code with config objects should still work. Affected functions:
  - make()
  - outdated()
  - drake_build()
  - drake_debug()
  - recoverable()
  - missed()
  - deps_target()
  - deps_profile()
  - drake_graph_info()
  - vis_drake_graph()
  - sankey_drake_graph()
  - drake_graph()
  - text_drake_graph()
  - predict_runtime(). Needed to rename the targets argument to targets_predict and jobs to jobs_predict.
  - predict_workers(). Same argument name changes as predict_runtime().
- Because of #1118, the only remaining user-side purpose of drake_config() is to serve functions r_make() and friends.
- Document the limitations of grouping variables (#1128).
- Handle the @ operator. For example, in the static code analysis of x@y, do not register y as a dependency (#1130, @famuvie).
- Remove superfluous/incorrect information about imports from the output of deps_profile() (#1134, @kendonB).
- Append hashes to deps_target() output (#1134, @kendonB).
- Add S3 class and pretty print method for drake_meta_() objects.
- Use call stacks instead of environment inheritance to power drake_envir() and id_chr() (#1132).
- Allow drake_envir() to select the environment with imports (#882).
- Improve visualization labels for dynamic targets: clarify that the listed runtime is a total runtime over all sub-targets and list the number of sub-targets.
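As a rough illustration of the config deprecation (#1118), the sketch below passes the plan and other make() arguments straight to the user-side functions. The plan, the target names (raw, avg), and the temporary cache are hypothetical.

```r
library(drake)

# Hypothetical two-target plan, for illustration only.
plan <- drake_plan(
  raw = head(mtcars),
  avg = mean(raw$mpg)
)

cache <- new_cache(tempfile())  # throwaway cache

# No drake_config() call: the plan and other make() arguments go in directly.
make(plan, cache = cache)

# The same arguments work for the other affected functions, e.g. outdated().
stale <- outdated(plan, cache = cache)
stale
```

For r_make() and friends, the _drake.R file would still need to end with a call to drake_config().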
Speedups and better dynamic branching
Version 7.9.0
Breaking changes in dynamic branching
- Embrace the vctrs paradigm and its type stability for dynamic branching (#1105, #1106).
- Accept target as a symbol by default in read_trace(). Required for the trace to make sense in #1107.
Bug fixes
- Repair reference to custom HPC resources in the "future" backend (#1083, @jennysjaarda).
- Properly copy data when importing targets from one cache into another (#1120, @brendanf).
- Prevent dynamic vector sizes from conflicting with file sizes in metadata.
New features
- Add a new log_build_times argument to make() and drake_config(). Allows users to disable the recording of build times. Produces a speedup of up to 20% on Macs (#1078).
- Implement cache locking to prohibit concurrent calls to make(), outdated(make_imports = TRUE), recoverable(make_imports = TRUE), vis_drake_graph(make_imports = TRUE), clean(), etc. on the same cache.
- Add a new format trigger to invalidate targets when the specialized data format changes (#1104, @kendonB).
- Add new functions cache_planned() and cache_unplanned() to help selectively clean workflows with dynamic targets (#1110, @kendonB).
- Add S3 classes and pretty print methods for drake_config() objects and analyze_code() objects.
- Add a new "qs" format (#1121, @kendonB).
Speedups
- Avoid setting seeds for imports (#1086, @adamkski).
- Avoid working directly with POSIXct times (#1086, @adamkski).
- Avoid excessive calls to %||% (%|||% is faster). (#1089, @billdenney)
- Remove %||NA due to slowness (#1089, @billdenney).
- Use hash tables to speed up is_dynamic() and is_subtarget() (#1089, @billdenney).
- Use getVDigest() instead of digest() (#1089, #1092, eddelbuettel/digest#139 (comment), @eddelbuettel, @billdenney).
- Pre-compute backtick and .deparseOpts() to speed up deparse() (#1086, https://stackoverflow.com/users/516548/g-grothendieck, @adamkski).
- Pre-compute which targets exist in advance (#1095).
- Avoid gratuitous cache interactions and data frame operations in build_times() (#1098).
- Use mget_hash() in progress() (#1098).
- Get target progress info only once in drake_graph_info() (#1098).
- Speed up the retrieval of old metadata in outdated() (#1098).
- In make(), avoid checking for nonexistent metadata for missing targets.
- Reduce logging in drake_config().
Enhancements
- Write a complete project structure in use_drake() (#1097, @lorenzwalthert, @tjmahr).
- Add a minor logger note to say how many dynamic sub-targets are registered at a time (#1102, @kendonB).
- Handle dependencies that are dynamic targets but not declared as such for the current target (#1107).
- Internally, the "layout" data structure is now called the "workflow specification", or "spec" for short. The spec is drake's interpretation of the plan. In the plan, all the dependency relationships among targets and files are implicit. In the spec, they are all explicit. We get from the plan to the spec using static code analysis, e.g. analyze_code().
Dynamic branching
Version 7.8.0
Bug fixes
- Prevent drake::drake_plan(x = target(...)) from throwing an error if drake is not loaded (#1039, @mstr3336).
- Move the transformations lifecycle badge to the proper location in the docstring (#1040, @jeroen).
- Prevent readd()/loadd() from turning an imported function into a target (#1067).
- Align in-memory disk.frame targets with their stored values (#1077, @brendanf).
New features
- Implement dynamic branching (#685).
- Add a new subtargets() function to get the cached names of the sub-targets of a dynamic target.
- Add new subtargets arguments to loadd() and readd() to retrieve specific sub-targets from a parent dynamic target.
- Add new get_trace() and read_trace() functions to help track which values of grouping variables go into the making of dynamic sub-targets.
- Add a new id_chr() function to get the name of the target while make() is running.
- Implement plot(plan) (#1036).
- vis_drake_graph(), drake_graph_info(), and render_drake_graph() now take arguments that allow behavior to be defined upon selection of nodes (#1031, @mstr3336).
- Add a new max_expand argument to make() and drake_config() to scale down dynamic branching (#1050, @hansvancalster).
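A small sketch of dynamic branching with the new helpers, assuming a hypothetical plan with targets w and z and a temporary cache. It passes the plan directly to make(), which relies on the later 7.10.0 interface.

```r
library(drake)

# Hypothetical plan: z gets one dynamic sub-target per element of w.
plan <- drake_plan(
  w = c("a", "b", "c"),
  z = target(toupper(w), dynamic = map(w))
)

cache <- new_cache(tempfile())  # throwaway cache
make(plan, cache = cache)

# Cached names of the sub-targets of z.
sub_names <- subtargets(z, cache = cache)

# Retrieve only the first and third sub-targets of z.
picked <- readd(z, subtargets = c(1, 3), cache = cache)
```

For large workflows, the max_expand argument of make() can cap the number of sub-targets while prototyping.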
Enhancements
- Document transformation functions in a way that avoids having to create true functions (#979).
- Avoid always invalidating the memoized layout when we set the knitr hash.
- Change the names of environments in drake_config() objects.
- Assert that prework is a language object, list of language objects, or character vector (#1 at pat-s/multicore-debugging on GitHub, @pat-s).
- Use an environment instead of a list for config$layout. Supports internal modifications by reference. Required for #685.
- Clean up the code of the parallel backends.
- Make dynamic a formal argument of target().
- Always lock/unlock the environment target by target, allowing informative error messages to appear more readily (#1062, @PedramNavid).
- Automatically ignore storrs and decorated storrs (#1071).
- Speed up memory management by avoiding a call to setdiff() and avoiding names(config$envir_targets).
disk.frame and code_to_function()
Version 7.7.0
Bug fixes
- Take the sum instead of the max in dir_size(). Incurs rehashing for some workflows, but should not invalidate any targets.
New features
- Add a new which_clean() function to preview which targets will be invalidated by clean() (#1014, @pat-s).
- Add serious import and export methods for the decorated storr (#1015, @billdenney, @noamross).
- Add a new "diskframe" format for larger-than-memory data (#1004, @xiaodaigh).
- Add a new drake_tempfile() function to help with the "diskframe" format. It makes sure we are not copying large datasets across different physical storage media (#1004, @xiaodaigh).
- Add a new function code_to_function() to allow for parsing script-based workflows into functions so drake_plan() can begin to manage the workflow and track dependencies (#994, @thebioengineer).
Continuing with efficient data formats
Version 7.6.2
Bug fixes
- Remove README.md from CRAN altogether. Also remove all links from the news and vignette. The links trigger too many CRAN notes, which made the automated checks too brittle.
- Serialize formats that need serialization (like "keras") before sending the data from HPC workers to the master process (#989).
- Check for custom-formatted files when checking checksums.
- Force fst-formatted targets to plain data frames. Same goes for the new "fst_dt" format.
- Change the meaning and behavior of max_expand in drake_plan(). max_expand is now the maximum number of targets produced by map(), split(), and cross(). For cross(), this reduces the number of targets (less cumbersome) and makes the subsample of targets more representative of the complete grid. It also ensures consistent target naming when .id is FALSE (#1002). Note: max_expand is not for production workflows anyway, so this change does not break anything important. Unfortunately, we do lose the speed boost in drake_plan() originally due to max_expand, but drake_plan() is still fast, so that is not so bad.
- Drop specialized formats of NULL targets (#998).
- Prevent false grouping variables from partially tagging along in cross() (#1009). The same fix should apply to map() and split() too.
- Respect graph topology when recovering old grouping variables for map() (#1010).
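To make the new meaning of max_expand concrete, here is a minimal sketch. The function analyze() and the target name fit are hypothetical; the plans are only constructed, never run, so analyze() does not need to exist.

```r
library(drake)

# Full plan: one target per value of the grouping variable n.
full <- drake_plan(
  fit = target(analyze(n), transform = map(n = c(10, 20, 30, 40, 50)))
)

# Scaled-down plan: max_expand caps the number of targets map() produces.
small <- drake_plan(
  fit = target(analyze(n), transform = map(n = c(10, 20, 30, 40, 50))),
  max_expand = 2
)

nrow(full)
nrow(small)
```

Because max_expand only keeps a subsample of the targets, it is meant for prototyping rather than production workflows.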
New features
- Add a new "fst_dt" format for fst-powered saving of data.table objects.
- Support a custom "caching" column of the plan to select master vs worker caching for each target individually (#988).
- Make transform a formal argument of target() so that users do not have to type "transform =" all the time in drake_plan() (#993).
- Migrate the documentation website from ropensci.github.io/drake to docs.ropensci.org/drake.
Enhancements
- Document the HPC limitations of target(format = "keras") (#989).
- Remove the now-superfluous vignette.
- Wrap up console and text file logging functionality into a reference class (#964).
- Deprecate the verbose argument in various caching functions. The location of the cache is now only printed in make(). This made the previous feature easier to implement.
- Carry forward nested grouping variables in combine() (#1008).
- Improve the encapsulation of hash tables in the decorated storr (#968).
CRAN hotfix
Fix broken README links.
Big data formats
Version 7.6.0
New features
- Support specialized data storage via a decorated cache and format argument of target() (#971). This allows users to leverage faster ways to save and load targets, such as write_fst() for data frames and save_model_hdf5() for Keras models. It also improves memory because it prevents storr from making a serialized in-memory copy of large data objects.
- Add tidyselect functionality for ... in progress(), analogous to loadd(), build_times(), and clean().
- Support S3 for user-defined generics (#959). If the generic do_stuff() and the method stuff.your_class() are defined in envir, and if do_stuff() has a call to UseMethod("stuff"), then drake's code analysis will detect stuff.your_class() as a dependency of do_stuff().
- Add authentication support for file_in() URLs. Requires the new curl_handles argument of make() and drake_config() (#981).
Bug fixes
- Make drake_plan(transform = slice()) understand .id and grouping variables (#963).
- Repair clean(garbage_collection = TRUE, destroy = TRUE). Previously it destroyed the cache before trying to collect garbage.
- Ensure that r_make() passes informative error messages back to the calling process (#969).
- Avoid downloading full contents of URLs when rehashing (#982).
- Retain upstream grouping variables of map() and cross() on topologically side-by-side targets (#983).
- Manually enforce the correct ordering in dsl_left_outer_join() so cross() selects the right combinations of existing targets (#986). This bug was probably introduced in the solution to #983.
- Make the output of progress() more consistent, less dependent on whether tidyselect is installed.
Enhancements
- Document DSL keywords as if they were true functions: target(), map(), split(), cross(), and combine() (#979).
- Do garbage collection between the unloading and loading phases of memory management.
- Keep file_out() files in clean() unless garbage_collection is TRUE. That way, make(recover = TRUE) is a true "undo button" for clean(). clean(garbage_collection = TRUE) still removes data in the cache, as well as any file_out() files from targets currently being cleaned.
- The menu in clean() only appears if garbage_collection is TRUE. Also, this menu is added to rescue_cache(garbage_collection = TRUE).
- Reorganize the internal code files and functions to make development easier.
- Move the history inside the cache folder .drake/. The old .drake_history/ folder was awkward. Old histories are migrated during drake_config() and drake_history().
- Add lifecycle badges to exported functions.
CRAN hotfix
- Eliminate accidental creations of .drake_history/ in plan_to_code(), plan_to_notebook(), and the help file examples. Should fix the note at https://win-builder.r-project.org/incoming_pretest/drake_7.5.1_20190721_153755/Debian/00check.log.
- Repair long examples.
History, provenance, and recovery
Version 7.5.0
New features
- Add automated data recovery (#945). This is still experimental and disabled by default. Requires make(recover = TRUE).
- Add new functions recoverable() and r_recoverable() to show targets that are outdated but recoverable via make(recover = TRUE).
- Track the history and provenance of targets, viewable with drake_history(). Powered by txtq (#918, #920).
- Add a new no_deps() function, similar to ignore(). no_deps() suppresses dependency detection but still tracks changes to the literal code (#910).
- Add a new "autoclean" memory strategy (#917).
- Export transform_plan().
- Allow a custom seed column of drake plans to set custom seeds (#947).
- Add a new seed trigger to optionally ignore changes to the target seed (#947).
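The recovery workflow could look roughly like the sketch below. The plan, the target names (raw, avg), and the temporary cache are hypothetical, and the plan is passed directly to make(), outdated(), and recoverable(), which assumes the later 7.10.0 interface.

```r
library(drake)

# Hypothetical two-target plan, for illustration only.
plan <- drake_plan(
  raw = head(mtcars),
  avg = mean(raw$mpg)
)

cache <- new_cache(tempfile())  # throwaway cache
make(plan, cache = cache)

# Invalidate the targets. Without garbage collection, the old data
# should remain in the cache.
clean(cache = cache)

# Targets that are outdated but could be restored by make(recover = TRUE).
recoverable(plan, cache = cache)

# Try to recover old values instead of rebuilding from scratch.
make(plan, cache = cache, recover = TRUE)
stale <- outdated(plan, cache = cache)
```

Targets that cannot be recovered are simply rebuilt, so every target ends up up to date after the final make() either way.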
Enhancements
- In drake_plan(), interpret custom columns as non-language objects (#942).
- Suggest and assert clustermq >= 0.8.8.
- Log the target name in a special column in the console log file (#909).
- Rename the "memory" memory strategy to "preclean" (with deprecation; #917).
- Deprecate ensure_workers in drake_config() and make().
- Warn when the user supplies additional arguments to make() after config is already supplied.
- Prevent users from running make() from inside the cache (#927).
- Add CITATION file with JOSS paper.
- In deps_profile(), include the seed and change the names.
- Allow the user to set a different seed in make(). All this does is invalidate old targets.
- Use set_hash() and get_hash() in storr to double the speed of progress tracking.
Bug fixes
Data splitting, and URL tracking, and advanced memory management
Version 7.4.0
Mildly breaking changes
These changes are technically breaking changes, but they should only affect advanced users.
- rescue_cache() no longer returns a value.
Bug fixes
- Restore compatibility with clustermq (#898). Suggest version >= 0.8.8 but allow 0.8.7 as well.
- Ensure drake recomputes config$layout when knitr reports change (#887).
- Do not rehash large imported files every make() (#878).
- Repair parsing of long tidy eval inputs in the DSL (#878).
- Clear up cache confusion when a custom cache exists adjacent to the default cache (#883).
- Accept targets as symbols in r_drake_build().
- Log progress during r_make() (#889).
- Repair expose_imports(): do not do the environment<- trick unless the object is a non-primitive function.
- Use different static analyses of assign() vs delayedAssign().
- Fix a superfluous code analysis warning incurred by multiple file_in() files and other strings (#896).
- Make ignore() work inside loadd(), readd(), file_in(), file_out(), and knitr_in().
New features
- Add experimental support for URLs in file_in() and file_out(). drake now treats file_in()/file_out() files as URLs if they begin with "http://", "https://", or "ftp://". The fingerprint is a concatenation of the ETag and last-modified timestamp. If neither can be found or if there is no internet connection, drake throws an error.
- Implement new memory management strategies "unload" and "none", which do not attempt to load a target's dependencies from memory (#897).
- Allow users to give each target its own memory strategy (#897).
- Add drake_slice() to help split data across multiple targets. Related: #77, #685, #833.
- Introduce a new drake_cache() function, which is now recommended instead of get_cache() (#883).
- Introduce a new r_deps_target() function.
- Add RStudio addins for r_make(), r_vis_drake_graph(), and r_outdated() (#892).