-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
121 lines (92 loc) · 3.42 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# filecacher
<!-- badges: start -->
[![](https://cranlogs.r-pkg.org/badges/filecacher)](https://cran.r-project.org/package=filecacher)
<!-- badges: end -->
The main functions in this package are:
1. `with_cache()`: Caches the expression in a local file on disk, using `cachem::cache_disk()` as its backend. This can be comfortably added to a piped sequence and it handles evaluating if the element doesn't already exist, or pulling from the cache if it does.
1. `cached_read()`: A wrapper around a typical read function that caches the result and the file list info using `cachem::cache_disk()`. If the input file list info hasn't changed (including date modified), the cache file will be read. This can save time if the original operation requires reading from many files, or involves lots of processing.
See examples below.
## Installation
You can install the released version of `filecacher` from
[CRAN](https://cran.r-project.org/package=filecacher) with:
``` r
install.packages("filecacher")
```
And the development version from [GitHub](https://github.com/orgadish/filecacher):
``` r
if(!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("orgadish/filecacher")
```
## Example
```{r example}
# Example files: iris table split by species into three files.
iris_files_by_species <- list.files(
system.file("extdata", package = "filecacher"),
pattern = "_only[.]csv$",
full.names = TRUE
)
basename(iris_files_by_species)
# Create a temporary directory to run these examples.
tf <- withr::local_tempfile()
dir.create(tf)
something_that_takes_a_while <- function(x) {
Sys.sleep(0.5)
return(x)
}
# Example standard pipeline without caching:
# 1. Read using a vectorized `read.csv`.
# 2. Perform some custom processing that takes a while (currently using sleep as an example).
normal_pipeline <- function(files, cache_dir = NULL) {
files |>
filecacher::vectorize_reader(read.csv)() |>
suppressMessages() |>
something_that_takes_a_while()
}
# Same pipeline, using `cached_read` which caches the contents and the file info for checking later:
pipeline_using_cached_read <- function(files, cache_dir) {
files |>
filecacher::cached_read(
label = "processed_data_using_cached_read",
read_fn = normal_pipeline,
cache = cache_dir,
type = "parquet"
)
}
# Alternate syntax, with `with_cache`. Using `with_cache` only checks that the cache file
# exists, without any information about the file list.
pipeline_using_with_cache <- function(files, cache_dir) {
normal_pipeline(files) |>
filecacher::with_cache(
label = "processed_data_using_with_cache",
cache = cache_dir,
type = "parquet"
)
}
# Time each pipeline when repeated 3 times:
time_pipeline <- function(pipeline_fn) {
function_name <- as.character(match.call()[2])
print(function_name)
# Create a separate directory for the cache for this function.
cache_dir <- tempfile(function_name, tmpdir = tf)
dir.create(cache_dir)
gc()
for (i in 1:3) {
print(system.time(pipeline_fn(iris_files_by_species, cache_dir)))
}
}
time_pipeline(normal_pipeline)
time_pipeline(pipeline_using_cached_read)
time_pipeline(pipeline_using_with_cache)
```