forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtesting-basics.Rmd
688 lines (501 loc) · 34.1 KB
/
testing-basics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
# Testing basics {#sec-testing-basics}
```{r, echo = FALSE}
source("common.R")
```
## Introduction
Testing is a vital part of package development.
It ensures that your code does what you want it to do.
Testing, however, adds an additional step to your development workflow.
The goal of this chapter is to show you how to make this task easier and more effective by doing formal automated testing using the testthat package.
The first stage of your testing journey is to become convinced that testing has enough benefits to justify the work.
For some of us, this is easy to accept.
Others must learn the hard way.
Once you've decided to embrace automated testing, it's time to learn some mechanics and figure out where testing fits into your development workflow.
As you and your R packages evolve, you'll start to encounter testing situations where it's fruitful to use techniques that are somewhat specific to testing and differ from what we do below `R/`.
## Why is formal testing worth the trouble?
Up until now, your workflow probably looks like this:
1. Write a function.
2. Load it with `devtools::load_all()`, maybe via Ctrl/Cmd + Shift + L.
3. Experiment with it in the console to see if it works.
4. Rinse and repeat.
While you *are* testing your code in this workflow, you're only doing it informally.
The problem with this approach is that when you come back to this code in 3 months time to add a new feature, you've probably forgotten some of the informal tests you ran the first time around.
This makes it very easy to break code that used to work.
Many of us embrace automated testing when we realize we're re-fixing a bug for the second or fifth time.
While writing code or fixing bugs, we might perform some interactive tests to make sure the code we're working on does what we want.
But it's easy to forget all the different use cases you need to check, if you don't have a system for storing and re-running the tests.
This is a common practice among R programmers.
The problem is not that you don't test your code, it's that you don't automate your tests.
In this chapter you'll learn how to transition from informal *ad hoc* testing, done interactively in the console, to automated testing (also known as unit testing).
While turning casual interactive tests into formal tests requires a little more work up front, it pays off in four ways:
- Fewer bugs.
Because you're explicit about how your code should behave, you will have fewer bugs.
The reason why is a bit like the reason double entry book-keeping works: because you describe the behaviour of your code in two places, both in your code and in your tests, you are able to check one against the other.
With informal testing, it's tempting to just explore typical and authentic usage, similar to writing examples.
However, when writing formal tests, it's natural to adopt a more adversarial mindset and to anticipate how unexpected inputs could break your code.
If you always introduce new tests when you add a new feature or function, you'll prevent many bugs from being created in the first place, because you will proactively address pesky edge cases.
Tests also keep you from (re-)breaking one feature, when you're tinkering with another.
- Better code structure.
Code that is well designed tends to be easy to test and you can turn this to your advantage.
If you are struggling to write tests, consider if the problem is actually the design of your function(s).
The process of writing tests is a great way to get free, private, and personalized feedback on how well-factored your code is.
If you integrate testing into your development workflow (versus planning to slap tests on "later"), you'll subject yourself to constant pressure to break complicated operations into separate functions that work in isolation.
Functions that are easier to test are usually easier to understand and re-combine in new ways.
- Call to action.
When we start to fix a bug, we first like to convert it into a (failing) test.
This is wonderfully effective at making your goal very concrete: make this test pass.
This is basically a special case of a general methodology known as test driven development.
- Robust code.
If you know that all the major functionality of your package is well covered by the tests, you can confidently make big changes without worrying about accidentally breaking something.
This provides a great reality check when you think you've discovered some brilliant new way to simplify your package.
Sometimes such "simplifications" fail to account for some important use case and your tests will save you from yourself.
## Introducing testthat
This chapter describes how to test your R package using the testthat package: <https://testthat.r-lib.org>
If you're familiar with frameworks for unit testing in other languages, you should note that there are some fundamental differences with testthat.
This is because R is, at heart, more a functional programming language than an object-oriented programming language.
For instance, because R's main object-oriented systems (S3 and S4) are based on generic functions (i.e., methods belong to functions not classes), testing approaches built around objects and methods don't make much sense.
testthat 3.0.0 (released 2020-10-31) introduced the idea of an **edition** of testthat, specifically the third edition of testthat, which we refer to as testthat 3e.
An edition is a bundle of behaviors that you have to explicitly choose to use, allowing us to make otherwise backward incompatible changes.
This is particularly important for testthat since it has a very large number of packages that use it (almost 5,000 at last count).
To use testthat 3e, you must have a version of testthat \>= 3.0.0 **and** explicitly opt-in to the third edition behaviors.
This allows testthat to continue to evolve and improve without breaking historical packages that are in a rather passive maintenance phase.
You can learn more in the [testthat 3e article](https://testthat.r-lib.org/articles/third-edition.html) and the blog post [Upgrading to testthat edition 3](https://www.tidyverse.org/blog/2022/02/upkeep-testthat-3/).
We recommend testthat 3e for all new packages and we recommend updating existing, actively maintained packages to use testthat 3e.
Unless we say otherwise, this chapter describes testthat 3e.
## Test mechanics and workflow {#sec-tests-mechanics-workflow}
### Initial setup
To setup your package to use testthat, run:
```{r, eval = FALSE}
usethis::use_testthat(3)
```
This will:
1. Create a `tests/testthat/` directory.
2. Add testthat to the `Suggests` field in the `DESCRIPTION` and specify testthat 3e in the `Config/testthat/edition` field.
The affected `DESCRIPTION` fields might look like:
Suggests: testthat (>= 3.0.0)
Config/testthat/edition: 3
3. Create a file `tests/testthat.R` that runs all your tests when `R CMD check` runs.
(You'll learn more about that in @sec-r-cmd-check.) For a package named "pkg", the contents of this file will be something like:
```{r eval = FALSE}
library(testthat)
library(pkg)
test_check("pkg")
```
This initial setup is usually something you do once per package.
However, even in a package that already uses testthat, it is safe to run `use_testthat(3)`, when you're ready to opt-in to testthat 3e.
Do not edit `tests/testthat.R`!
It is run during `R CMD check` (and, therefore, `devtools::check()`), but is not used in most other test-running scenarios (such as `devtools::test()` or `devtools::test_active_file()`).
If you want to do something that affects all of your tests, there is almost always a better way than modifying the boilerplate `tests/testthat.R` script.
This chapter details many different ways to make objects and logic available during testing.
### Create a test
As you define functions in your package, in the files below `R/`, you add the corresponding tests to `.R` files in `tests/testthat/`.
We strongly recommend that the organisation of test files match the organisation of `R/` files, discussed in @sec-code-organising: The `foofy()` function (and its friends and helpers) should be defined in `R/foofy.R` and their tests should live in `tests/testthat/test-foofy.R`.
R tests/testthat
└── foofy.R └── test-foofy.R
foofy <- function(...) {...} test_that("foofy does this", {...})
test_that("foofy does that", {...})
Even if you have different conventions for file organisation and naming, note that testthat tests **must** live in files below `tests/testthat/` and these file names **must** begin with `test`.
The test file name is displayed in testthat output, which provides helpful context[^testing-basics-1].
[^testing-basics-1]: The legacy function `testthat::context()` is superseded now and its use in new or actively maintained code is discouraged.
In testthat 3e, `context()` is formally deprecated; you should just remove it.
Once you adopt an intentional, synchronized approach to the organisation of files below `R/` and `tests/testthat/`, the necessary contextual information is right there in the file name, rendering the legacy `context()` superfluous.
<!-- Hadley thinks this is too much detail about use_r()/use_test(). I will likely agree when I revisit this later. Leaving it for now. -->
usethis offers a helpful pair of functions for creating or toggling between files:
- `usethis::use_r()`
- `usethis::use_test()`
Either one can be called with a file (base) name, in order to create a file *de novo* and open it for editing:
```{r, eval = FALSE}
use_r("foofy") # creates and opens R/foofy.R
use_test("blarg") # creates and opens tests/testthat/test-blarg.R
```
The `use_r()` / `use_test()` duo has some convenience features that make them "just work" in many common situations:
- When determining the target file, they can deal with the presence or absence of the `.R` extension and the `test-` prefix.
- Equivalent: `use_r("foofy.R")`, `use_r("foofy")`
- Equivalent: `use_test("test-blarg.R")`, `use_test("blarg.R")`, `use_test("blarg")`
- If the target file already exists, it is opened for editing. Otherwise, the target is created and then opened for editing.
::: callout-tip
## RStudio
If `R/foofy.R` is the active file in your source editor, you can even call `use_test()` with no arguments!
The target test file can be inferred: if you're editing `R/foofy.R`, you probably want to work on the companion test file, `tests/testthat/test-foofy.R`.
If it doesn't exist yet, it is created and, either way, the test file is opened for editing.
This all works the other way around also.
If you're editing `tests/testthat/test-foofy.R`, a call to `use_r()` (optionally, creates and) opens `R/foofy.R`.
:::
Bottom line: `use_r()` / `use_test()` are handy for initially creating these file pairs and, later, for shifting your attention from one to the other.
When `use_test()` creates a new test file, it inserts an example test:
```{r, eval = FALSE}
test_that("multiplication works", {
expect_equal(2 * 2, 4)
})
```
You will replace this with your own description and logic, but it's a nice reminder of the basic form:
- A test file holds one or more `test_that()` tests.
- Each test describes what it's testing: e.g. "multiplication works".
- Each test has one or more expectations: e.g. `expect_equal(2 * 2, 4)`.
Below we go into much more detail about how to test your own functions.
### Run tests
Depending on where you are in the development cycle, you'll run your tests at various scales.
When you are rapidly iterating on a function, you might work at the level of individual tests.
As the code settles down, you'll run entire test files and eventually the entire test suite.
**Micro-iteration**: This is the interactive phase where you initiate and refine a function and its tests in tandem.
Here you will run `devtools::load_all()` often, and then execute individual expectations or whole tests interactively in the console.
Note that `load_all()` attaches testthat, so it puts you in the perfect position to test drive your functions and to execute individual tests and expectations.
```{r, eval = FALSE}
# tweak the foofy() function and re-load it
devtools::load_all()
# interactively explore and refine expectations and tests
expect_equal(foofy(...), EXPECTED_FOOFY_OUTPUT)
testthat("foofy does good things", {...})
```
**Mezzo-iteration**: As one file's-worth of functions and their associated tests start to shape up, you will want to execute the entire file of associated tests, perhaps with `testthat::test_file()`:
```{=html}
<!-- `devtools::test_file()` exists, but is deprecated, because of the collision.
Consider marking as defunct / removing before the book is published. -->
```
```{r, eval = FALSE}
testthat::test_file("tests/testthat/test-foofy.R")
```
::: callout-tip
## RStudio
In RStudio, you have a couple shortcuts for running a single test file.
If the target test file is the active file, you can use the "Run Tests" button in the upper right corner of the source editor.
There is also a useful function, `devtools::test_active_file()`.
It infers the target test file from the active file and, similar to how `use_r()` and `use_test()` work, it works regardless of whether the active file is a test file or a companion `R/*.R` file.
You can invoke this via "Run a test file" in the Addins menu.
However, for heavy users (like us!), we recommend [binding this to a keyboard shortcut](https://support.rstudio.com/hc/en-us/articles/206382178-Customizing-Keyboard-Shortcuts-in-the-RStudio-IDE); we use Ctrl/Cmd + T.
:::
**Macro-iteration**: As you near the completion of a new feature or bug fix, you will want to run the entire test suite.
Most frequently, you'll do this with `devtools::test()`:
```{r, eval = FALSE}
devtools::test()
```
Then eventually, as part of `R CMD check` with `devtools::check()`:
```{r, eval = FALSE}
devtools::check()
```
::: callout-tip
## RStudio
`devtools::test()` is mapped to Ctrl/Cmd + Shift + T.
`devtools::check()` is mapped to Ctrl/Cmd + Shift + E.
:::
```{=html}
<!-- We'll probably want to replace this example eventually, but it's a decent placeholder.
The test failure is something highly artificial I created very quickly.
It would be better to use an example that actually makes sense, if someone elects to really read and think about it.-->
```
The output of `devtools::test()` looks like this:
devtools::test()
ℹ Loading usethis
ℹ Testing usethis
✓ | F W S OK | Context
✓ | 1 | addin [0.1s]
✓ | 6 | badge [0.5s]
...
✓ | 27 | github-actions [4.9s]
...
✓ | 44 | write [0.6s]
══ Results ═════════════════════════════════════════════════════════════════
Duration: 31.3 s
── Skipped tests ──────────────────────────────────────────────────────────
• Not on GitHub Actions, Travis, or Appveyor (3)
[ FAIL 1 | WARN 0 | SKIP 3 | PASS 728 ]
Test failure is reported like this:
Failure (test-release.R:108:3): get_release_data() works if no file found
res$Version (`actual`) not equal to "0.0.0.9000" (`expected`).
`actual`: "0.0.0.1234"
`expected`: "0.0.0.9000"
Each failure gives a description of the test (e.g., "get_release_data() works if no file found"), its location (e.g., "test-release.R:108:3"), and the reason for the failure (e.g., "res\$Version (`actual`) not equal to"0.0.0.9000" (`expected`)").
The idea is that you'll modify your code (either the functions defined below `R/` or the tests in `tests/testthat/`) until all tests are passing.
## Test organisation
A test file lives in `tests/testthat/`.
Its name must start with `test`.
We will inspect and execute a test file from the stringr package.
<!-- https://github.com/hadley/r-pkgs/issues/778 -->
But first, for the purposes of rendering this book, we must attach stringr and testthat.
Note that in real-life test-running situations, this is taken care of by your package development tooling:
- During interactive development, `devtools::load_all()` makes testthat and the package-under-development available (both its exported and unexported functions).
- During arms-length test execution, this is taken care of by `devtools::test_active_file()`, `devtools::test()`, and `tests/testthat.R`.
::: callout-important
Your test files should not include these `library()` calls.
We also explicitly request testthat edition 3, but in a real package this will be declared in DESCRIPTION.
```{r}
library(testthat)
library(stringr)
local_edition(3)
```
:::
<!-- TODO: check if stringr has released and, if so, remove this footnote and edit DESCRIPTION. -->
Here are the contents of `tests/testthat/test-dup.r` from stringr[^testing-basics-2]:
[^testing-basics-2]: Note that we are building the book against a dev version of stringr.
```{r}
test_that("basic duplication works", {
expect_equal(str_dup("a", 3), "aaa")
expect_equal(str_dup("abc", 2), "abcabc")
expect_equal(str_dup(c("a", "b"), 2), c("aa", "bb"))
expect_equal(str_dup(c("a", "b"), c(2, 3)), c("aa", "bbb"))
})
test_that("0 duplicates equals empty string", {
expect_equal(str_dup("a", 0), "")
expect_equal(str_dup(c("a", "b"), 0), rep("", 2))
})
test_that("uses tidyverse recycling rules", {
expect_error(str_dup(1:2, 1:3), class = "vctrs_error_incompatible_size")
})
```
This file shows a typical mix of tests:
- "basic duplication works" tests typical usage of `str_dup()`.
- "0 duplicates equals empty string" probes a specific edge case.
- "uses tidyverse recycling rules" checks that malformed input results in a specific kind of error.
Tests are organised hierarchically: **expectations** are grouped into **tests** which are organised in **files**:
- A **file** holds multiple related tests.
In this example, the file `tests/testthat/test-dup.r` has all of the tests for the code in `R/dup.r`.
- A **test** groups together multiple expectations to test the output from a simple function, a range of possibilities for a single parameter from a more complicated function, or tightly related functionality from across multiple functions.
This is why they are sometimes called **unit** tests.
Each test should cover a single unit of functionality.
A test is created with `test_that(desc, code)`.
It's common to write the description (`desc`) to create something that reads naturally, e.g. `test_that("basic duplication works", { ... })`.
A test failure report includes this description, which is why you want a concise statement of the test's purpose, e.g. a specific behaviour.
- An **expectation** is the atom of testing.
It describes the expected result of a computation: Does it have the right value and right class?
Does it produce an error when it should?
An expectation automates visual checking of results in the console.
Expectations are functions that start with `expect_`.
You want to arrange things such that, when a test fails, you'll know what's wrong and where in your code to look for the problem.
This motivates all our recommendations regarding file organisation, file naming, and the test description.
Finally, try to avoid putting too many expectations in one test - it's better to have more smaller tests than fewer larger tests.
## Expectations
An expectation is the finest level of testing.
It makes a binary assertion about whether or not an object has the properties you expect.
This object is usually the return value from a function in your package.
All expectations have a similar structure:
- They start with `expect_`.
- They have two main arguments: the first is the actual result, the second is what you expect.
- If the actual and expected results don't agree, testthat throws an error.
- Some expectations have additional arguments that control the finer points of comparing an actual and expected result.
While you'll normally put expectations inside tests inside files, you can also run them directly.
This makes it easy to explore expectations interactively.
There are more than 40 expectations in the testthat package, which can be explored in testthat's [reference index](https://testthat.r-lib.org/reference/index.html).
We're only going to cover the most important expectations here.
### Testing for equality
`expect_equal()` checks for equality, with some reasonable amount of numeric tolerance:
```{r, error = TRUE}
expect_equal(10, 10)
expect_equal(10, 10L)
expect_equal(10, 10 + 1e-7)
expect_equal(10, 11)
```
If you want to test for exact equivalence, use `expect_identical()`.
```{r, error = TRUE}
expect_equal(10, 10 + 1e-7)
expect_identical(10, 10 + 1e-7)
expect_equal(2, 2L)
expect_identical(2, 2L)
```
### Testing errors
Use `expect_error()` to check whether an expression throws an error.
It's the most important expectation in a trio that also includes `expect_warning()` and `expect_message()`.
We're going to emphasize errors here, but most of this also applies to warnings and messages.
Usually you care about two things when testing an error:
- Does the code fail? Specifically, does it fail for the right reason?
- Does the accompanying message make sense to the human who needs to deal with the error?
The entry-level solution is to expect a specific type of condition:
```{r, warning = TRUE, error = TRUE}
1 / "a"
expect_error(1 / "a")
log(-1)
expect_warning(log(-1))
```
This is a bit dangerous, though, especially when testing an error.
There are lots of ways for code to fail!
Consider the following test:
```{r}
expect_error(str_duq(1:2, 1:3))
```
This expectation is intended to test the recycling behaviour of `str_dup()`.
But, due to a typo, it tests behaviour of a non-existent function, `str_duq()`.
The code throws an error and, therefore, the test above passes, but for the *wrong reason*.
Due to the typo, the actual error thrown is about not being able to find the `str_duq()` function:
```{r, error = TRUE}
str_duq(1:2, 1:3)
```
Historically, the best defense against this was to assert that the condition message matches a certain regular expression, via the second argument, `regexp`.
```{r}
expect_error(1 / "a", "non-numeric argument")
expect_warning(log(-1), "NaNs produced")
```
This does, in fact, force our typo problem to the surface:
```{r error = TRUE}
expect_error(str_duq(1:2, 1:3), "recycle")
```
Recent developments in both base R and rlang make it increasingly likely that conditions are signaled with a *class*, which provides a better basis for creating precise expectations.
That is exactly what you've already seen in this stringr example.
This is what the `class` argument is for:
```{r, error = TRUE}
# fails, error has wrong class
expect_error(str_duq(1:2, 1:3), class = "vctrs_error_incompatible_size")
# passes, error has expected class
expect_error(str_dup(1:2, 1:3), class = "vctrs_error_incompatible_size")
```
```{=html}
<!-- This advice feels somewhat at odds with Hadley's ambivalence about classed errors.
I.e. I think he recommends using a classed condition only when there's a specific reason to.
Then again, maybe the desire to test it is a legitimate reason? -->
```
If you have the choice, express your expectation in terms of the condition's class, instead of its message.
Often this is under your control, i.e. if your package signals the condition.
If the condition originates from base R or another package, proceed with caution.
This is often a good reminder to re-consider the wisdom of testing a condition that is not fully under your control in the first place.
To check for the *absence* of an error, warning, or message, pass `NA` to the `regexp` argument:
```{r}
expect_error(1 / 2, NA)
```
Of course, this is functionally equivalent to simply executing `1 / 2` inside a test, but some developers find the explicit expectation expressive.
If you genuinely care about the condition's message, testthat 3e's snapshot tests are the best approach, which we describe next.
### Snapshot tests
Sometimes it's difficult or awkward to describe an expected result with code.
Snapshot tests are a great solution to this problem and this is one of the main innovations in testthat 3e.
The basic idea is that you record the expected result in a separate, human-readable file.
Going forward, testthat alerts you when a newly computed result differs from the previously recorded snapshot.
Snapshot tests are particularly suited to monitoring your package's user interface, such as its informational messages and errors.
Other use cases include testing images or other complicated objects.
We'll illustrate snapshot tests using the waldo package.
Under the hood, testthat 3e uses waldo to do the heavy lifting of "actual vs. expected" comparisons, so it's good for you to know a bit about waldo anyway.
One of waldo's main design goals is to present differences in a clear and actionable manner, as opposed to a frustrating declaration that "this differs from that and I know exactly how, but I won't tell you".
Therefore, the formatting of output from `waldo::compare()` is very intentional and is well-suited to a snapshot test.
The binary outcome of `TRUE` (actual == expected) vs. `FALSE` (actual != expected) is fairly easy to check and could get its own test.
Here we're concerned with writing a test to ensure that differences are reported to the user in the intended way.
waldo uses a few different layouts for showing diffs, depending on various conditions.
Here we deliberately constrain the width, in order to trigger a side-by-side layout.[^testing-basics-3]
(We'll talk more about the withr package below.)
[^testing-basics-3]: The actual waldo test that inspires this example targets an unexported helper function that produces the desired layout.
But this example uses an exported waldo function for simplicity.
```{r}
withr::with_options(
list(width = 20),
waldo::compare(c("X", letters), c(letters, "X"))
)
```
The two primary inputs differ at two locations: once at the start and once at the end.
This layout presents both of these, with some surrounding context, which helps the reader orient themselves.
Here's how this would look as a snapshot test:
```{=html}
<!-- Actually using snapshot test technology here is hard.
I can sort of see how it might be done, by looking at the source of testthat's vignette about snapshotting.
For the moment, I'm just faking it. -->
```
```{r eval = FALSE}
test_that("side-by-side diffs work", {
withr::local_options(width = 20)
expect_snapshot(
waldo::compare(c("X", letters), c(letters, "X"))
)
})
```
If you execute `expect_snapshot()` or a test containing `expect_snapshot()` interactively, you'll see this:
Can't compare snapshot to reference when testing interactively
ℹ Run `devtools::test()` or `testthat::test_file()` to see changes
followed by a preview of the snapshot output.
This reminds you that snapshot tests only function when executed non-interactively, i.e. while running an entire test file or the entire test suite.
This applies both to recording snapshots and to checking them.
The first time this test is executed via `devtools::test()` or similar, you'll see something like this (assume the test is in `tests/testthat/test-diff.R`):
── Warning (test-diff.R:63:3): side-by-side diffs work ─────────────────────
Adding new snapshot:
Code
waldo::compare(c(
"X", letters), c(
letters, "X"))
Output
old | new
[1] "X" -
[2] "a" | "a" [1]
[3] "b" | "b" [2]
[4] "c" | "c" [3]
old | new
[25] "x" | "x" [24]
[26] "y" | "y" [25]
[27] "z" | "z" [26]
- "X" [27]
There is always a warning upon initial snapshot creation.
The snapshot is added to `tests/testthat/_snaps/diff.md`, under the heading "side-by-side diffs work", which comes from the test's description.
The snapshot looks exactly like what a user sees interactively in the console, which is the experience we want to check for.
The snapshot file is *also* very readable, which is pleasant for the package developer.
This readability extends to snapshot changes, i.e. when examining Git diffs and reviewing pull requests on GitHub, which helps you keep tabs on your user interface.
Going forward, as long as your package continues to re-capitulate the expected snapshot, this test will pass.
If you've written a lot of conventional unit tests, you can appreciate how well-suited snapshot tests are for this use case.
If we were forced to inline the expected output in the test file, there would be a great deal of quoting, escaping, and newline management.
Ironically, with conventional expectations, the output you expect your user to see tends to get obscured by a heavy layer of syntactical noise.
What about when a snapshot test fails?
Let's imagine a hypothetical internal change where the default labels switch from "old" and "new" to "OLD" and "NEW".
Here's how this snapshot test would react:
── Failure (test-diff.R:63:3): side-by-side diffs work──────────────────────────
Snapshot of code has changed:
old[3:15] vs new[3:15]
" \"X\", letters), c("
" letters, \"X\"))"
"Output"
- " old | new "
+ " OLD | NEW "
" [1] \"X\" - "
" [2] \"a\" | \"a\" [1]"
" [3] \"b\" | \"b\" [2]"
" [4] \"c\" | \"c\" [3]"
" "
- " old | new "
+ " OLD | NEW "
and 3 more ...
* Run `snapshot_accept('diff')` to accept the change
* Run `snapshot_review('diff')` to interactively review the change
This diff is presented more effectively in most real-world usage, e.g. in the console, by a Git client, or via a Shiny app (see below).
But even this plain text version highlights the changes quite clearly.
Each of the two loci of change is indicated with a pair of lines marked with `-` and `+`, showing how the snapshot has changed.
You can call `testthat::snapshot_review('diff')` to review changes locally in a Shiny app, which lets you skip or accept individual snapshots.
Or, if all changes are intentional and expected, you can go straight to `testthat::snapshot_accept('diff')`.
Once you've re-synchronized your actual output and the snapshots on file, your tests will pass once again.
In real life, snapshot tests are a great way to stay informed about changes to your package's user interface, due to your own internal changes or due to changes in your dependencies or even R itself.
`expect_snapshot()` has a few arguments worth knowing about:
- `cran = FALSE`: By default, snapshot tests are skipped if it looks like the tests are running on CRAN's servers.
This reflects the typical intent of snapshot tests, which is to proactively monitor user interface, but not to check for correctness, which presumably is the job of other unit tests which are not skipped.
In typical usage, a snapshot change is something the developer will want to know about, but it does not signal an actual defect.
- `error = FALSE`: By default, snapshot code is *not* allowed to throw an error.
See `expect_error()`, described above, for one approach to testing errors.
But sometimes you want to assess "Does this error message make sense to a human?" and having it laid out in context in a snapshot is a great way to see it with fresh eyes.
Specify `error = TRUE` in this case:
```{r eval = FALSE}
expect_snapshot(error = TRUE,
str_dup(1:2, 1:3)
)
```
- `transform`: Sometimes a snapshot contains volatile, insignificant elements, such as a temporary filepath or a timestamp.
The `transform` argument accepts a function, presumably written by you, to remove or replace such changeable text.
Another use of `transform` is to scrub sensitive information from the snapshot.
- `variant`: Sometimes snapshots reflect the ambient conditions, such as the operating system or the version of R or one of your dependencies, and you need a different snapshot for each variant.
This is an experimental and somewhat advanced feature, so if you can arrange things to use a single snapshot, you probably should.
In typical usage, testthat will take care of managing the snapshot files below `tests/testthat/_snaps/`.
This happens in the normal course of you running your tests and, perhaps, calling `testthat::snapshot_accept()`.
### Shortcuts for other common patterns
We conclude this section with a few more expectations that come up frequently.
But remember that testthat has [many more pre-built expectations](https://testthat.r-lib.org/reference/index.html) than we can demonstrate here.
Several expectations can be described as "shortcuts", i.e. they streamline a pattern that comes up often enough to deserve its own wrapper.
- `expect_match(object, regexp, ...)` is a shortcut that wraps `grepl(pattern = regexp, x = object, ...)`.
It matches a character vector input against a regular expression `regexp`.
The optional `all` argument controls whether all elements or just one element needs to match.
Read the `expect_match()` documentation to see how additional arguments, like `ignore.case = FALSE` or `fixed = TRUE`, can be passed down to `grepl()`.
```{r, error = TRUE}
string <- "Testing is fun!"
expect_match(string, "Testing")
# Fails, match is case-sensitive
expect_match(string, "testing")
# Passes because additional arguments are passed to grepl():
expect_match(string, "testing", ignore.case = TRUE)
```
- `expect_length(object, n)` is a shortcut for `expect_equal(length(object), n)`.
- `expect_setequal(x, y)` tests that every element of `x` occurs in `y`, and that every element of `y` occurs in `x`.
But it won't fail if `x` and `y` happen to have their elements in a different order.
- `expect_s3_class()` and `expect_s4_class()` check that an object `inherit()`s from a specified class.
`expect_type()`checks the `typeof()` an object.
```{r, error = TRUE}
model <- lm(mpg ~ wt, data = mtcars)
expect_s3_class(model, "lm")
expect_s3_class(model, "glm")
```
`expect_true()` and `expect_false()` are useful catchalls if none of the other expectations does what you need.