---
title: Testing
layout: default
output: bookdown::html_chapter
---
# Testing {#tests}
To make sure that your package behaves as you'd expect, testing is vital. While you probably test your code already, you may not have taken the next step: automation. This chapter describes how to use the `testthat` package to create automated tests of your code.
## Motivation
I started automating my tests because I discovered I was spending too much time recreating bugs that I had previously fixed. While writing code or fixing bugs, I'd perform interactive tests to make sure the code worked, but I never had a system for storing those tests so I could re-run them as needed. I think this is common practice among R programmers: it's not that we don't test our code, it's that we don't have a way to make it easy to re-run tests, let alone to do so automatically.
Turning your casual interactive tests into reproducible scripts requires a little more work up front. Since you can no longer visually inspect the output, you have to write code that does the inspection for you. However, this is an investment in the future of your code. It pays off in four ways:
* Decreased frustration. Whenever I'm on a strict deadline I always seem to
  discover a bug in old code. Having to stop what I'm doing to fix it is a real
  pain. The more I test, the less this happens. Also, based on how well tested
  they are, I can easily see which parts of my code I can be confident about.
* Better code structure. Code that's easy to test is usually better designed. I
have found writing tests makes me break up complicated parts of my code into
separate functions that can work in isolation. These functions have less
duplication, and are easier to test, understand and re-combine.
* Easier to pick up where you left off. If you always finish a coding session by
creating a failing test (e.g. for the feature you want to implement next),
testing makes it easier to pick up where you left off: your tests let you know
what to do next.
* Increased confidence when making changes. If you know that all major
functionality has an associated test, you can confidently make big
changes without worrying about accidentally breaking something. For me,
this is particularly useful when I think of a simpler way to accomplish a
task: often my simpler solution is only simpler because I've forgotten an
important use case!
## testthat test structure
`testthat` has a hierarchical structure made up of expectations, tests and contexts.
* An __expectation__ describes the expected result of a computation: Does it
  have the right value and right class? Does it produce error messages when it
  should? There are 11 types of built-in expectations.
* A __test__ groups together multiple expectations to test one function, or
tightly related functionality across multiple functions. A test is created
with the `test_that()` function.
* A __context__ groups together multiple tests that test related
functionality. Contexts are defined with the `context()` function.
These are described in detail below.
Expectations give you the tools to convert your visual, interactive experiments into reproducible scripts. Tests and contexts are ways of organising your expectations so that when something goes wrong you can easily track down the source of the problem.
### Expectations
An expectation is the finest level of testing. It makes a binary assertion about whether or not a value is as you expect. If the expectation isn't true, `testthat` will raise an error.
An expectation is easy to read. Its syntax is close to a sentence in English: `expect_that(a, equals(b))` reads as "I expect that a will equal b".
There are 11 built-in expectations:

* `equals()` uses `all.equal()` to check for equality within some numerical
  tolerance.

        # Passes
        expect_that(10, equals(10))
        # Also passes
        expect_that(10, equals(10 + 1e-7))
        # Fails
        expect_that(10, equals(10 + 1e-6))
        # Definitely fails!
        expect_that(10, equals(11))

* `is_identical_to()` uses `identical()` to check for exact equality.

        # Passes
        expect_that(10, is_identical_to(10))
        # Fails
        expect_that(10, is_identical_to(10 + 1e-10))

* `is_equivalent_to()` is a more relaxed version of `equals()` that ignores
  attributes:

        # Fails
        expect_that(c("one" = 1, "two" = 2), equals(1:2))
        # Passes
        expect_that(c("one" = 1, "two" = 2), is_equivalent_to(1:2))

* `is_a()` checks that an object `inherits()` from a specified class.

        model <- lm(mpg ~ wt, data = mtcars)
        # Passes
        expect_that(model, is_a("lm"))
        # Fails
        expect_that(model, is_a("glm"))

* `matches()` matches a character vector against a regular expression. The
  optional `all` argument controls whether all elements or just one element
  needs to match. This expectation is powered by `str_detect()` from the
  `stringr` package.

        string <- "Testing is fun!"
        # Passes
        expect_that(string, matches("Testing"))
        # Fails, match is case-sensitive
        expect_that(string, matches("testing"))
        # Passes, match can be a regular expression
        expect_that(string, matches("T.+ting"))

* `prints_text()` matches the printed output from an expression against a
  regular expression.

        a <- list(1:10, letters)
        # Passes
        expect_that(str(a), prints_text("List of 2"))
        # Passes
        expect_that(str(a), prints_text(fixed("int [1:10]")))

* `shows_message()` checks that an expression shows a message:

        # Passes
        expect_that(library(mgcv), shows_message("This is mgcv"))

* `gives_warning()` expects that you get a warning.

        # Passes
        expect_that(log(-1), gives_warning())
        expect_that(log(-1),
          gives_warning("NaNs produced"))
        # Fails
        expect_that(log(0), gives_warning())

* `throws_error()` verifies that the expression throws an error. You can also
  supply a regular expression which is applied to the text of the error.

        # Fails
        expect_that(1 / 2, throws_error())
        # Passes
        expect_that(1 / "a", throws_error())
        # But better to be explicit
        expect_that(1 / "a", throws_error("non-numeric argument"))

* `is_true()` is a useful catch-all if none of the other expectations does what
  you want: it checks that an expression is true. `is_false()` is the
  complement of `is_true()`.

If you don't like the readable, but verbose, `expect_that` style, you can use one of the shortcut functions:
<table>
<tr>
<th>Full</th><th>Abbreviation</th>
</tr>
<tr><td><code>expect_that(x, is_true())</code></td><td><code>expect_true(x)</code></td></tr>
<tr><td><code>expect_that(x, is_false())</code></td><td><code>expect_false(x)</code></td></tr>
<tr><td><code>expect_that(x, is_a(y))</code></td><td><code>expect_is(x, y)</code></td></tr>
<tr><td><code>expect_that(x, equals(y))</code></td><td><code>expect_equal(x, y)</code></td></tr>
<tr><td><code>expect_that(x, is_equivalent_to(y))</code></td><td><code>expect_equivalent(x, y)</code></td></tr>
<tr><td><code>expect_that(x, is_identical_to(y))</code></td><td><code>expect_identical(x, y)</code></td></tr>
<tr><td><code>expect_that(x, matches(y))</code></td><td><code>expect_match(x, y)</code></td></tr>
<tr><td><code>expect_that(x, prints_text(y))</code></td><td><code>expect_output(x, y)</code></td></tr>
<tr><td><code>expect_that(x, shows_message(y))</code></td><td><code>expect_message(x, y)</code></td></tr>
<tr><td><code>expect_that(x, gives_warning(y))</code></td><td><code>expect_warning(x, y)</code></td></tr>
<tr><td><code>expect_that(x, throws_error(y))</code></td><td><code>expect_error(x, y)</code></td></tr>
</table>
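
For example, these pairs of expectations are equivalent (a small sketch reusing values from the examples above):

    # Verbose style
    expect_that(10, equals(10 + 1e-7))
    expect_that(log(-1), gives_warning("NaNs produced"))

    # Shortcut style
    expect_equal(10, 10 + 1e-7)
    expect_warning(log(-1), "NaNs produced")
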
Running a sequence of expectations is useful because it ensures that your code behaves as expected. You could even use an expectation within a function to check that the inputs are what you expect. However, they're not so useful when something goes wrong: all you know is that something is not as expected. You don't know anything about where the problem is. Tests, described next, organise expectations into coherent blocks that describe the overall goal of a set of expectations.
### Tests
Each test should test a single item of functionality and have an informative name. The idea is that when a test fails, you should know exactly where to look for the problem in your code. You create a new test with `test_that()`, which takes a test name and a code block as arguments. The test name should complete the sentence "Test that ..." and the code block should be a collection of expectations. When a test fails, it's the test name that helps you figure out what's gone wrong.
The following code shows one test of the `floor_date()` function from the `lubridate` package. There are seven expectations that check the results of rounding a date down to the nearest second, minute, hour, and so on. Note how we've defined a couple of helper functions to keep the test concise, so you can easily see what changes in each expectation.

    test_that("floor_date works for different units", {
      base <- as.POSIXct("2009-08-03 12:01:59.23", tz = "UTC")

      is_time <- function(x) equals(as.POSIXct(x, tz = "UTC"))
      floor_base <- function(unit) floor_date(base, unit)

      expect_that(floor_base("second"), is_time("2009-08-03 12:01:59"))
      expect_that(floor_base("minute"), is_time("2009-08-03 12:01:00"))
      expect_that(floor_base("hour"), is_time("2009-08-03 12:00:00"))
      expect_that(floor_base("day"), is_time("2009-08-03 00:00:00"))
      expect_that(floor_base("week"), is_time("2009-08-02 00:00:00"))
      expect_that(floor_base("month"), is_time("2009-08-01 00:00:00"))
      expect_that(floor_base("year"), is_time("2009-01-01 00:00:00"))
    })

Each test is run in its own environment so it is self-contained. The exceptions are actions which have effects outside the local environment. These include things that affect:
* the filesystem: creating and deleting files, changing the working directory,
etc.
* the search path: loading and detaching packages, `attach()`.
* global options, like `options()` and `par()`.
When you use these actions in tests, you'll need to clean up after yourself. Many other testing packages have set-up and teardown methods that are run automatically before and after each test. These are not so important with `testthat` because you can create objects outside of the tests and rely on R's copy-on-modify semantics to keep them unchanged between test runs. To clean up other actions you can use regular R functions.
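
For example, a minimal sketch of that clean-up pattern, for a test that temporarily changes a global option (the option and the test name are just for illustration):

    library(testthat)

    test_that("output respects the digits option", {
      old <- options(digits = 3)        # options() returns the previous values
      expect_equal(getOption("digits"), 3)
      options(old)                      # restore the old settings for later tests
    })
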
You can run a set of tests just by `source()`ing a file, but as you write more and more tests, you'll probably want a little more infrastructure. The first part of that infrastructure is contexts, described below, which give a convenient way to label each file, helping to locate failures when you have many tests.
## Contexts
Contexts group tests together into blocks that test related functionality, and are established with the code `context("My context")`. Normally there is one context per file, but you can have more if you want, or you can use the same context in multiple files.
The following code shows the context that tests the operation of the `str_length()` function in `stringr`. The tests are very simple. They cover two situations where `nchar()` from base R gives surprising results.

    context("String length")

    test_that("str_length is number of characters", {
      expect_that(str_length("a"), equals(1))
      expect_that(str_length("ab"), equals(2))
      expect_that(str_length("abc"), equals(3))
    })

    test_that("str_length of missing is missing", {
      expect_that(str_length(NA), equals(NA_integer_))
      expect_that(str_length(c(NA, 1)), equals(c(NA, 1)))
      expect_that(str_length("NA"), equals(2))
    })

    test_that("str_length of factor is length of level", {
      expect_that(str_length(factor("a")), equals(1))
      expect_that(str_length(factor("ab")), equals(2))
      expect_that(str_length(factor("abc")), equals(3))
    })

## Running tests
There are two situations where you want to run your tests: interactively while you're developing your package to make sure that everything works ok, and then as a final automated check before releasing your package.
* Run all tests in a file or directory with `test_file()` or `test_dir()`.
* Automatically re-run the tests whenever something changes with `auto_test()`.
* Have `R CMD check` run your tests.
### Testing files and directories
You can run all tests in a file with `test_file(path)`. The following code shows the difference between `test_file()` and `source()` for the `stringr` tests, as well as for the same tests applied to `nchar()`. You can see the advantage of `test_file()` over `source()`: instead of stopping at the first failure, you see the results of all the tests.

    > source("test-str_length.r")
    > test_file("test-str_length.r")
    .........

    > source("test-nchar.r")
    Error: Test failure in 'nchar of missing is missing'
    * nchar(NA) not equal to NA_integer_
    'is.NA' value mismatch: 0 in current 1 in target
    * nchar(c(NA, 1)) not equal to c(NA, 1)
    'is.NA' value mismatch: 0 in current 1 in target

    > test_file("test-nchar.r")
    ...12..34

    1. Failure: nchar of missing is missing ---------------------------------
    nchar(NA) not equal to NA_integer_
    'is.NA' value mismatch: 0 in current 1 in target

    2. Failure: nchar of missing is missing ---------------------------------
    nchar(c(NA, 1)) not equal to c(NA, 1)
    'is.NA' value mismatch: 0 in current 1 in target

    3. Failure: nchar of factor is length of level --------------------------
    nchar(factor("ab")) not equal to 2
    Mean relative difference: 0.5

    4. Failure: nchar of factor is length of level --------------------------
    nchar(factor("abc")) not equal to 3
    Mean relative difference: 0.6666667

Each expectation is displayed as either a green dot (indicating success) or a red number (indicating failure). The number indexes a list of further details, which is printed after all tests have been run. What you can't see is that this display is dynamic: a new dot is printed each time a test passes (it's rather satisfying to watch).
`test_dir` will run all of the test files in a directory, assuming that test files start with `test` (this means it's possible to intermix regular code and tests in the same directory). This is handy if you're developing a small set of scripts rather than a complete package. The following shows the output from the `stringr` tests. You can see there are 12 contexts with between 2 and 25 expectations each. As you'd hope for in a released package, all the tests pass.

    > test_dir("inst/tests/")
    String and pattern checks : ......
    Detecting patterns : .........
    Duplicating strings : ......
    Extract patterns : ..
    Joining strings : ......
    String length : .........
    Locations : ............
    Matching groups : ..............
    Test padding : ....
    Splitting strings : .........................
    Extracting substrings : ...................
    Trimming strings : ........

If you want a more minimal report, suitable for display on a dashboard, you can use a different reporter. `testthat` comes with three reporters: stop, minimal and summary. The stop reporter is the default and `stop()`s whenever a failure is encountered. The summary reporter is the default for `test_file` and `test_dir`. The minimal reporter is shown below: it prints `.` for success, `E` for error and `F` for failure. The following output shows the results of running the `stringr` test suite with the minimal reporter.

    > test_dir("inst/tests/", reporter = "minimal")
    ...............................................

### Autotest
Tests are most useful when run frequently, and `auto_test()` takes that idea to the limit by re-running your tests whenever your code or tests change. `auto_test()` has two arguments, `code_path` and `test_path`, which point to a directory of source code and a directory of tests respectively.
Once run, `auto_test()` will continuously scan both directories for changes. If a test file is modified, it will re-run the tests in that file. If a code file is modified, it will reload that file and re-run all tests. To quit, press Ctrl + Break on Windows, Esc on Mac OS, or Ctrl + C if running from the command line.
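
A minimal sketch of starting it, assuming your code lives in `R/` and your tests in `inst/tests/` (the layout used elsewhere in this chapter):

    library(testthat)

    # Watch both directories and re-run the tests on every save
    auto_test(code_path = "R", test_path = "inst/tests")
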
This promotes a workflow where the _only_ way you test your code is through tests. Instead of modify-save-source-check you just modify and save, then watch the automated test output for problems.
## R CMD check
If you're using `testthat` in a package, you need to adopt a specific structure to work with `R CMD check`.
First, you need to add `Suggests: testthat` to `DESCRIPTION`, as described in [Writing R Extensions](http://cran.r-project.org/doc/manuals/R-exts.html#Package-Dependencies). This avoids an `R CMD check` warning about unspecified dependencies.
After that, you need to put the tests somewhere `R CMD check` will find and run them.
Previously, best practice was to put all test files in `inst/tests` and ensure that `R CMD check` ran them by putting the following code in `tests/test-all.R`:

    library(testthat)
    library(yourpackage)

    test_package("yourpackage")

Now, the recommended practice is to put your tests in `tests/testthat` and ensure `R CMD check` runs them by putting the following code in `tests/test-all.R`:

    library(testthat)

    test_check("yourpackage")

The advantage of this new structure is that the user has control over whether or not the tests are installed, via the `--install-tests` option to `R CMD INSTALL` or the `INSTALL_opts = c("--install-tests")` argument to `install.packages()`. I'm not sure why you wouldn't want to install the tests, but now you have the flexibility, as requested by CRAN maintainers.
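
For example, a user who does want the tests installed could use either route (the package and tarball names are placeholders):

    # From the command line:
    #   R CMD INSTALL --install-tests yourpackage_1.0.tar.gz

    # Or from within R:
    install.packages("yourpackage", INSTALL_opts = c("--install-tests"))
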
## Development cycles
It's useful to distinguish between exploratory programming and confirmatory programming (in the same sense as exploratory and confirmatory data analysis), because the development cycle differs in several important ways.
### Confirmatory programming
Confirmatory programming happens when you know what you need to do and what the results of your changes will be (new feature X appears or known bug Y disappears); you just need to figure out how to do it. Confirmatory programming is also known as [test driven development][tdd] (TDD), a development style that grew out of [extreme programming][extreme-programming]. The basic idea is that, before you implement any new feature or fix a known bug, you should:
1. Write an automated test and run `test()` to make sure the test fails (so you know
you've captured the bug correctly).
2. Modify code to fix the bug or implement the new feature.
3. Run `test(pkg)` to reload the package and re-run the tests.
4. Repeat 2--3 until all tests pass.
5. Update documentation comments, run `document()`, and update `NEWS`.
For this paradigm, you might also want to use `testthat::auto_test()`, which will watch your tests and code and will automatically rerun your tests when either changes. This allows you to skip step three: you just modify your code and watch to see if the tests pass or fail.
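
For example, step one might look like the following sketch, written before the fix exists (`na_mean()` is a hypothetical function that should ignore missing values):

    library(testthat)

    # Written first, so it fails until na_mean() is implemented correctly
    test_that("na_mean ignores missing values", {
      expect_equal(na_mean(c(1, 2, NA)), 1.5)
    })
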
### Exploratory programming
Exploratory programming is the complement of confirmatory programming, when you have some idea of what you want to achieve, but you're not sure about the details. You're not sure what the functions should look like, what arguments they should have and what they should return. You may not even be sure how you are going to break down the problem into pieces. In exploratory programming, you're exploring the solution space by writing functions and you need the freedom to rewrite large chunks of the code as you understand the problem domain better.
The exploratory programming cycle is similar to confirmatory, but it's not usually worth writing the tests before writing the code, because the interface will change so much:
1. Edit code and reload with `load_all()`.
2. Test interactively.
3. Repeat 1--2 until code works.
4. Write automated tests and `test()`.
5. Update documentation comments, run `document()`, and update `NEWS`.
The automated tests are still vitally important because they are what will prevent your code from failing silently in the future.
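
As a rough sketch, the `devtools` calls behind steps 1, 4 and 5 might look like this (the package path is a placeholder):

    library(devtools)

    load_all("path/to/yourpackage")   # step 1: reload your edited code
    # ... experiment interactively at the console ...
    test("path/to/yourpackage")       # step 4: run the automated tests
    document("path/to/yourpackage")   # step 5: regenerate the documentation
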
[tdd]:http://en.wikipedia.org/wiki/Test-driven_development
[extreme-programming]:http://en.wikipedia.org/wiki/Extreme_programming