scoringutils: Utilities for Scoring and Assessing Predictions

The scoringutils package provides a collection of metrics and proper scoring rules and aims to make it simple to score probabilistic forecasts against observed values.

You can find additional information and examples in the papers Evaluating Forecasts with scoringutils in R and Scoring epidemiological forecasts on transformed scales, as well as in the vignettes (Getting started, Details on the metrics implemented, and Scoring forecasts directly).

The scoringutils package offers convenient automated forecast evaluation through the function score(). The function operates on data.frames (it uses data.table internally for speed and efficiency) and can easily be integrated into a workflow based on dplyr or data.table. It also provides experienced users with a set of reliable lower-level scoring metrics that operate on vectors and matrices and can be built upon in other applications. In addition, it implements a wide range of flexible plots designed to cover many use cases.
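The lower-level metrics work directly on vectors. To illustrate that vector-in, vector-out style, here is a base-R sketch of the quantile (pinball) score, a common building block of scores for quantile forecasts. The function name `pinball_score()` and its arguments are hypothetical and not part of the scoringutils API; the package's own vector-level functions may be named, ordered and scaled differently.

```r
# Hypothetical helper (not a scoringutils function): quantile (pinball) score
# of predictive quantiles `predicted` at level `quantile_level` against `observed`.
# Uses the scaling 2 * (1{y <= q} - tau) * (q - y), under which the score of
# the median (tau = 0.5) equals the absolute error.
pinball_score <- function(observed, predicted, quantile_level) {
  stopifnot(length(observed) == length(predicted))
  indicator <- as.numeric(observed <= predicted)
  2 * (indicator - quantile_level) * (predicted - observed)
}

observed <- c(10, 25, 3)
predicted_median <- c(12, 20, 3)

# One score per observation/prediction pair
pinball_score(observed, predicted_median, quantile_level = 0.5)
#> [1] 2 5 0

# An upper quantile lying above the observation is penalised only lightly
pinball_score(10, 15, quantile_level = 0.9)
#> [1] 1
```

Because the function is vectorised, it can be applied to whole columns of a data.frame or data.table without a loop.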

Where available, scoringutils depends on functionality from scoringRules, which provides a comprehensive collection of proper scoring rules for predictive probability distributions represented as samples or parametric distributions. For some forecast types, such as quantile forecasts, scoringutils also implements additional metrics for evaluating forecasts. On top of providing an interface to the proper scoring rules implemented in scoringRules and natively, scoringutils also offers utilities for summarising and visualising forecasts and scores, and for obtaining relative scores between models, which can be useful when models did not all forecast the same targets and when comparing forecasts across scales.
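Relative scores can be obtained even when models did not forecast the same targets, by comparing each pair of models only on the targets both covered. The base-R sketch below illustrates this general pairwise idea: for every pair of models it takes the ratio of their mean scores on the shared targets, then summarises each model by the geometric mean of its ratios. The function `relative_skill()` and the toy score matrix are hypothetical, and this is a simplified illustration; the procedure scoringutils actually uses (for example, how the baseline is treated or whether significance tests are added) may differ.

```r
# Hypothetical helper: rows = forecast targets, columns = models,
# entries = a score where lower is better; NA marks a target a model did not forecast.
relative_skill <- function(scores) {
  models <- colnames(scores)
  n <- length(models)
  ratios <- matrix(NA_real_, n, n, dimnames = list(models, models))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Compare the two models only on the targets both of them forecast
      overlap <- !is.na(scores[, i]) & !is.na(scores[, j])
      if (any(overlap)) {
        ratios[i, j] <- mean(scores[overlap, i]) / mean(scores[overlap, j])
      }
    }
  }
  # Geometric mean of each model's ratios (self-comparison contributes a ratio of 1).
  # Values below 1 mean the model scores lower (better) than the others on average.
  apply(ratios, 1, function(r) exp(mean(log(r), na.rm = TRUE)))
}

scores <- cbind(
  A = c(1, 2, 3, NA),
  B = c(2, 4, 6, 8),
  C = c(NA, 1, 3, 4)
)
skill <- relative_skill(scores)
round(skill, 3)

# Rescaled so that a chosen baseline model (here "B") has a value of 1
round(skill / skill["B"], 3)
```

Using ratios on shared targets means that a model which only forecast easy targets is not automatically rewarded for it, which is the motivation for a pairwise rather than a plain average comparison.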

Predictions can be provided in various formats: scoringutils can handle probabilistic forecasts in either a sample-based or a quantile-based format. For more detail on the expected input formats, please see the package documentation. Observed (true) values can be integer, continuous or binary, and appropriate scores for each of these value types are selected automatically.
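To illustrate how the two formats relate, the base-R sketch below draws a hypothetical sample-based forecast and summarises it into a small set of predictive quantiles. The distribution, the quantile levels and the column names are assumptions chosen for illustration; check the package documentation for the column names your installed version expects.

```r
set.seed(42)

# Sample-based format: many draws from the predictive distribution
# (here a hypothetical Poisson forecast with mean 50)
samples <- rpois(1000, lambda = 50)

# Quantile-based format: the same forecast summarised by a few quantiles
quantile_levels <- c(0.05, 0.25, 0.5, 0.75, 0.95)
quantile_forecast <- data.frame(
  quantile_level = quantile_levels,
  predicted = quantile(samples, probs = quantile_levels, names = FALSE)
)
quantile_forecast
```

Going from samples to quantiles loses information about the rest of the distribution, which is one reason the two formats are scored with different metrics.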

Installation

Install the CRAN version of this package using:

install.packages("scoringutils")

Install the stable development version of the package with:

install.packages("scoringutils", repos = c("https://epiforecasts.r-universe.dev", "https://cloud.r-project.org"))

Install the unstable development version from GitHub using:

remotes::install_github("epiforecasts/scoringutils", dependencies = TRUE)

Quick start

In this quick start guide we explore some of the functionality of the scoringutils package using quantile forecasts from the ECDC forecasting hub as an example. For more detailed documentation, please see the package vignettes and the individual function documentation.

Plotting forecasts

As a first step to evaluating the forecasts, we visualise them. For the purposes of this example, we use make_NA() to set observations outside a date window, and all forecasts other than those from a single model and forecast date, to NA, so that plot_predictions() only shows the remaining data.

library(scoringutils)
library(magrittr)  # provides %>%
library(ggplot2)   # provides facet_wrap()

example_quantile %>%
  make_NA(what = "truth", 
          target_end_date >= "2021-07-15", 
          target_end_date < "2021-05-22"
  ) %>%
  make_NA(what = "forecast",
          model != "EuroCOVIDhub-ensemble", 
          forecast_date != "2021-06-28"
  ) %>%
  plot_predictions(
    x = "target_end_date",
    by = c("target_type", "location")
  ) +
  facet_wrap(target_type ~ location, ncol = 4, scales = "free")

Scoring forecasts

Forecasts can be scored quickly using the score() function. score() automatically tries to determine the forecast unit, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually using set_forecast_unit(), as this can help avoid errors, especially when scoringutils is used in automated pipelines. set_forecast_unit() simply drops unneeded columns. To verify that everything is in order, use validate(); its output can then be passed directly into score().

score() returns unsummarised scores, which in most cases is not what the user wants. Here we use additional functions from scoringutils to add empirical coverage levels (add_coverage()) and scores relative to a baseline model (add_pairwise_comparison(), with the EuroCOVIDhub-ensemble model as the baseline). See the Getting started vignette for more details. Finally, we summarise the scores by model and target type and round them to two significant digits for display.
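The forecast unit matters because, within a single forecast, each quantile level should appear exactly once. The base-R sketch below shows what goes wrong when a column is left out of the forecast unit: rows from different models collapse onto the same key and look like duplicates. The toy data, the column names and the helper `has_duplicates()` are hypothetical; in practice, rely on validate() rather than a hand-rolled check like this.

```r
# Hypothetical toy quantile forecasts in long format: one row per quantile level
forecasts <- data.frame(
  model = rep(c("model_a", "model_b"), each = 3),
  location = "DE",
  target_end_date = as.Date("2021-06-12"),
  quantile_level = rep(c(0.25, 0.5, 0.75), times = 2),
  predicted = c(90, 100, 110, 80, 95, 120),
  observed = 105
)

# TRUE if any combination of forecast-unit columns and quantile level occurs more than once
has_duplicates <- function(data, forecast_unit) {
  key <- data[, c(forecast_unit, "quantile_level"), drop = FALSE]
  any(duplicated(key))
}

# Correct forecast unit: every row is uniquely identified
has_duplicates(forecasts, c("model", "location", "target_end_date"))
#> [1] FALSE

# Forgetting the model column merges two models' forecasts into one "forecast"
has_duplicates(forecasts, c("location", "target_end_date"))
#> [1] TRUE

# Number of distinct forecasts under the correct unit
nrow(unique(forecasts[, c("model", "location", "target_end_date")]))
#> [1] 2
```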

library(scoringutils)
library(magrittr)  # provides %>%
library(knitr)     # provides kable()

example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  validate() %>%
  add_coverage() %>%
  score() %>%
  summarise_scores(
    by = c("model", "target_type")
  ) %>%
  add_pairwise_comparison(
    baseline = "EuroCOVIDhub-ensemble"
  ) %>%
  summarise_scores(
    fun = signif, 
    digits = 2
  ) %>%
  kable()

model	target_type	wis	overprediction	underprediction	dispersion	bias	coverage_50	coverage_90	coverage_deviation	ae_median	relative_skill	scaled_rel_skill
EuroCOVIDhub-baseline	Cases	28000	14000.0	10000.0	4100	0.0980	0.33	0.82	-0.120	38000	1.30	1.6
EuroCOVIDhub-baseline	Deaths	160	66.0	2.1	91	0.3400	0.66	1.00	0.120	230	2.30	3.8
EuroCOVIDhub-ensemble	Cases	18000	10000.0	4200.0	3700	-0.0560	0.39	0.80	-0.100	24000	0.82	1.0
EuroCOVIDhub-ensemble	Deaths	41	7.1	4.1	30	0.0730	0.88	1.00	0.200	53	0.60	1.0
UMass-MechBayes	Deaths	53	9.0	17.0	27	-0.0220	0.46	0.88	-0.025	78	0.75	1.3
epiforecasts-EpiNow2	Cases	21000	12000.0	3300.0	5700	-0.0790	0.47	0.79	-0.070	28000	0.95	1.2
epiforecasts-EpiNow2	Deaths	67	19.0	16.0	32	-0.0051	0.42	0.91	-0.045	100	0.98	1.6
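In the table above, wis is approximately the sum of overprediction, underprediction and dispersion (for example 14000 + 10000 + 4100 ≈ 28000 for the baseline model's case forecasts, after rounding). The base-R sketch below illustrates that decomposition for a single quantile forecast, following the weighted interval score as defined by Bracher et al. (2021). It also checks whether the observation falls inside each central interval, which is what the coverage_50 and coverage_90 columns average over many forecasts. The helper `wis_components()` and the example numbers are hypothetical, and details of the scoringutils implementation (such as how the median is weighted) may differ between package versions.

```r
# Hypothetical helper: WIS and its components for one forecast, following
# Bracher et al. (2021). `lower`/`upper` are the bounds of K central
# prediction intervals and `alpha` is 1 minus their nominal coverage.
wis_components <- function(observed, median, lower, upper, alpha) {
  norm <- length(alpha) + 0.5
  # Width of the intervals, weighted by alpha / 2
  dispersion <- sum(alpha / 2 * (upper - lower)) / norm
  # Penalties when the observation falls below the intervals and the median
  overprediction <- (sum((lower - observed) * (observed < lower)) +
    0.5 * (median - observed) * (observed < median)) / norm
  # Penalties when the observation falls above the intervals and the median
  underprediction <- (sum((observed - upper) * (observed > upper)) +
    0.5 * (observed - median) * (observed > median)) / norm
  c(
    wis = overprediction + underprediction + dispersion,
    overprediction = overprediction,
    underprediction = underprediction,
    dispersion = dispersion
  )
}

# Example: median 100, 50% interval [90, 110], 90% interval [70, 140]
observed <- 130
lower <- c(90, 70)
upper <- c(110, 140)
alpha <- c(0.5, 0.1)

# wis = 17.4: overprediction 0, underprediction 14, dispersion 3.4
wis_components(observed, median = 100, lower = lower, upper = upper, alpha = alpha)

# Empirical coverage for this one forecast:
# FALSE for the 50% interval, TRUE for the 90% interval
observed >= lower & observed <= upper
```

The same WIS value is obtained as the mean of the pinball scores (with the factor-2 scaling) over the five quantile levels 0.05, 0.25, 0.5, 0.75 and 0.95, which is a convenient way to sanity-check such a decomposition.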

scoringutils contains additional functionality to transform forecasts, to summarise scores at different levels, to visualise them, and to explore the forecasts themselves. See the package vignettes and function documentation for more information.

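You may want to score forecasts based on transformations of the original data in order to obtain a more complete evaluation (see the paper Scoring epidemiological forecasts on transformed scales for more information). This can be done using the function transform_forecasts(). In the following example, we first truncate observed values at 0 and then use the function log_shift() to add 1 to all values before applying the natural logarithm. With append = TRUE, the transformed forecasts are added alongside the original ones rather than replacing them, and a scale column records which scale each row is on (head() below only shows the first rows, which are on the natural scale).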

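library(scoringutils)
library(magrittr)    # provides %>%
library(data.table)  # provides := for in-place column updates

example_quantile %>%
  .[, observed := ifelse(observed < 0, 0, observed)] %>%  # truncate at 0
  transform_forecasts(append = TRUE, fun = log_shift, offset = 1) %>%
  score() %>%
  summarise_scores(by = c("model", "target_type", "scale")) %>%
  head()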
#>                    model target_type   scale         wis overprediction
#> 1: EuroCOVIDhub-ensemble       Cases natural 11550.70664    3650.004755
#> 2: EuroCOVIDhub-baseline       Cases natural 22090.45747    7702.983696
#> 3:  epiforecasts-EpiNow2       Cases natural 14438.43943    5513.705842
#> 4: EuroCOVIDhub-ensemble      Deaths natural    41.42249       7.138247
#> 5: EuroCOVIDhub-baseline      Deaths natural   159.40387      65.899117
#> 6:       UMass-MechBayes      Deaths natural    52.65195       8.978601
#>    underprediction dispersion        bias coverage_50 coverage_90
#> 1:     4237.177310 3663.52458 -0.05640625   0.3906250   0.8046875
#> 2:    10284.972826 4102.50094  0.09726562   0.3281250   0.8203125
#> 3:     3260.355639 5664.37795 -0.07890625   0.4687500   0.7890625
#> 4:        4.103261   30.18099  0.07265625   0.8750000   1.0000000
#> 5:        2.098505   91.40625  0.33906250   0.6640625   1.0000000
#> 6:       16.800951   26.87239 -0.02234375   0.4609375   0.8750000
#>    coverage_deviation   ae_median
#> 1:        -0.10230114 17707.95312
#> 2:        -0.11437500 32080.48438
#> 3:        -0.06963068 21530.69531
#> 4:         0.20380682    53.13281
#> 5:         0.12142045   233.25781
#> 6:        -0.02488636    78.47656
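The transformation in this example amounts to applying the same increasing function to observations and predictions. Below is a base-R sketch of the truncation and log(x + 1) step, assuming an offset of 1 as in the example; it only mirrors the arithmetic the text attributes to log_shift() and is not the package's implementation. The input values are hypothetical.

```r
# Hypothetical observed values, including a negative one
observed <- c(-2, 0, 9, 99)

# Step 1: truncate at 0; log(x + 1) would be undefined (NaN) for x < -1
truncated <- pmax(observed, 0)

# Step 2: add an offset of 1, then take the natural logarithm
log_scale <- log(truncated + 1)
log_scale  # equals c(0, 0, log(10), log(100)), the same as log1p(truncated)

# Predictive quantiles are transformed the same way; because log(x + 1) is
# increasing, their order (and hence their quantile levels) is preserved
predicted_quantiles <- c(5, 20, 60)
log(predicted_quantiles + 1)
```

Because the transformation is monotone, a quantile forecast can be transformed value by value and still be a valid quantile forecast on the new scale, which is what makes scoring on the log scale straightforward.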

Citation

If using scoringutils in your work please consider citing it using the output of citation("scoringutils"):

#> To cite scoringutils in publications use the following. If you use the
#> CRPS, DSS, or Log Score, please also cite scoringRules.
#> 
#>   Nikos I. Bosse, Hugo Gruson, Sebastian Funk, Anne Cori, Edwin van
#>   Leeuwen, and Sam Abbott (2022). Evaluating Forecasts with
#>   scoringutils in R, arXiv. DOI: 10.48550/ARXIV.2205.07090
#> 
#> To cite scoringRules in publications use:
#> 
#>   Alexander Jordan, Fabian Krueger, Sebastian Lerch (2019). Evaluating
#>   Probabilistic Forecasts with scoringRules. Journal of Statistical
#>   Software, 90(12), 1-37. DOI 10.18637/jss.v090.i12
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

How to make a bug report or feature request

Please briefly describe your problem and the output you expect in an issue. If you have a question, please don’t open an issue; instead, ask on our Q and A page.

Contributing

We welcome contributions and new contributors! We particularly appreciate help on priority problems in the issues. Please check and add to the issues, and/or add a pull request.

Code of Conduct

Please note that the scoringutils project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
