Merge branch 'main' into rework-manuscript

epiforecasts · Feb 27, 2024 · 19e124b
2 parents e58392f + 249447f
Showing 12 changed files with 579 additions and 388 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -6,7 +6,7 @@ The update introduces breaking changes. If you want to keep using the older vers
 
 ## Package updates
 - In `score()`, required columns "true_value" and "prediction" were renamed and replaced by required columns "observed" and "predicted". Scoring functions now also use the function arguments "observed" and "predicted" everywhere consistently. 
-- The overall scoring workflow was updated. `score()` is now a generic function that dispatches the correct method based on the forecast type. forecast types currently supported are "binary", "point", "sample" and "quantile" with corresponding classes "forecast_binary", "forecast_point", "forecast_sample" and "forecast_quantile". An object of class `forecast_*` can be created using the function `as_forecast()`, which also replaces the previous function `check_forecasts()` (see more information below). 
+- The overall scoring workflow was updated. `score()` is now a generic function that dispatches the correct method based on the forecast type. Forecast types currently supported are "binary", "point", "sample" and "quantile" with corresponding classes "forecast_binary", "forecast_point", "forecast_sample" and "forecast_quantile". An object of class `forecast_*` can be created using the function `as_forecast()`, which also replaces the previous function `check_forecasts()` (see more information below). The function also allows users to rename required columns and specify the forecast unit in a single step, taking over the functionality of `set_forecast_unit()` in most cases.
 - Scoring rules (functions used for scoring) received a consistent interface and input checks:
   - Scoring rules for binary forecasts:
     - `observed`: factor with exactly 2 levels

diff --git a/R/validate.R b/R/validate.R
@@ -2,13 +2,23 @@
 #' @description Convert a data.frame or similar of forecasts into an object of
 #' class `forecast_*` and validate it.
 #'
-#' `as_forecast()` determines the forecast type (binary, point, sample-based or
+#' `as_forecast()`
+#' - allows users to specify the current names of the columns that correspond
+#' to the columns required by `scoringutils` (`observed`, `predicted`,
+#' `model`, as well as `quantile_level` for quantile-based forecasts and
+#' `sample_id` for sample-based forecasts). `as_forecast()` renames the
+#' existing columns.
+#' - allows users to specify the unit of a single forecast. It removes all
+#' columns that are neither part of the forecast unit nor a required column
+#' (see [set_forecast_unit()] for details).
+#' - determines the forecast type (binary, point, sample-based or
 #' quantile-based) from the input data (using the function
-#' [get_forecast_type()]. It then constructs an object of the
-#' appropriate class (`forecast_binary`, `forecast_point`, `forecast_sample`, or
+#' [get_forecast_type()]).
+#' - constructs a forecast object of the appropriate class
+#' (`forecast_binary`, `forecast_point`, `forecast_sample`, or
 #' `forecast_quantile`, using the function [new_forecast()]).
-#' Lastly, it calls [as_forecast()] on the object to make sure it conforms with
-#' the required input formats.
+#' - calls [validate_forecast()] on the newly created forecast object to
+#' validate it.
 #' @inheritParams score
 #' @inheritSection forecast_types Forecast types and input format
 #' @return Depending on the forecast type, an object of class
@@ -18,19 +28,104 @@
 #' @keywords check-forecasts
 #' @examples
 #' as_forecast(example_binary)
-#' as_forecast(example_quantile)
-as_forecast <- function(data, ...) {
+#' as_forecast(
+#'   example_quantile,
+#'   forecast_unit = c("model", "target_type", "target_end_date",
+#'                     "horizon", "location")
+#' )
+as_forecast <- function(data,
+                        ...) {
   UseMethod("as_forecast")
 }
 
 #' @rdname as_forecast
+#' @param forecast_unit (optional) Names of the columns in `data` (after
+#' any renaming of columns done by `as_forecast()`) that denote the unit of a
+#' single forecast. See [get_forecast_unit()] for details.
+#' If `NULL` (the default), all columns that are not required columns are
+#' assumed to form the unit of a single forecast. If specified, all columns
+#' that are not part of the forecast unit (or required columns) will be removed.
+#' @param forecast_type (optional) The forecast type you expect the forecasts
+#' to have. If the forecast type as determined by `scoringutils` based on the
+#' input does not match this, an error will be thrown. If `NULL` (the default),
+#' the forecast type will be inferred from the data.
+#' @param observed (optional) Name of the column in `data` that contains the
+#' observed values. This column will be renamed to "observed".
+#' @param predicted (optional) Name of the column in `data` that contains the
+#' predicted values. This column will be renamed to "predicted".
+#' @param model (optional) Name of the column in `data` that contains the names
+#' of the models/forecasters that generated the predicted values.
+#' This column will be renamed to "model".
+#' @param quantile_level (optional) Name of the column in `data` that contains
+#' the quantile level of the predicted values. This column will be renamed to
+#' "quantile_level". Only applicable to quantile-based forecasts.
+#' @param sample_id (optional) Name of the column in `data` that contains the
+#' sample id. This column will be renamed to "sample_id". Only applicable to
+#' sample-based forecasts.
 #' @export
-as_forecast.default <- function(data, ...) {
+as_forecast.default <- function(data,
+                                forecast_unit = NULL,
+                                forecast_type = NULL,
+                                observed = NULL,
+                                predicted = NULL,
+                                model = NULL,
+                                quantile_level = NULL,
+                                sample_id = NULL,
+                                ...) {
+  # check inputs
+  data <- ensure_data.table(data)
+  assert_character(observed, len = 1, null.ok = TRUE)
+  assert_subset(observed, names(data), empty.ok = TRUE)
+
+  assert_character(predicted, len = 1, null.ok = TRUE)
+  assert_subset(predicted, names(data), empty.ok = TRUE)
+
+  assert_character(model, len = 1, null.ok = TRUE)
+  assert_subset(model, names(data), empty.ok = TRUE)
+
+  assert_character(quantile_level, len = 1, null.ok = TRUE)
+  assert_subset(quantile_level, names(data), empty.ok = TRUE)
+
+  assert_character(sample_id, len = 1, null.ok = TRUE)
+  assert_subset(sample_id, names(data), empty.ok = TRUE)
+
+  # rename columns
+  if (!is.null(observed)) {
+    setnames(data, old = observed, new = "observed")
+  }
+  if (!is.null(predicted)) {
+    setnames(data, old = predicted, new = "predicted")
+  }
+  if (!is.null(model)) {
+    setnames(data, old = model, new = "model")
+  }
+  if (!is.null(quantile_level)) {
+    setnames(data, old = quantile_level, new = "quantile_level")
+  }
+  if (!is.null(sample_id)) {
+    setnames(data, old = sample_id, new = "sample_id")
+  }
+
+  # assert that the correct column names are present after renaming
   assert(check_data_columns(data))
 
+  # set forecast unit (error handling is done in `set_forecast_unit()`)
+  if (!is.null(forecast_unit)) {
+    data <- set_forecast_unit(data, forecast_unit)
+  }
+
   # find forecast type
+  desired <- forecast_type
   forecast_type <- get_forecast_type(data)
 
+  if (!is.null(desired) && desired != forecast_type) {
+    stop(
+      "Forecast type determined by scoringutils based on input: `",
+      forecast_type,
+      "`. Desired forecast type: `", desired, "`."
+    )
+  }
+
   # construct class
   data <- new_forecast(data, paste0("forecast_", forecast_type))
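
The order of operations in the hunk above (rename columns, restrict to the forecast unit, then compare the inferred forecast type with the requested one) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `rename_required()`, `keep_forecast_unit()`, `infer_type()` and `check_type()` are hypothetical helpers, and `infer_type()` is a deliberately simplified stand-in for `get_forecast_type()` that looks at column names only.

```r
# Hypothetical helper (not part of scoringutils): rename user-supplied
# columns to the names the package requires, e.g. observed = "true_value".
rename_required <- function(data, ...) {
  mapping <- Filter(Negate(is.null), list(...))  # drop arguments left as NULL
  for (new_name in names(mapping)) {
    old_name <- mapping[[new_name]]
    stopifnot(
      is.character(old_name),
      length(old_name) == 1,
      old_name %in% names(data)
    )
    names(data)[names(data) == old_name] <- new_name
  }
  data
}

# Hypothetical helper: keep only the forecast unit plus the required columns.
keep_forecast_unit <- function(data, forecast_unit) {
  required <- c("observed", "predicted", "quantile_level", "sample_id")
  keep <- intersect(names(data), union(forecast_unit, required))
  data[, keep, drop = FALSE]
}

# Simplified stand-in for get_forecast_type(): inspects column names only.
infer_type <- function(data) {
  if ("quantile_level" %in% names(data)) {
    "quantile"
  } else if ("sample_id" %in% names(data)) {
    "sample"
  } else {
    "point"
  }
}

# Mirrors the `desired` check in the diff: error on a type mismatch.
check_type <- function(data, desired = NULL) {
  actual <- infer_type(data)
  if (!is.null(desired) && desired != actual) {
    stop(
      "Forecast type determined based on input: `", actual,
      "`. Desired forecast type: `", desired, "`."
    )
  }
  actual
}

# Made-up toy data for illustration only.
df <- data.frame(
  true_value = c(1, 2),
  prediction = c(1.5, 2.5),
  q = c(0.5, 0.5),
  forecaster = "m1",
  note = "unused"
)

renamed <- rename_required(
  df,
  observed = "true_value",
  predicted = "prediction",
  quantile_level = "q",
  model = "forecaster"
)
unit_only <- keep_forecast_unit(renamed, forecast_unit = "model")
forecast_type <- check_type(unit_only, desired = "quantile")
```

Renaming runs first so that `forecast_unit` can refer to the new column names, which matches the note in the `forecast_unit` parameter documentation ("after any renaming of columns done by `as_forecast()`").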
 

diff --git a/README.Rmd b/README.Rmd
@@ -120,12 +120,14 @@ example_quantile %>%
 
 ### Scoring forecasts
 
-Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually using `set_forecast_unit()` as this may help to avoid errors, especially when scoringutils is used in automated pipelines. The function `set_forecast_unit()` will simply drop unneeded columns. To verify everything is in order, the function `validate_forecast()` should be used. The result of that check can then passed directly into `score()`. `score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of additional functions from `scoringutils` to add empirical coverage-levels (`add_coverage()`), and scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally we summarise these scores by model and target type.
+Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually by specifying the "forecast_unit" argument in `as_forecast()` as this may help to avoid errors. This will drop all columns that are neither part of the forecast unit nor part of the columns internally used by `scoringutils`. The function `as_forecast()` processes and validates the inputs. 
+`score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of additional functions from `scoringutils` to add empirical coverage-levels (`add_coverage()`), and scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally we summarise these scores by model and target type.
 
 ```{r score-example}
 example_quantile %>%
-  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
-  as_forecast() %>%
+  as_forecast(forecast_unit = c(
+    "location", "target_end_date", "target_type", "horizon", "model"
+  )) %>%
   add_coverage() %>%
   score() %>%
   add_pairwise_comparison(

diff --git a/README.md b/README.md
@@ -134,23 +134,23 @@ Forecasts can be easily and quickly scored using the `score()` function.
 `score()` automatically tries to determine the `forecast_unit`, i.e. the
 set of columns that uniquely defines a single forecast, by taking all
 column names of the data into account. However, it is recommended to set
-the forecast unit manually using `set_forecast_unit()` as this may help
-to avoid errors, especially when scoringutils is used in automated
-pipelines. The function `set_forecast_unit()` will simply drop unneeded
-columns. To verify everything is in order, the function
-`validate_forecast()` should be used. The result of that check can then
-passed directly into `score()`. `score()` returns unsummarised scores,
-which in most cases is not what the user wants. Here we make use of
-additional functions from `scoringutils` to add empirical
+the forecast unit manually by specifying the “forecast_unit” argument in
+`as_forecast()` as this may help to avoid errors. This will drop all
+columns that are neither part of the forecast unit nor part of the
+columns internally used by `scoringutils`. The function `as_forecast()`
+processes and validates the inputs. `score()` returns unsummarised
+scores, which in most cases is not what the user wants. Here we make use
+of additional functions from `scoringutils` to add empirical
 coverage-levels (`add_coverage()`), and scores relative to a baseline
 model (here chosen to be the EuroCOVIDhub-ensemble model). See the
 getting started vignette for more details. Finally we summarise these
 scores by model and target type.
 
 ``` r
 example_quantile %>%
-  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
-  as_forecast() %>%
+  as_forecast(forecast_unit = c(
+    "location", "target_end_date", "target_type", "horizon", "model"
+  )) %>%
   add_coverage() %>%
   score() %>%
   add_pairwise_comparison(

diff --git a/man/as_forecast.Rd b/man/as_forecast.Rd