Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #851 - Create S3 methods for converting to point and quantile forecasts #876

Merged
merged 9 commits into from
Jul 30, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
# Generated by roxygen2: do not edit by hand

S3method(`[`,scores)
S3method(as_forecast_point,default)
S3method(as_forecast_point,forecast_quantile)
S3method(as_forecast_quantile,default)
S3method(as_forecast_quantile,forecast_sample)
S3method(assert_forecast,default)
S3method(assert_forecast,forecast_binary)
S3method(assert_forecast,forecast_point)
Expand Down Expand Up @@ -66,7 +70,6 @@ export(plot_pit)
export(plot_quantile_coverage)
export(plot_wis)
export(quantile_score)
export(sample_to_quantile)
export(score)
export(se_mean_sample)
export(select_metrics)
Expand Down
5 changes: 3 additions & 2 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,8 @@ of our [original](https://doi.org/10.48550/arXiv.2205.07090) `scoringutils` pape
- the CRPS is now reported as decomposition into dispersion, overprediction and underprediction.

### Creating a forecast object
- The `as_forecast_<type>()` functions create a forecast object and validates it. They also allow users to rename/specify required columns and specify the forecast unit in a single step, taking over the functionality of `set_forecast_unit()` in most cases.
- The `as_forecast_<type>()` functions create a forecast object and validates it. They also allow users to rename/specify required columns and specify the forecast unit in a single step, taking over the functionality of `set_forecast_unit()` in most cases. See `?as_forecast()` for more information.
- Some `as_forecast_<type>()` functions like e.g. `as_forecast_point()` and `as_forecast_quantile()` have S3 methods for converting from another forecast type to the respective forecast type. For example, `as_forecast_quantile()` has a method for converting from a `forecast_sample` object to a `forecast_quantile` object by estimating quantiles from the samples.

### Updated workflows
- An example workflow for scoring a forecast now looks like this:
Expand Down Expand Up @@ -281,7 +282,7 @@ the mean before returning an output.
### Bug fixes

- Testing was expanded
- Minor bugs were fixed, for example a bug in the `sample_to_quantile()` function
- Minor bugs were fixed, for example a bug in the `as_forecast_quantile()` function
(https://github.com/epiforecasts/scoringutils/pull/223)

### Package data updated
Expand Down
164 changes: 139 additions & 25 deletions R/forecast.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,18 +5,24 @@
#' and observations. If the input passes all input checks, it will be converted
#' to a `forecast` object. A forecast object is a `data.table` with a
#' class `forecast` and an additional class that depends on the forecast type.
#' Every forecast type has its own `as_forecast_()` function.
#' Every forecast type has its own `as_forecast_<type>()` function.
#' See the details section below for more information
#' on the expected input formats.
#'
#' The `as_forecast_()` functions give users some control over how their data
#' is parsed.
#' The `as_forecast_<type>()` functions give users some control over how their
#' data is parsed.
#' Using the arguments `observed`, `predicted`, `model`, etc. users can rename
#' existing columns of their input data to match the required columns for a
#' forecast object. Using the argument `forecast_unit`, users can specify the
#' the columns that uniquely identify a single forecast (and remove the others,
#' see [set_forecast_unit()] for details).
#'
#' The following functions are available:
#' - [as_forecast_point()]
#' - [as_forecast_binary()]
#' - [as_forecast_sample()]
#' - [as_forecast_quantile()]
#'
#' @param data A data.frame (or similar) with predicted and observed values.
#' See the details section of [as_forecast()] for additional information
#' on required input formats.
Expand All @@ -43,6 +49,8 @@
#' - `forecast_sample` for sample-based forecasts
#' - `forecast_quantile` for quantile-based forecasts
#' @keywords check-forecasts
#' @seealso [as_forecast_point()], [as_forecast_binary()],
#' [as_forecast_sample()], [as_forecast_quantile()]
#' @examples
#' as_forecast_binary(example_binary)
#' as_forecast_quantile(
Expand Down Expand Up @@ -99,48 +107,105 @@ as_forecast_generic <- function(data,
}


#' @rdname as_forecast
#' @title Create a `forecast` object for binary forecasts
#' @description
#' Create a `forecast` object for binary forecasts. See more information on
#' forecast types and expected input formats by calling `?`[as_forecast()].
#' @export
#' @inheritParams as_forecast
#' @seealso [as_forecast()], [as_forecast_point()], [as_forecast_binary()],
#' [as_forecast_sample()], [as_forecast_quantile()]
#' @importFrom cli cli_warn
as_forecast_point <- function(data,
forecast_unit = NULL,
observed = NULL,
predicted = NULL,
model = NULL) {
#' @keywords check-forecasts
as_forecast_binary <- function(data,
forecast_unit = NULL,
observed = NULL,
predicted = NULL,
model = NULL) {
data <- as_forecast_generic(data, forecast_unit, observed, predicted, model)
data <- new_forecast(data, "forecast_point")
data <- new_forecast(data, "forecast_binary")
assert_forecast(data)
return(data)
}


#' @rdname as_forecast
#' @title Create a `forecast` object for point forecasts
#' @description
#' Create a `forecast` object for point forecasts. See more information on
#' forecast types and expected input formats by calling `?`[as_forecast()].
#' @inherit as_forecast params
#' @param ... Unused
#' @seealso [as_forecast()], [as_forecast_point()], [as_forecast_binary()],
#' [as_forecast_sample()], [as_forecast_quantile()]
#' @export
#' @keywords check-forecasts
as_forecast_point <- function(data, ...) {
UseMethod("as_forecast_point")
}


#' @rdname as_forecast_point
#' @export
#' @importFrom cli cli_warn
as_forecast_binary <- function(data,
forecast_unit = NULL,
observed = NULL,
predicted = NULL,
model = NULL) {
as_forecast_point.default <- function(data,
seabbs marked this conversation as resolved.
Show resolved Hide resolved
forecast_unit = NULL,
observed = NULL,
predicted = NULL,
model = NULL,
...) {
data <- as_forecast_generic(data, forecast_unit, observed, predicted, model)
data <- new_forecast(data, "forecast_binary")
data <- new_forecast(data, "forecast_point")
assert_forecast(data)
return(data)
}


#' @rdname as_forecast
#' @rdname as_forecast_point
#' @description
#' When converting a `forecast_quantile` object into a `forecast_point` object,
#' the 0.5 quantile is extracted and returned as the point forecast.
#' @export
#' @keywords check-forecasts
as_forecast_point.forecast_quantile <- function(data, ...) {
seabbs marked this conversation as resolved.
Show resolved Hide resolved
assert_forecast(data, verbose = FALSE)
assert_subset(0.5, unique(data$quantile_level))
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved

forecast <- data[quantile_level == 0.5]
forecast[, "quantile_level" := NULL]

point_forecast <- new_forecast(forecast, "forecast_point")
return(point_forecast)
}


#' @title Create a `forecast` object for quantile-based forecasts
#' @description
#' Create a `forecast` object for quantile-based forecasts. See more information
#' on forecast types and expected input formats by calling `?`[as_forecast()].
#' @param ... Unused
#' @seealso [as_forecast()], [as_forecast_point()], [as_forecast_binary()],
#' [as_forecast_sample()], [as_forecast_quantile()]
#' @inheritParams as_forecast
#' @export
#' @keywords check-forecasts
as_forecast_quantile <- function(data, ...) {
UseMethod("as_forecast_quantile")
}


#' @rdname as_forecast_quantile
#' @param quantile_level (optional) Name of the column in `data` that contains
#' the quantile level of the predicted values. This column will be renamed to
#' "quantile_level". Only applicable to quantile-based forecasts.
#' @export
#' @importFrom cli cli_warn
as_forecast_quantile <- function(data,
forecast_unit = NULL,
observed = NULL,
predicted = NULL,
model = NULL,
quantile_level = NULL) {
as_forecast_quantile.default <- function(data,
forecast_unit = NULL,
observed = NULL,
predicted = NULL,
model = NULL,
quantile_level = NULL,
...) {
assert_character(quantile_level, len = 1, null.ok = TRUE)
assert_subset(quantile_level, names(data), empty.ok = TRUE)
if (!is.null(quantile_level)) {
Expand All @@ -154,12 +219,61 @@ as_forecast_quantile <- function(data,
}


#' @rdname as_forecast
#' @rdname as_forecast_quantile
#' @description
#' When creating a `forecast_quantile` object from a `forecast_sample` object,
#' the quantiles are estimated by computing empircal quantiles from the samples
#' via [quantile()]. Note that empirical quantiles are a biased estimator for
#' the true quantiles in particular in the tails of the distribution and
#' when the number of available samples is low.
#' @param probs A numeric vector of quantile levels for which
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#' quantiles will be computed. Corresponds to the `probs` argument in
#' [quantile()].
#' @param type Type argument passed down to the quantile function. For more
#' information, see [quantile()].
#' @importFrom stats quantile
#' @importFrom methods hasArg
#' @importFrom checkmate assert_numeric
#' @export
as_forecast_quantile.forecast_sample <- function(
data,
probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
type = 7,
...
) {
forecast <- copy(data)
assert_forecast(forecast, verbose = FALSE)
assert_numeric(probs, min.len = 1)
reserved_columns <- c("predicted", "sample_id")
by <- setdiff(colnames(forecast), reserved_columns)

quantile_level <- unique(
round(c(probs, 1 - probs), digits = 10)
)

forecast <-
forecast[, .(quantile_level = quantile_level,
predicted = quantile(x = predicted, probs = ..probs,
type = ..type, na.rm = TRUE)),
by = by]

quantile_forecast <- new_forecast(forecast, "forecast_quantile")
assert_forecast(quantile_forecast)

return(quantile_forecast)
}


#' @title Create a `forecast` object for sample-based forecasts
#' @param sample_id (optional) Name of the column in `data` that contains the
#' sample id. This column will be renamed to "sample_id". Only applicable to
#' sample-based forecasts.
#' @inheritParams as_forecast
#' @export
#' @seealso [as_forecast()], [as_forecast_point()], [as_forecast_binary()],
#' [as_forecast_sample()], [as_forecast_quantile()]
#' @importFrom cli cli_warn
#' @keywords check-forecasts
as_forecast_sample <- function(data,
forecast_unit = NULL,
observed = NULL,
Expand Down
51 changes: 3 additions & 48 deletions R/utils_data_handling.R
Original file line number Diff line number Diff line change
@@ -1,48 +1,3 @@
#' @title Change data from a sample based format to a quantile format
#'
#' @description
#' Transform data from a format that is based on predictive samples to a format
#' based on plain quantiles.
#'
#' @param forecast A `forecast` object of class `forecast_sample` (a validated
#' data.table with predicted and observed values, see [as_forecast()]).
#'
#' @param quantile_level A numeric vector of quantile levels for which
#' quantiles will be computed.
#' @param type Type argument passed down to the quantile function. For more
#' information, see [quantile()].
#' @return a data.table in a long interval range format
#' @importFrom data.table as.data.table
#' @importFrom stats quantile
#' @importFrom methods hasArg
#' @importFrom checkmate assert_numeric
#' @keywords data-handling
#' @export
#' @examples
#' sample_to_quantile(as_forecast_sample(example_sample_discrete))
sample_to_quantile <- function(forecast,
quantile_level = c(0.05, 0.25, 0.5, 0.75, 0.95),
type = 7) {
forecast <- copy(forecast)
assert_forecast(forecast, forecast_type = "sample", verbose = FALSE)
assert_numeric(quantile_level, min.len = 1)
reserved_columns <- c("predicted", "sample_id")
by <- setdiff(colnames(forecast), reserved_columns)

quantile_level <- unique(
round(c(quantile_level, 1 - quantile_level), digits = 10)
)

forecast <-
forecast[, .(quantile_level = quantile_level,
predicted = quantile(x = predicted, probs = ..quantile_level,
type = ..type, na.rm = TRUE)),
by = by]

return(as_forecast_quantile(forecast))
}


# ==================== Functions internally used for scoring ===================
# These functions would ideally be replaced in the future

Expand Down Expand Up @@ -183,7 +138,7 @@ quantile_to_interval_numeric <- function(observed,
#' Transform data from a format that is based on predictive samples to a format
#' based on interval ranges.
#'
#' @inheritParams sample_to_quantile
#' @inheritParams as_forecast_quantile
#' @param keep_quantile_col keep quantile_level column, default is TRUE
#' @return A data.table in a long interval interval range format
#' @importFrom data.table as.data.table
Expand All @@ -200,9 +155,9 @@ sample_to_interval_long <- function(data,
upper_quantiles <- 1 - lower_quantiles
quantile_levels <- sort(unique(c(lower_quantiles, upper_quantiles)))

data <- sample_to_quantile(
data <- as_forecast_quantile(
data,
quantile_level = quantile_levels,
probs = quantile_levels,
type = type
)

Expand Down
2 changes: 1 addition & 1 deletion R/z_globalVariables.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
globalVariables(c(
"..index",
"..quantile_level",
"..probs",
"..type",
".",
".SD",
Expand Down
2 changes: 1 addition & 1 deletion inst/manuscript/R/00-standalone-Figure-replication.R
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,7 @@ pit_plots <- plot_pit(stored$pit) +
# create interval and quantile coverage plots ----------------------------------
# create coverage plots by transforming to quantile format first
quantiles <- c(0.01, 0.025, seq(0.05, 0.95, 0.05), 0.975, 0.99)
df_quantile <- sample_to_quantile(df,
df_quantile <- as_forecast_quantile(df,
quantiles = quantiles)
seabbs marked this conversation as resolved.
Show resolved Hide resolved

res_quantile <- score(df_quantile)
Expand Down
2 changes: 1 addition & 1 deletion inst/manuscript/R/toy-example-calibration.R
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ pit_plots <- plot_pit(stored$pit) +
# create interval and quantile coverage plots ----------------------------------
# create coverage plots by transforming to quantile format first
quantiles <- c(0.01, 0.025, seq(0.05, 0.95, 0.05), 0.975, 0.99)
df_quantile <- sample_to_quantile(df,
df_quantile <- as_forecast_quantile(df,
quantile_level = quantiles)
seabbs marked this conversation as resolved.
Show resolved Hide resolved

res_quantile <- score(df_quantile)
Expand Down
Loading
Loading