Rename pairwise comparisons #722

Merged (10 commits) on Mar 22, 2024
6 changes: 3 additions & 3 deletions NAMESPACE
@@ -23,7 +23,7 @@ S3method(validate_forecast,forecast_binary)
S3method(validate_forecast,forecast_point)
S3method(validate_forecast,forecast_quantile)
S3method(validate_forecast,forecast_sample)
export(add_pairwise_comparison)
export(add_relative_skill)
export(ae_median_quantile)
export(ae_median_sample)
export(as_forecast)
@@ -40,6 +40,7 @@ export(get_forecast_counts)
export(get_forecast_type)
export(get_forecast_unit)
export(get_metrics)
export(get_pairwise_comparisons)
export(get_pit)
export(interval_coverage)
export(interval_coverage_deviation)
@@ -55,13 +56,12 @@ export(metrics_quantile)
export(metrics_sample)
export(new_forecast)
export(overprediction)
export(pairwise_comparison)
export(pit_sample)
export(plot_correlations)
export(plot_forecast_counts)
export(plot_heatmap)
export(plot_interval_coverage)
export(plot_pairwise_comparison)
export(plot_pairwise_comparisons)
export(plot_pit)
export(plot_quantile_coverage)
export(plot_score_table)
3 changes: 2 additions & 1 deletion NEWS.md
@@ -26,7 +26,8 @@ The update introduces breaking changes. If you want to keep using the older vers
- `score()` now returns objects of class `scores` with a stored attribute `metrics` that holds the names of the scoring rules that were used. Users can call `get_metrics()` to access the names of those scoring rules.
- `check_forecasts()` was replaced by a different workflow. There now is a function, `as_forecast()`, that determines forecast type of the data, constructs a forecasting object and validates it using the function `validate_forecast()` (a generic that dispatches the correct method based on the forecast type). Objects of class `forecast_binary`, `forecast_point`, `forecast_sample` and `forecast_quantile` have print methods that fulfill the functionality of `check_forecasts()`.
- Users can test whether an object is of class `forecast_*()` using the function `is_forecast()`. Users can also test for a specific `forecast_*` class using the appropriate `is_forecast.forecast_*` method. For example, to check whether an object is of class `forecast_quantile`, you would use `scoringutils:::is_forecast.forecast_quantile()`.
- The functionality for computing pairwise comparisons was now split from `summarise_scores()`. Instead of doing pairwise comparisons as part of summarising scores, a new function, `add_pairwise_comparison()`, was introduced that takes summarised scores as an input and adds columns with relative skil scores and scaled relative skill scores.
- The functionality for computing pairwise comparisons has now been split from `summarise_scores()`. Instead of doing pairwise comparisons as part of summarising scores, a new function, `add_relative_skill()`, was introduced that takes summarised scores as an input and adds columns with relative skill scores and scaled relative skill scores.
- The function `pairwise_comparison()` was renamed to `get_pairwise_comparisons()`, in line with other `get_`-functions. Analogously, `plot_pairwise_comparison()` was renamed to `plot_pairwise_comparisons()`.
- `add_coverage()` was replaced by a new function, `get_coverage()`. This function comes with an updated workflow where coverage values are computed directly based on the original data and can then be visualised using `plot_interval_coverage()` or `plot_quantile_coverage()`. An example workflow would be `example_quantile |> as_forecast() |> get_coverage(by = "model") |> plot_interval_coverage()`.
- Support for the interval format was mostly dropped (see PR #525 by @nikosbosse and reviewed by @seabbs)
- The function `bias_range()` was removed (users should now use `bias_quantile()` instead)
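The renamed workflow described in the NEWS entries above can be sketched roughly as follows. This is a sketch only, assuming the post-rename API introduced in this PR (function names are taken from the diff; exact signatures and pipeline order may differ in a released version):

```r
library(scoringutils)
library(ggplot2)

# Validate the bundled example data and score it
scores <- example_quantile |>
  as_forecast() |>
  score()

# get_pairwise_comparisons() (formerly pairwise_comparison()) returns the
# full table of pairwise model comparisons, here split by target type
pairwise <- get_pairwise_comparisons(scores, by = c("model", "target_type"))

# add_relative_skill() (formerly add_pairwise_comparison()) takes summarised
# scores, per the NEWS entry, and adds (scaled) relative skill columns
scores |>
  summarise_scores(by = c("model", "target_type")) |>
  add_relative_skill(by = c("model", "target_type"))

# plot_pairwise_comparisons() (formerly plot_pairwise_comparison())
# visualises the comparison table
plot_pairwise_comparisons(pairwise, type = "mean_scores_ratio") +
  facet_wrap(~target_type)
```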
2 changes: 1 addition & 1 deletion R/correlations.R
@@ -9,7 +9,7 @@
#' be shown
#' @param digits A number indicating how many decimal places the result should
#' be rounded to. By default (`digits = NULL`) no rounding takes place.
#' @inheritParams pairwise_comparison
#' @inheritParams get_pairwise_comparisons
#' @return An object of class `scores` (a data.table with an additional
#' attribute `metrics` holding the names of the scores) with correlations
#' between different metrics
36 changes: 18 additions & 18 deletions R/pairwise-comparisons.R
@@ -60,13 +60,13 @@
#' }
#'
#' scores <- score(as_forecast(example_quantile))
#' pairwise <- pairwise_comparison(scores, by = "target_type")
#' pairwise <- get_pairwise_comparisons(scores, by = "target_type")
#'
#' library(ggplot2)
#' plot_pairwise_comparison(pairwise, type = "mean_scores_ratio") +
#' plot_pairwise_comparisons(pairwise, type = "mean_scores_ratio") +
#' facet_wrap(~target_type)

pairwise_comparison <- function(
get_pairwise_comparisons <- function(
scores,
by = "model",
metric = intersect(c("wis", "crps", "brier_score"), names(scores)),
@@ -204,14 +204,14 @@ pairwise_comparison <- function(
#' @description
#'
#' This function does the pairwise comparison for one set of forecasts, but
#' multiple models involved. It gets called from [pairwise_comparison()].
#' [pairwise_comparison()] splits the data into arbitrary subgroups specified
#' by the user (e.g. if pairwise comparison should be done separately for
#' different forecast targets) and then the actual pairwise comparison for that
#' subgroup is managed from [pairwise_comparison_one_group()]. In order to
#' multiple models involved. It gets called from [get_pairwise_comparisons()].
#' [get_pairwise_comparisons()] splits the data into arbitrary subgroups
#' specified by the user (e.g. if pairwise comparison should be done separately
#' for different forecast targets) and then the actual pairwise comparison for
#' that subgroup is managed from [pairwise_comparison_one_group()]. In order to
#' actually do the comparison between two models over a subset of common
#' forecasts it calls [compare_two_models()].
#' @inherit pairwise_comparison params return
#' @inherit get_pairwise_comparisons params return
#' @importFrom cli cli_abort
#' @keywords internal

@@ -342,11 +342,11 @@ pairwise_comparison_one_group <- function(scores,
#' from [pairwise_comparison_one_group()], which handles the
#' comparison of multiple models on a single set of forecasts (there are no
#' subsets of forecasts to be distinguished). [pairwise_comparison_one_group()]
#' in turn gets called from [get_pairwise_comparisons()] which can handle
#' in turn gets called from from [get_pairwise_comparisons()] which can handle
#' pairwise comparisons for a set of forecasts with multiple subsets, e.g.
#' pairwise comparisons for one set of forecasts, but done separately for two
#' different forecast targets.
#' @inheritParams pairwise_comparison
#' @inheritParams get_pairwise_comparisons
#' @param name_model1 character, name of the first model
#' @param name_model2 character, name of the model to compare against
#' @param one_sided Boolean, default is `FALSE`, whether to conduct a one-sided
@@ -430,7 +430,7 @@ compare_two_models <- function(scores,
#' @title Calculate Geometric Mean
#'
#' @details
#' Used in [pairwise_comparison()].
#' Used in [get_pairwise_comparisons()].
#'
#' @param x numeric vector of values for which to calculate the geometric mean
#' @return the geometric mean of the values in `x`. `NA` values are ignored.
@@ -452,7 +452,7 @@ geometric_mean <- function(x) {
#' the two. This observed difference or ratio is compared against the same
#' test statistic based on permutations of the original data.
#'
#' Used in [pairwise_comparison()].
#' Used in [get_pairwise_comparisons()].
#'
#' @param scores1 vector of scores to compare against another vector of scores
#' @param scores2 A second vector of scores to compare against the first
@@ -509,22 +509,22 @@ permutation_test <- function(scores1,
#' @description Adds columns with relative skill scores computed by running
#' pairwise comparisons on the scores.
#' For more information on
#' the computation of relative skill, see [pairwise_comparison()].
#' the computation of relative skill, see [get_pairwise_comparisons()].
#' Relative skill will be calculated for the aggregation level specified in
#' `by`.
#' @inheritParams pairwise_comparison
#' @inheritParams get_pairwise_comparisons
#' @export
#' @keywords scoring
add_pairwise_comparison <- function(
add_relative_skill <- function(
scores,
by = "model",
metric = intersect(c("wis", "crps", "brier_score"), names(scores)),
baseline = NULL
) {

# input checks are done in `pairwise_comparison()`
# input checks are done in `get_pairwise_comparisons()`
# do pairwise comparisons ----------------------------------------------------
pairwise <- pairwise_comparison(
pairwise <- get_pairwise_comparisons(
scores = scores,
metric = metric,
baseline = baseline,
12 changes: 6 additions & 6 deletions R/plot.R
@@ -15,7 +15,7 @@
#' `NULL` (default), all metrics present in `scores` will be shown.
#'
#' @return A ggplot object with a coloured table of summarised scores
#' @inheritParams pairwise_comparison
#' @inheritParams get_pairwise_comparisons
#' @importFrom ggplot2 ggplot aes element_blank element_text labs coord_cartesian coord_flip
#' @importFrom data.table setDT melt
#' @importFrom stats sd
@@ -400,7 +400,7 @@ plot_quantile_coverage <- function(coverage,
#' between models
#'
#' @param comparison_result A data.frame as produced by
#' [pairwise_comparison()]
#' [get_pairwise_comparisons()]
#' @param type character vector of length one that is either
#' "mean_scores_ratio" or "pval". This denotes whether to
#' visualise the ratio or the p-value of the pairwise comparison.
@@ -417,12 +417,12 @@ plot_quantile_coverage <- function(coverage,
#' @examples
#' library(ggplot2)
#' scores <- score(as_forecast(example_quantile))
#' pairwise <- pairwise_comparison(scores, by = "target_type")
#' plot_pairwise_comparison(pairwise, type = "mean_scores_ratio") +
#' pairwise <- get_pairwise_comparisons(scores, by = "target_type")
#' plot_pairwise_comparisons(pairwise, type = "mean_scores_ratio") +
#' facet_wrap(~target_type)

plot_pairwise_comparison <- function(comparison_result,
type = c("mean_scores_ratio", "pval")) {
plot_pairwise_comparisons <- function(comparison_result,
type = c("mean_scores_ratio", "pval")) {
comparison_result <- data.table::as.data.table(comparison_result)

relative_skill_metric <- grep(
6 changes: 3 additions & 3 deletions inst/create-metric-tables.R
@@ -189,7 +189,7 @@ pit <- list(
mean_score_ratio <- list(
`Metric` = "Mean score ratio",
`Name` = r"(mean_scores_ratio)",
`Functions` = r"(pairwise_comparison())",
`Functions` = r"(get_pairwise_comparisons())",
`D` = r"($\sim$)",
`C` = r"($\sim$)",
`B` = r"($\sim$)",
@@ -201,7 +201,7 @@ mean_score_ratio <- list(
relative_skill <- list(
`Metric` = "Relative skill",
`Name` = list("relative_skill"),
`Functions` = r"(score(), pairwise_comparison())",
`Functions` = r"(score(), get_pairwise_comparisons())",
`D` = r"($\sim$)",
`C` = r"($\sim$)",
`B` = r"($\sim$)",
@@ -213,7 +213,7 @@ relative_skill <- list(
scaled_relative_skill <- list(
`Metric` = "Scaled relative skill",
`Name` = "scaled_rel_skill",
`Functions` = r"(score(), pairwise_comparison())",
`Functions` = r"(score(), get_pairwise_comparisons())",
`D` = r"($\sim$)",
`C` = r"($\sim$)",
`B` = r"($\sim$)",
6 changes: 3 additions & 3 deletions inst/manuscript/R/00-standalone-Figure-replication.R
@@ -575,9 +575,9 @@ score(example_quantile) |>
# Figure 9
# =============================================================================#
score(example_quantile) |>
pairwise_comparison(by = c("model", "target_type"),
baseline = "EuroCOVIDhub-baseline") |>
plot_pairwise_comparison() +
get_pairwise_comparisons(by = c("model", "target_type"),
baseline = "EuroCOVIDhub-baseline") |>
plot_pairwise_comparisons() +
facet_wrap(~ target_type)


8 changes: 4 additions & 4 deletions man/add_pairwise_comparison.Rd → man/add_relative_skill.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/compare_two_models.Rd
2 changes: 1 addition & 1 deletion man/geometric_mean.Rd
10 changes: 5 additions & 5 deletions man/pairwise_comparison.Rd → man/get_pairwise_comparisons.Rd
10 changes: 5 additions & 5 deletions man/pairwise_comparison_one_group.Rd
2 changes: 1 addition & 1 deletion man/permutation_test.Rd