Skip to content

Commit

Permalink
Merge pull request #916 from epiforecasts/check-pit
Browse files Browse the repository at this point in the history
359: Check PIT
  • Loading branch information
nikosbosse authored Sep 21, 2024
2 parents 2c363c8 + 52e7340 commit 9d28b4b
Show file tree
Hide file tree
Showing 8 changed files with 240 additions and 260 deletions.
20 changes: 5 additions & 15 deletions R/pit.R
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
#' forecasts \eqn{F_t} are said to be ideal if \eqn{F_t = G_t} at all times t.
#' In that case, the probabilities \eqn{u_t} are distributed uniformly.
#'
#' In the case of discrete outcomes such as incidence counts,
#' In the case of discrete nonnegative outcomes such as incidence counts,
#' the PIT is no longer uniform even when forecasts are ideal.
#' In that case a randomised PIT can be used instead:
#' \deqn{
Expand All @@ -37,19 +37,6 @@
#' of k. If \eqn{P_t} is the true cumulative
#' probability distribution, then \eqn{u_t} is standard uniform.
#'
#' The function checks whether integer or continuous forecasts were provided.
#' It then applies the (randomised) probability integral and tests
#' the values \eqn{u_t} for uniformity using the
#' Anderson-Darling test.
#'
#' As a rule of thumb, there is no evidence to suggest a forecasting model is
#' miscalibrated if the p-value found was greater than a threshold of p >= 0.1,
#' some evidence that it was miscalibrated if 0.01 < p < 0.1, and good
#' evidence that it was miscalibrated if p <= 0.01. However, the AD-p-values
#' may be overly strict and there actual usefulness may be questionable.
#' In this context it should be noted, though, that uniformity of the
#' PIT is a necessary but not sufficient condition of calibration.
#'
#' @param n_replicates The number of draws for the randomised PIT for
#' discrete predictions. Will be ignored if forecasts are continuous.
#' @inheritParams ae_median_sample
Expand Down Expand Up @@ -78,6 +65,9 @@
#' plot_pit(pit)
#' @export
#' @references
#' Claudia Czado, Tilmann Gneiting Leonhard Held (2009) Predictive model
#' assessment for count data. Biometrika, 96(4), 633-648.
#
#' Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rachel Lowe,
#' Rosalind M. Eggo, W. John Edmunds (2019) Assessing the performance of
#' real-time epidemic forecasts: A case study of Ebola in the Western Area
Expand Down Expand Up @@ -194,7 +184,7 @@ get_pit.forecast_quantile <- function(forecast, by, ...) {
forecast[, quantile_coverage := (observed <= predicted)]
quantile_coverage <-
forecast[, .(quantile_coverage = mean(quantile_coverage)),
by = c(unique(c(by, "quantile_level")))]
by = c(unique(c(by, "quantile_level")))]
quantile_coverage <- quantile_coverage[order(quantile_level),
.(
quantile_level = c(quantile_level, 1),
Expand Down
20 changes: 5 additions & 15 deletions R/plot.R
Original file line number Diff line number Diff line change
Expand Up @@ -494,21 +494,21 @@ plot_pit <- function(pit,

# use breaks if explicitly given, otherwise assign based on number of bins
if (!is.null(breaks)) {
plot_quantiles <- breaks
plot_quantiles <- unique(c(0, breaks, 1))
} else if (is.null(num_bins) || num_bins == "auto") {
# automatically set number of bins
if (type == "sample-based") {
num_bins <- 10
width <- 1 / num_bins
plot_quantiles <- seq(width, 1, width)
plot_quantiles <- seq(0, 1, width)
}
if (type == "quantile-based") {
plot_quantiles <- unique(pit$quantile_level)
plot_quantiles <- unique(c(0, pit$quantile_level, 1))
}
} else {
# if num_bins is explicitly given
width <- 1 / num_bins
plot_quantiles <- seq(width, 1, width)
plot_quantiles <- seq(0, 1, width)
}

# function for data.frames
Expand All @@ -518,21 +518,11 @@ plot_pit <- function(pit,

# quantile version
if (type == "quantile-based") {
if (num_bins == "auto") {
} else {
width <- 1 / num_bins
plot_quantiles <- seq(width, 1, width)
}

if (!is.null(breaks)) {
plot_quantiles <- breaks
}

hist <- ggplot(
data = pit[quantile_level %in% plot_quantiles],
aes(x = quantile_level, y = pit_value)
) +
geom_col(position = "dodge") +
geom_col(position = "dodge", colour = "grey") +
facet_wrap(formula)
}

Expand Down
2 changes: 1 addition & 1 deletion man/get_metrics.forecast_point.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 3 additions & 14 deletions man/pit_sample.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 9d28b4b

Please sign in to comment.