Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add contant baseline model #88

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 117 additions & 0 deletions models/constant-baseline/Rupdate-constant-baseline.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
if (baseline == "constant baseline") {
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
# get last observed value for median

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file name should no include name of model or R. its in the constant-baseline folder so this is repetition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update name to be update.R or similar having constant-baseline in both folder and filename makes me sad.

Copy link
Contributor Author

@nikosbosse nikosbosse Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to make you sad! :(
It is just that all other files are called update-timeseries.R and update-rt.R etc...

last_value <- filtered_observations$value[nrow(filtered_observations)]
median <- rep(last_value, num_horizons)

# get width
sigma <- filtered_observations %>%
dplyr::mutate(difference = c(NA, diff(log(value)))) %>%
dplyr::filter(target_end_date > max(target_end_date) - 4 * 7) %>%
dplyr::pull(difference) %>%
sd()
width <- rep(sigma, num_horizons)
}


# Packages ----------------------------------------------------------------
library(magrittr)
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
library(dplyr)
library(readr)
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
library(here)
library(data.table)

# forecast date -----------------------------------------------------------
forecast_date <- readRDS(here("data", "target_date.rds"))

sigma <- filtered_observations %>%
# Set up functions and data -----------------------------------------------
source(here::here("utils", "get-us-data.R"))
source(here::here("models", "timeseries", "utils", "deaths-on-cases-forecast.R"))

# Get data
weekly_deaths_state <- get_us_deaths(data = "daily") %>%
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
filter(date >= (as.Date(forecast_date) - 8 * 4)) %>%
group_by(state, epiweek) %>%
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
summarise(deaths = sum(deaths),
target_end_date = max(date),
.groups = "drop_last") %>%
dplyr::select(-epiweek) %>%
dplyr::ungroup()

weekly_deaths_national <- weekly_deaths_state %>%
group_by(target_end_date) %>%
summarise(deaths = sum(deaths), .groups = "drop_last") %>%
mutate(state = "US")

obs <- rbindlist(list(weekly_deaths_state, weekly_deaths_national),
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
use.names = TRUE) %>%
dplyr::filter(target_end_date <= as.Date(forecast_date))

# get median and sd of forecast ------------------------------------------------
median <- obs %>%
dplyr::filter(target_end_date == max(target_end_date)) %>%
dplyr::group_by(state) %>%
dplyr::rename(median = deaths) %>%
dplyr::mutate(horizon = list(1:4),
median = list(rep(median, 4))) %>%
tidyr::unnest(cols = c(horizon, median))

sigma <- obs %>%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments outlining what is happening here.

dplyr::group_by(state) %>%
dplyr::mutate(difference = c(NA, diff(log(deaths + 0.0001)))) %>%
dplyr::filter(target_end_date > max(target_end_date) - 4 * 7) %>%
dplyr::summarise(sd = sd(difference, na.rm = TRUE),
.groups = "drop_last") %>%
dplyr::group_by(state) %>%
dplyr::mutate(horizon = list(1:4),
sd = list(sd * c(0.8, 0.9, 1, 1.1))) %>%
tidyr::unnest(cols = c(horizon, sd)) %>%
dplyr::ungroup()

combined <- dplyr::inner_join(median, sigma, by = c("state", "horizon")) %>%
dplyr::mutate(target_end_date = as.Date(target_end_date) + 7 * horizon)


# get quantiles for forecast ---------------------------------------------------
quantile_grid <- c(0.01, 0.025, seq(0.05, 0.95, 0.05), 0.975, 0.99)

forecasts <- combined %>%
dplyr::rowwise() %>%
dplyr::mutate(quantile = list(quantile_grid),
value = list(exp(qnorm(p = quantile_grid,
mean = log(median),
sd = sd))))


# do formatting for submission -------------------------------------------------
formatted_forecasts <- 0

state_codes <- readRDS(here::here("data/state_codes.rds"))
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved


formatted <- dplyr::left_join(forecasts, state_codes, by = "state") %>%
dplyr::mutate(forecast_date = forecast_date,
submission_date = forecast_date,
target = paste(horizon, "wk ahead inc death"),
type = "quantile") %>%
dplyr::ungroup() %>%
dplyr::select(-c(median, sd, state)) %>%
tidyr::unnest(cols = c(quantile, value))

# add point forecast
point_forecast <- formatted_forecasts %>%
dplyr::filter(quantile == 0.5) %>%
dplyr::mutate(type = "point")

formatted_forecasts <- dplyr::bind_rows(formatted,
point_forecast) %>%
dplyr::relocate(forecast_date, submission_date, target, target_end_date, location, type, quantile, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to hook into weekly scheduled run time (i.e files in models)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - I think. Needs checking whether there are any other places this needs to go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be in the aggregated R script in order for the results to be loaded into the submissions folder.



submission_dir <- here("models", "constant-baseline", "data", "submission")
if (!dir.exists(submission_dir)) {
dir.create(submission_dir, recursive = TRUE)
}

fwrite(formatted_forecasts, paste0(submission_dir, "/", forecast_date, ".csv"))
nikosbosse marked this conversation as resolved.
Show resolved Hide resolved
Loading