diff --git a/.github/workflows/pkgdown-netlify-preview.yaml b/.github/workflows/pkgdown-netlify-preview.yaml index 2c51f01..210525b 100644 --- a/.github/workflows/pkgdown-netlify-preview.yaml +++ b/.github/workflows/pkgdown-netlify-preview.yaml @@ -27,7 +27,7 @@ jobs: contents: write pull-requests: write steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - uses: r-lib/actions/setup-tinytex@v2 @@ -39,7 +39,7 @@ jobs: - uses: r-lib/actions/setup-r-dependencies@v2 with: - extra-packages: any::pkgdown, local::. + extra-packages: r-lib/pkgdown, local::. needs: website - name: Build site @@ -53,13 +53,20 @@ jobs: clean: false branch: gh-pages folder: docs - + - id: deploy-dir + name: Determine dev status + run: | + if [[ $(grep -c -E 'sion. ([0-9]*\.){3}' ${{ github.workspace }}/DESCRIPTION) == 1 ]]; then + echo 'dir=./docs/dev' >> $GITHUB_OUTPUT + else + echo 'dir=./docs' >> $GITHUB_OUTPUT + fi - name: Deploy PR preview to Netlify if: contains(env.isPush, 'false') id: netlify-deploy - uses: nwtgck/actions-netlify@v2 + uses: nwtgck/actions-netlify@v3 with: - publish-dir: './docs' + publish-dir: '${{ steps.deploy-dir.outputs.dir }}' production-branch: main github-token: ${{ secrets.GITHUB_TOKEN }} deploy-message: diff --git a/vignettes/articles/scripting-tasks-config.Rmd b/vignettes/articles/scripting-tasks-config.Rmd new file mode 100644 index 0000000..1e89720 --- /dev/null +++ b/vignettes/articles/scripting-tasks-config.Rmd @@ -0,0 +1,452 @@ +--- +title: "Scripting task configuration files" +--- + +```{r, setup, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` +## Introduction + +The `tasks.json` configuration file is the source of truth for validating a +hub and is also one of the most complex aspects of a hub. This configuration +file is a formal representation of an official collaborative forecasting +challenge that allows direct and unambiguous evaluation and comparison of model +outputs. 
It allows hub administrators to set a clear standard for model +submissions. + +While we provide [a hubTemplate](https://github.com/hubverse-org/hubTemplate/) +to help you get started with setting up a hub, JSON files are easy for a computer +to read, but not so easy for a human to read and less so to write. The hard part +of creating a `tasks.json` file should be what model parameters to use, not +constantly looking at JSON syntax or hunting down a missing comma. +To make this process easier, **we provide the `create_*()` family of functions +for creating `tasks.json` files programmatically**. These can be useful for +creating your initial `tasks.json` file, amending an existing file's +configurations or even appending new rounds programmatically. By making use of +this functionality, hubs can even automate the process of updating their files +through GitHub Actions. For example, the [variant nowcast +hub](https://github.com/reichlab/variant-nowcast-hub/tree/main) uses a [GitHub +Action +workflow](https://github.com/reichlab/variant-nowcast-hub/blob/29a9265d0b30f385fc47195d3d7f65ecda7a41d2/.github/workflows/create-modeling-round.yaml#L47) +that [runs a +script](https://github.com/reichlab/variant-nowcast-hub/blob/main/src/make_round_config.R) +to append a new round with updated variants to model on a weekly schedule! + +In this vignette, we will use the config creation functions in `hubAdmin` to +create a `tasks.json` file that represents two rounds of a modeling effort to +predict influenza hospitalizations and optionally deaths 1 to 4 weeks ahead +with mean, median, and quantile predictions in the US, and optionally 5 states. + +Before getting to the scripting, it's important to get a general look at what +a `tasks.json` file looks like structurally. 
+ +## Structure of a `tasks.json` configuration file + +The structure of a `tasks.json` configuration file roughly looks like the +diagram below: + +``` +tasks.json +├─schema_version: "https://.../v4.0.0/tasks-schema.json" +├─rounds: +│ ├─model_tasks: +│ │ ├─task_ids: +│ │ │ └─[...] +│ │ ├─output_type: +│ │ │ └─[...] +│ │ └─target_metadata: +│ │ └─[...] +│ ├─round_id_from_variable: true +│ ├─round_id: origin_date +│ └─submissions_due: +│ └─[...] +├─output_type_id_datatype: "auto" +└─derived_task_ids: + └─[...] +``` + +The first property in a `tasks.json` file is the `schema_version`, which provides +a URL to the [hubverse schema](https://github.com/hubverse-org/schemas) version the config file is built against. The schema is used to validate the structure and contents of the config file. +The next property defined is **rounds**, which sets the schedule and expected contents of model submissions. +These rounds contain one or more [**modeling +tasks**](https://hubverse.io/en/latest/user-guide/tasks.html) that define the +expected content of a model submission against one or more modeling targets. +Each modeling task includes three properties: + + - `task_ids`: a collection of variables that can be used for modeling efforts. + **Each unique combination of task ID values represents a single modeling task**. + - `output_type`: the expected model output representation + - `target_metadata`: characteristics for each target (an example of a modeling + target is "Daily incident Flu Hospitalizations") + +## Creation of a `tasks.json` file + +To create the `tasks.json` file, we will start from the inside out. This means +that we will create the `target_metadata`, `task_ids`, `output_type`s first. +We will then use those to create two `model_task` objects and then bundle them +into two `round` objects, which will be inserted into a `config` object. 
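
Before building these objects, it may help to see what "each unique combination of task ID values" means in practice. The following is only a rough base R sketch, not part of `hubAdmin`: it takes the task ID values this vignette goes on to define (required and optional values lumped together) and enumerates their combinations. The names `task_id_values` and `combinations` are made up for illustration.

```r
# Task ID values taken from the example developed in this vignette
# (required and optional values are combined here for illustration only)
task_id_values <- list(
  origin_date = c("2023-01-02", "2023-01-09", "2023-01-16"),
  location = c("US", "01", "02", "04", "05", "06"),
  horizon = 1:4,
  target = c("inc hosp", "inc death")
)

# One row per unique combination of task ID values
combinations <- expand.grid(task_id_values, stringsAsFactors = FALSE)

# 3 origin dates x 6 locations x 4 horizons x 2 targets
nrow(combinations) # 144

# Peek at the first few combinations
head(combinations, 3)
```

Each row of `combinations` corresponds to one modeling task in the sense used above, which is why even a small set of task IDs can describe a large number of tasks.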
+ +### Creating the `target_metadata` objects + +The `target_metadata` provides both human-readable (`target_name` and +`target_units`) and machine-readable (e.g. `target_type` and `is_step_ahead`) +information about the targets. + +```{r create-target-metadata} +library(hubAdmin) + +target_metadata_hosp <- create_target_metadata_item( + target = "inc hosp", + target_name = "Weekly incident influenza hospitalizations", + target_units = "rate per 100,000 population", + target_keys = list(target = "inc hosp"), + target_type = "discrete", + is_step_ahead = TRUE, + time_unit = "week" +) + +target_metadata_death <- create_target_metadata_item( + target = "inc death", + target_name = "Weekly incident influenza deaths", + target_units = "rate per 100,000 population", + target_keys = list(target = "inc death"), + target_type = "discrete", + is_step_ahead = TRUE, + time_unit = "week" +) + +target_metadata <- create_target_metadata(target_metadata_hosp, target_metadata_death) +``` + +If you inspect the `target_metadata_hosp` object, you will see that it is +an object of class `target_metadata_item` with additional attributes about the schema +that created it: + +```{r tmh} +str(target_metadata_hosp) +``` + + +Likewise, `target_metadata` is a combination of `target_metadata_hosp` +and `target_metadata_death`: + + +```{r, tme} +str(target_metadata) +``` + + +If this process just creates lists with metadata, then why bother using `hubAdmin` functions at all to create them? +The benefit is that `hubAdmin` `create_*()` functions provide some basic validation of objects when creating them, helping you catch some potential mistakes sooner. 
For +example, some of the values are interdependent and if you accidentally leave one +out, the function will provide a helpful error: + +```{r tmh-example} +#| error: true +create_target_metadata_item( + target = "inc hosp", + target_name = "Weekly incident influenza hospitalizations", + target_units = "rate per 100,000 population", + target_keys = list(target = "inc hosp"), + target_type = "discrete", + is_step_ahead = TRUE +) +``` + +Now that we have defined the targets in `target_metadata`, we can move on +to defining the task ID objects, which will define what the modeling parameters +will be. + +### Creating the `task_id` objects + +In this modeling effort, we want to **require** modelers to provide week-ahead +predictions for the "weekly incident influenza hospitalizations" target +(`inc hosp`). Model submissions can optionally include five states, provide up +to 4-week-ahead predictions, and also provide predictions for the "weekly +incident influenza deaths" target (`inc death`). + +```{r set-task-ids} +origin_date <- create_task_id( + "origin_date", + required = NULL, + optional = c("2023-01-02", "2023-01-09", "2023-01-16") +) + +location <- create_task_id( + "location", + required = "US", + optional = c("01", "02", "04", "05", "06") +) + +horizon <- create_task_id("horizon", required = 1L, optional = 2:4) + +target <- create_task_id("target", required = "inc hosp", optional = "inc death") + +task_ids_example <- create_task_ids(origin_date, location, horizon, target) + +``` + +Now that we've created the task IDs that specify the modeling tasks, we can define the outputs we +expect from the models. + +### Creating the `output_type` objects + +For this example, we want to have three output type objects: + +1. a required `"mean"` output type +2. a required `"quantile"` output type +3. an optional `"median"` output type + +Additionally, our two targets will accept a different combination of output +types. 
Specifically, target `"inc hosp"` will only accept `"mean"` and +`"median"` output types while `"inc death"` will only accept `"mean"` and +`"quantile"` output types. + +```{r create-output-type-ids} +mean_out_type <- create_output_type_mean( + is_required = TRUE, + value_type = "double", + value_minimum = 0 +) + +median_out_type <- create_output_type_median( + is_required = FALSE, + value_type = "integer", + value_minimum = 0L + +) + +quantile_out_type <- create_output_type_quantile( + required = c(0.25, 0.5, 0.75), + is_required = TRUE, + value_type = "double", + value_minimum = 0 +) +``` + +Now that we have our base `output_type_item` class objects, we can combine them to create `output_type` class objects that we want to use for each particular target. + +```{r combine-output-types} +output_type_mean_median <- create_output_type(mean_out_type, median_out_type) + +output_type_mean_quantile <- create_output_type(mean_out_type, quantile_out_type) +``` + +### Creating the `model_task` objects + +As previously discussed, we want to require a "mean" output for both tasks, but we want a "quantile" +output type for the `inc death` target and an optional "median" for the `inc +hosp` target. + +```{r create-model-task} +model_task_hosp <- create_model_task( + task_ids = task_ids_example, + output_type = output_type_mean_median, + target_metadata = target_metadata +) + +model_task_death <- create_model_task( + task_ids = task_ids_example, + output_type = output_type_mean_quantile, + target_metadata = target_metadata +) + +model_task_example <- create_model_tasks(model_task_hosp, model_task_death) +``` +Now that we have a set of model tasks, we can create rounds for them. + +### Creating the `round` objects + +While the model tasks define what a model output submission should look like, +the round additionally defines the schedule of submissions and the timeframe under which a +submission is considered valid. 
+ +The most common way to create rounds is to create a single `round` object +and specify a date task ID whose values will act as the round IDs for each round. These round IDs are used to calculate the submission window for each modeling round. In our example, we specify `"origin_date"` as the source variable for round IDs. We also specify that the submission window for a given round is a period of one week, where the beginning of the window is +4 days before the origin date and the end of the submission window is 2 days +after the origin date. This means if you have an origin date of +`r format(as.Date("2023-01-02"), "%A %Y-%m-%d")`, then valid submission dates +for that round are from +`r paste(format(as.Date("2023-01-02") + c(-4, 2), "%A %Y-%m-%d"), collapse = " to ")`. + +```{r create-rounds} +round1 <- create_round( + round_id_from_variable = TRUE, + round_id = "origin_date", + round_name = "Round 1", + model_tasks = model_task_example, + submissions_due = list( + relative_to = "origin_date", + start = -4L, + end = 2L + ) +) + +rounds <- create_rounds(round1) +``` + +### Creating and saving the `tasks` config file + +And the penultimate step is to create a config file and write it out to your +hub. + +TODO: link to hubdocs about maintaining output type id data types. + +```{r create-tasks, eval = FALSE} +config <- create_config(rounds) + +write_config(config) +``` +```{r really-write} +#| echo: false +tmp <- tempfile(fileext = ".json") +config <- create_config(rounds, output_type_id_datatype = "auto") +write_config(config, config_path = tmp, overwrite = TRUE) +``` + +
+contents of `hub-config/tasks.json` + + +````json + +```{r show-config} +#| echo: false +#| results: asis +writeLines(readLines(tmp)) +``` + +```` + +
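
The `submissions_due` entry written for `round1` is relative to each origin date rather than a fixed pair of dates. As a rough sketch in base R (this is not how `hubAdmin` computes anything internally, and the object names are made up for illustration), the windows implied by `start = -4L` and `end = 2L` for the three origin dates could be worked out like this:

```r
# Origin dates used as round IDs in this vignette
origin_dates <- as.Date(c("2023-01-02", "2023-01-09", "2023-01-16"))

# Offsets in days, matching `start = -4L` and `end = 2L` in `submissions_due`
window_start <- -4L
window_end <- 2L

# One row per round: the first and last valid submission dates
submission_windows <- data.frame(
  round_id = format(origin_dates),
  opens = origin_dates + window_start,
  closes = origin_dates + window_end
)

submission_windows
```

For the `2023-01-02` round this gives a window from `2022-12-29` to `2023-01-04`, which matches the dates described when `round1` was created.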
+ + +## Validating your config file + +One of the strengths of specifying configuration in a JSON format is that it's +machine-readable, which means that no matter how complex a JSON file gets, **it +can be easily and rapidly validated**. All hubverse configuration files are +validated against a central set of [hubverse +schemas](https://github.com/hubverse-org/schemas/) that specify _how_ a +configuration file is constructed. This allows any tool to be able to read a +JSON file and validate it against the schema to ensure it's valid. + +To check that your configuration file is valid, you can use the +`validate_config()` function from the root directory of your hub: + +```{r validate-config-show} +#| eval: false +validate_config(config = "tasks") +``` + +```{r validate_config} +#| echo: false +res <- validate_config(config_path = tmp) +attr(res, "config_path") <- "/path/to/hub/hub-config/tasks.json" +res +``` + +### The importance of validation + +Validating your config files is important because while we can validate simple +relationships during the construction of the configuration file, it's not always +possible to validate the relationships between the elements until the final +configuration file is written and can be validated against the schema. 
+ +For example, suppose we wanted to add a second round to our current config that +has a different submission window, from `"2023-01-09"` to `"2023-01-16"` (note that this +round contains an error which we will explain later): + +```{r round2-fight} +round2 <- create_round( + round_id_from_variable = TRUE, + round_id = "origin_date", + round_name = "Round 2", + model_tasks = model_task_example, + submissions_due = list(start = "2023-01-09", end = "2023-01-16") +) +two_rounds <- create_rounds(round1, round2) +new_config <- create_config(two_rounds) +``` + +```{r create-tasks-too, eval = FALSE} +write_config(new_config, overwrite = TRUE) +``` +```{r really-write-too} +#| echo: false +tmp2 <- tempfile(fileext = ".json") +write_config(new_config, config_path = tmp2, overwrite = TRUE) +``` + +Now we can attempt to run a validation: + +```{r validate-config-show-too} +#| eval: false +validation_result <- validate_config(config = "tasks") +validation_result +``` + +```{r validate_config-too} +#| echo: false +validation_result <- validate_config(config_path = tmp2) +attr(validation_result, "config_path") <- "/path/to/hub/hub-config/tasks.json" +validation_result +``` + + +Now we are getting a notification about six errors. Let's look at the error +table: + + +```{r err} +#| results: asis +view_config_val_errors(validation_result) +``` + +These errors came up because we defined two rounds that shared +identifiers: both inherited their identifiers from `"origin_date"`, but each +round must have a separate and unique identifier. The cause is that we forgot to +set `round_id_from_variable = FALSE` in our second round. 
If we change that, +the config becomes valid: + +```{r round2-agree} +round2 <- create_round( + round_id_from_variable = FALSE, # do not set the round ID from a task ID + round_id = "round-2", # use the round ID of "round-2" + round_name = "Round 2", + model_tasks = model_task_example, + submissions_due = list(start = "2023-01-09", end = "2023-01-16") +) +two_rounds <- create_rounds(round1, round2) +new_config <- create_config(two_rounds, output_type_id_datatype = "auto") +``` + +```{r create-tasks-agree, eval = FALSE} +write_config(new_config, overwrite = TRUE) +``` +```{r really-write-agree} +#| echo: false +tmp2 <- tempfile(fileext = ".json") +write_config(new_config, config_path = tmp2, overwrite = TRUE) +``` + +Now we can attempt to run a validation: + +```{r validate-config-show-agree} +#| eval: false +validation_result <- validate_config(config = "tasks") +validation_result +``` + +```{r validate_config-agree} +#| echo: false +validation_result <- validate_config(config_path = tmp2) +attr(validation_result, "config_path") <- "/path/to/hub/hub-config/tasks.json" +validation_result +``` + + +
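
When a config is regenerated by a script, for example in a scheduled workflow, it can be useful to stop as soon as validation fails instead of only printing the result. The sketch below assumes that the value returned by `validate_config()` behaves like a single logical carrying extra attributes; `assert_valid_config()` is a hypothetical helper written for this illustration, not a `hubAdmin` function, and the two stand-in objects only mimic a validation result.

```r
# Hypothetical helper (not part of hubAdmin): raise an error unless the
# validation result is a single TRUE.
assert_valid_config <- function(result) {
  if (!isTRUE(as.logical(result))) {
    stop(
      "Config validation failed; inspect the errors before committing.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Stand-ins for the value returned by validate_config(); assumed here to be
# a single logical with attributes attached.
valid_result <- structure(TRUE, config_path = "hub-config/tasks.json")
invalid_result <- structure(FALSE, config_path = "hub-config/tasks.json")

# Passes silently
assert_valid_config(valid_result)

# Raises an error; wrapped in try() so the example keeps running
try(assert_valid_config(invalid_result))
```

In a real script, the stand-in objects would be replaced by the result of `validate_config(config = "tasks")`, and `view_config_val_errors()` could be used to inspect what went wrong.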