
Issue #498: Implement a simple print function for forecast objects. #592

Merged


toshiakiasakura
Collaborator

Description

This PR closes #498 and is a refreshed version of PR #584.

With this PR, the print method behaves as shown below.
Because get_protected_columns() returns columns including the score columns, I excluded the score columns from the "Protected columns" field to avoid redundancy.

example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  as_forecast() %>%
  print
#>Forecast type:
#>[1] "quantile"
#>
#>Protected columns:
#>[1] "observed"  "quantile"  "predicted"
#>
#>Forecast units:
#>[1] "location"        "target_end_date" "target_type"     "horizon"        
#>[5] "model"          
#>
#>       observed quantile predicted location target_end_date target_type horizon
#>    1:   127300       NA        NA       DE      2021-01-02       Cases      NA
#>    2:     4534       NA        NA       DE      2021-01-02      Deaths      NA
#>    3:   154922       NA        NA       DE      2021-01-09       Cases      NA
#>    4:     6117       NA        NA       DE      2021-01-09      Deaths      NA
#>    5:   110183       NA        NA       DE      2021-01-16       Cases      NA
#>   ---                                                                         
#>20541:       78    0.850       352       IT      2021-07-24      Deaths       2

and

example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  as_forecast() %>%
  add_coverage() %>%
  print
#>Forecast type:
#>[1] "quantile"
#>
#>Protected columns:
#>[1] "observed"  "quantile"  "predicted" "range"    
#>
#>Score columns (Protected):
#>[1] "interval_coverage"           "interval_coverage_deviation"
#>[3] "quantile_coverage"           "quantile_coverage_deviation"
#>
#>Forecast units:
#>[1] "location"        "target_end_date" "target_type"     "horizon"        
#>[5] "model"          
#>
#>       observed quantile predicted location target_end_date target_type horizon
#>    1:   127300       NA        NA       DE      2021-01-02       Cases      NA
#>    2:     4534       NA        NA       DE      2021-01-02      Deaths      NA
#>    3:   154922       NA        NA       DE      2021-01-09       Cases      NA
#>    4:     6117       NA        NA       DE      2021-01-09      Deaths      NA
#>    5:   110183       NA        NA       DE      2021-01-16       Cases      NA
#>   ---                                                                         
#>20541:       78    0.950       370       IT      2021-07-24      Deaths       3
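To illustrate the de-duplication step described above, here is a minimal base-R sketch. split_columns() is a hypothetical helper, and the column names are copied from the second example output rather than computed by get_protected_columns().

```r
# Hypothetical helper: separate score columns from the other protected
# columns so each name is printed only once.
split_columns <- function(protected, score_cols) {
  list(
    protected = setdiff(protected, score_cols),  # non-score protected columns
    scores = intersect(protected, score_cols)    # score columns, original order
  )
}

# Column names taken from the example output (assumption for this sketch)
protected <- c("observed", "quantile", "predicted", "range",
               "interval_coverage", "interval_coverage_deviation",
               "quantile_coverage", "quantile_coverage_deviation")
scores <- c("interval_coverage", "interval_coverage_deviation",
            "quantile_coverage", "quantile_coverage_deviation")

cols <- split_columns(protected, scores)
cols$protected  # c("observed", "quantile", "predicted", "range")
```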

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title as follows: issue-number: PR title
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • I have built the package locally and run rebuilt docs using roxygen2.
  • My code follows the established coding standards and I have run lintr::lint_package() to check for style issues introduced by my changes.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.


codecov bot commented Jan 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (124b1db) 85.77% compared to head (b1e3dfd) 85.88%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #592      +/-   ##
==========================================
+ Coverage   85.77%   85.88%   +0.10%     
==========================================
  Files          21       21              
  Lines        1758     1771      +13     
==========================================
+ Hits         1508     1521      +13     
  Misses        250      250              


@nikosbosse
Contributor

nikosbosse commented Jan 16, 2024

Nice! Thanks so much and apologies for taking so long to get back to you.
Looking good overall, some thoughts and comments:

I think the following line
https://github.com/toshiakiasakura/scoringutils/blob/d4d0446a09ace4ec5bc509374ac5e79ca85367fb/R/available_forecasts.R#L63
needs to be changed to

out <- as.data.table(data)[, .(count = .N), by = by]

The reason is that the object is still of class forecast_*, but get_forecast_counts() removes the columns observed and predicted, which can lead to an error.

To make sure this stays covered in the future, we should also add a test to test-available_forecasts.R, something like:

test_that("get_forecast_counts() works on an object of class `forecast_*`", {
  ex <- suppressMessages(as_forecast(example_integer))
  expect_no_condition(
    suppressMessages(get_forecast_counts(ex, by = "model"))
  )
})
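A base-R sketch of why dropping the forecast class matters before counting. The class forecast_demo and the functions print.forecast_demo() and count_forecasts() are hypothetical stand-ins, not scoringutils code; as.data.frame() plays the role that as.data.table() plays in the suggested fix, since it drops any subclass listed before data.frame.

```r
# Hypothetical print method that, like the real one, needs an "observed" column
print.forecast_demo <- function(x, ...) {
  if (!"observed" %in% names(x)) stop("column `observed` is missing")
  cat("Forecast object\n")
  NextMethod()
}

fc <- structure(
  data.frame(model = c("a", "a", "b"), observed = 1:3, predicted = 2:4),
  class = c("forecast_demo", "data.frame")
)

# Convert to a plain table first, then count rows per group
count_forecasts <- function(data, by) {
  plain <- as.data.frame(data)  # drops "forecast_demo" from the class vector
  stats::aggregate(
    x = list(count = rep(1L, nrow(plain))),
    by = plain[by],
    FUN = length
  )
}

counts <- count_forecasts(fc, by = "model")
print(counts)  # plain data.frame, so print.forecast_demo() is not used

# If the subclass survived aggregation, printing would fail because
# observed/predicted have been aggregated away
bad <- counts
class(bad) <- class(fc)
msg <- tryCatch(print(bad), error = conditionMessage)
```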

We have a similar issue when running the following code from the scoringutils.Rmd vignette:

suppressMessages(score(example_point)) %>%
  summarise_scores(by = "model", na.rm = TRUE)

The issue here is that the output of score() is still of class forecast_sample (which it shouldn't be). I suggest addressing this in a different PR though (already started working on it).

Update: This is done and merged into main.

The current implementation defines an extra function, print_forecast_info(). In principle, I don't think we need it. We could also just do

print.forecast_binary <- function(x, ...) {yourcode}

and

print.forecast_quantile <- print.forecast_binary

etc. Probably even better, we could introduce a super-class (e.g. forecast or scoringutils) that all of these objects share. This could then be used for printing, and we would only have to define a single print function. Maybe it's worth addressing this in a separate issue/PR and going with the first option for now. What do you think, @seabbs?
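A minimal sketch of the first option, where one function body is shared by several S3 methods. The body is a simplified stand-in rather than the code in this PR, and the example object is a plain data.frame with a forecast_quantile class prepended (an assumption for illustration).

```r
# Define the method once for one class...
print.forecast_binary <- function(x, ...) {
  cat("Forecast type:\n")
  print(sub("^forecast_", "", class(x)[1]))  # e.g. "quantile"
  cat("\n")
  # Hand over to the next class in the vector (print.data.frame here)
  NextMethod()
}
# ...and reuse the same function for the other forecast classes
print.forecast_quantile <- print.forecast_binary

q <- structure(
  data.frame(model = "m1", quantile = 0.5, predicted = 10, observed = 12),
  class = c("forecast_quantile", "data.frame")
)
print(q)
```

Assuming real forecast objects carry data.table in their class vector, NextMethod() would reach print.data.table() instead, without converting the object first.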

I would personally prefer to keep the protected columns out of the print method for now and introduce them later. I think it is generally nice to inform users about protected columns. However, we don't really have written guidance on what a protected column actually is. At the moment it is "something that functions aren't allowed to touch and that you need to be careful about renaming". But there are a lot of uncertainties. What exactly are our protected columns? Is a score a protected column? I think we need something written down that explains what a protected column is before we can inform users about them.

Summarising this wall of text, my suggestions are

  • updating line 63 in available_forecasts.R + adding a test to make sure that the problem doesn't occur in the future
  • removing print_forecast_info()
  • fixing the class that the output of score() has - I can take care of that
  • removing the protected columns from the print method for now and reintroducing them later once we have a clear agreed understanding of what a protected column is

What do others think? I think the overall PR is really good, as always lots of tiny details to deal with. Thanks a lot again.

@seabbs
Contributor

seabbs commented Jan 18, 2024

> I suggest addressing this in a different PR though (already started working on it).

i.e. no action needed here for you @toshiakiasakura

> This could then be used for printing and we would only have to define a single printing function. Maybe it's worth addressing this in a separate issue/PR and going with the first option for now. What do you think

Yeah, I think getting something simple off the ground makes sense, and then thinking about merging the methods in a future issue/PR would be good. I can see a good argument for a meta-class or classes (for example for categorical forecasts, which include nominal, ordinal, and binary forecasts), but again, that's a problem for another day IMO.

So for @toshiakiasakura / this PR we have:

  • updating line 63 in available_forecasts.R + adding a test to make sure that the problem doesn't occur in the future
  • removing print_forecast_info() and moving its code directly into the methods
  • removing the protected columns from the print method for now and reintroducing them later once we have a clear agreed understanding of what a protected column is
  • Use NextMethod() instead of print(as.data.table)
  • Clean up linting issue
  • Ideally add some kind of minimal testing of the new functionality

Really excited to get this in as I think it is really useful functionality.

@nikosbosse
Contributor

@toshiakiasakura How are you feeling about this at the moment? Happy to help get this PR over the finish line!

@toshiakiasakura
Collaborator Author

Thanks for all the suggestions. I will work through the checklist; I agree with most of it.

One reason I prepared print_forecast_info() separately is that, at a later stage, I would like to include validation results in the printed output. For that, each print method would need a slightly different implementation, and a separate helper would let the code be extended differently for each class.

For simplicity, should I still amend the code at this stage to remove print_forecast_info()?

@seabbs
Contributor

seabbs commented Jan 25, 2024

> For the simplicity, should I still at this stage amend the code to remove print_forecast_info?

Yes, I think that makes sense for this PR.

In this commit:
- Remove print_forecast_info() and move its code directly into the
  methods.
- Remove the protected columns from the print output.
- Use NextMethod() instead of print(as.data.table)
@toshiakiasakura
Collaborator Author

toshiakiasakura commented Jan 28, 2024

I've almost finished the updates.

  • updating line 63 in available_forecasts.R + adding a test to make sure that the problem doesn't occur in the future
    I added this.

For testing get_forecast_counts(), I added one line that uses expect_s3_class() to check that the returned object's class exactly matches data.table.

  • removing print_forecast_info() and moving its code directly into the methods
  • removing the protected columns from the print method for now and reintroducing them later once we have a clear agreed understanding of what a protected column is
  • Use NextMethod() instead of print(as.data.table)
  • Clean up linting issue
  • Ideally add some kind of minimal testing of the new functionality

I added a simple test that checks that the print method outputs "Forecast type" and "Forecast units".

The current output looks like this:

example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  as_forecast() %>%
  add_coverage() %>%
  print
#> Forecast type:
#> [1] "quantile"
#> 
#> Score columns:
#> [1] "interval_coverage"           "interval_coverage_deviation"
#> [3] "quantile_coverage"           "quantile_coverage_deviation"
#> 
#> Forecast units:
#> [1] "location"        "target_end_date" "target_type"     "horizon"        
#> [5] "model"          
#> 
#>        observed quantile predicted location target_end_date target_type horizon
#>     1:   127300       NA        NA       DE      2021-01-02       Cases      NA
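A simplified sketch of how a method could produce output in the shape above. The fixed protected-column vector and the score_cols attribute are assumptions for illustration only; in scoringutils this information comes from the package's own helpers, not from an attribute.

```r
print.forecast_quantile <- function(x, ...) {
  protected <- c("observed", "quantile", "predicted", "range")  # assumed
  scores <- attr(x, "score_cols")                               # assumed
  # Forecast unit: every column that is neither protected nor a score
  unit <- setdiff(names(x), c(protected, scores))

  cat("Forecast type:\n")
  print("quantile")
  # Only show the score block when score columns are present
  if (length(scores) > 0) {
    cat("\nScore columns:\n")
    print(scores)
  }
  cat("\nForecast units:\n")
  print(unit)
  cat("\n")
  NextMethod()  # print the table itself via the next class
}

fc <- structure(
  data.frame(observed = 78, quantile = 0.95, predicted = 370,
             location = "IT", model = "m1", interval_coverage = TRUE),
  class = c("forecast_quantile", "data.frame"),
  score_cols = "interval_coverage"
)
print(fc)
```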

Contributor

@nikosbosse nikosbosse left a comment


Nice! Thanks so much, @toshiakiasakura. This is looking really good and I think we're nearly there. I made some minor suggestions, but those are mostly housekeeping things (it helps to run the check in RStudio, or R CMD check in the terminal, regularly to catch these early).

Two additional things:

  • could you please add yourself to the contributors in the DESCRIPTION file? It is well deserved :)
  • we should maybe add an item to NEWS.md (or update the current entry) saying that diagnostic messages are now printed directly.

@toshiakiasakura
Collaborator Author

Thanks for the corrections. I have now run R CMD check and made sure the tests and checks pass.

I have incorporated your suggestions into the code, added myself as a contributor, and added an item to NEWS.md!

@nikosbosse
Contributor

Thanks a lot, @toshiakiasakura!
I made a small suggested change that just removes a line containing superfluous whitespace. The linter is complaining about superfluous whitespace.

I'm not sure why the snapshot tests are failing. I checked out the code locally and tests passed there. Did they pass locally on your end? I'll investigate.

As an aside, we might also want to think about

  • adding a test for the print method, e.g. using expect_output
  • failing gracefully. In the current version, the user will get an error if something like get_forecast_unit() fails for whatever reason. Ideally, the function should at least print the data.table, regardless of any other errors so the user can inspect it.
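For the first point, here is a base-R analogue of testthat::expect_output(). check_output() and the stripped-down print.forecast_point() are hypothetical; the sketch also checks that information removed in this PR (the "Protected columns" block) no longer appears.

```r
# Stripped-down stand-in for the real print method
print.forecast_point <- function(x, ...) {
  cat("Forecast type:\n")
  print("point")
  cat("\nForecast units:\n")
  print(setdiff(names(x), c("observed", "predicted")))
  cat("\n")
  NextMethod()
}

# Fail unless the printed output contains `pattern`
# (or, with negate = TRUE, unless it does NOT contain it)
check_output <- function(object, pattern, negate = FALSE) {
  out <- paste(utils::capture.output(print(object)), collapse = "\n")
  found <- grepl(pattern, out, fixed = TRUE)
  if (found == negate) {
    stop(if (negate) "unexpected" else "missing", " output: ", pattern)
  }
  invisible(TRUE)
}

fc <- structure(
  data.frame(model = "m1", observed = 3, predicted = 2.5),
  class = c("forecast_point", "data.frame")
)
check_output(fc, "Forecast type:")
check_output(fc, "Forecast units:")
check_output(fc, "Protected columns:", negate = TRUE)
```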

@toshiakiasakura
Collaborator Author

toshiakiasakura commented Feb 3, 2024

Sure, I will look at the linting problem later.

For testing, I implemented a very simple test in db0323f.
I will extend it to check that the necessary information is printed and unnecessary information is not.

@toshiakiasakura
Collaborator Author

toshiakiasakura commented Feb 5, 2024

> I'm not sure why the snapshot tests are failing. I checked out the code locally and tests passed there. Did they pass locally on your end? I'll investigate.

Yes, unfortunately the tests pass locally even though there is a trailing whitespace...

> adding a test for the print method, e.g. using expect_output

I added.

> failing gracefully. In the current version, the user will get an error if something like get_forecast_unit() fails for whatever reason. Ideally, the function should at least print the data.table, regardless of any other errors so the user can inspect it.

@nikosbosse For this part, it is also possible that the forecast object is broken so that get_forecast_type() does not work.
My simple suggestion is to rely on validate_forecast() to check the object and, if it raises an error, print that error. If validation passes, print the information about the forecast object as currently implemented. I am planning to write it as follows. What do you think?

res <- try(validate_forecast(x), silent = TRUE)
if (inherits(res, "try-error")) {
  # validation failed: report the error instead of the forecast info
  message(res)
} else {
  # current print contents
}

Or should we raise the error (and stop) by calling validate_forecast(x) before the current print contents?

However, once any error occurs, this does not tell us anything about the forecast object.
TBH, I cannot imagine a case where only get_forecast_unit() fails... (apart from returning an empty result when no unit is set?)

@toshiakiasakura
Collaborator Author

Regarding failing gracefully, I am wondering in which cases you expect the forecast object to break. As far as I understand, breakage is limited as long as the object remains a forecast object. Converting a forecast object to a data.table may happen sometimes, but that does not matter for this print functionality. So I would prefer to keep the code simple.

@nikosbosse
Contributor

@toshiakiasakura What I mean is that, for example, the following produces an error: the object can no longer be printed because get_forecast_type(), which is called by print.forecast_*(), requires a column "observed".

library(dplyr)
ex <- example_quantile |>
  as_forecast()

ex |> 
  select(-observed) 

Ideally, there should be something like an error (or a warning), but the data.table would still print.
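One possible way to get that behaviour is to wrap the header computation in tryCatch() and always fall through to NextMethod(). get_type() is a hypothetical stand-in for get_forecast_type(), and the warning text is made up for this sketch.

```r
# Hypothetical stand-in for get_forecast_type(): needs an "observed" column
get_type <- function(x) {
  if (!"observed" %in% names(x)) stop("column `observed` not found")
  sub("^forecast_", "", class(x)[1])
}

print.forecast_quantile <- function(x, ...) {
  # Try to build the header; on failure, warn instead of aborting
  type <- tryCatch(get_type(x), error = function(e) {
    warning("Could not determine forecast type: ", conditionMessage(e),
            call. = FALSE)
    NULL
  })
  if (!is.null(type)) {
    cat("Forecast type:\n")
    print(type)
    cat("\n")
  }
  # The table itself is always printed so the user can inspect it
  NextMethod()
}

# Mimics select(-observed): the class survives but the column is gone
fc <- structure(
  data.frame(model = "m1", quantile = 0.5, predicted = 10),
  class = c("forecast_quantile", "data.frame")
)
print(fc)  # emits a warning, then prints the data frame
```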

@seabbs any thoughts on whether this should be part of this PR or a separate PR? I lean towards addressing it in this PR as it seems quite important to me that users can still print their output even if something is wrong. But I appreciate that this would mean adding even more to this PR, for which I apologise, @toshiakiasakura!

@seabbs
Contributor

seabbs commented Feb 8, 2024

> @seabbs any thoughts on whether this should be part of this PR or a separate PR? I lean towards addressing it in this PR as it seems quite important to me that users can still print their output even if something is wrong.

I am leaning towards addressing this in its own issue/PR, as the suggestion above is already functional, we have had a fair few rounds of review already, and I imagine everyone would like to see this on main soon!

@nikosbosse
Copy link
Contributor

nikosbosse commented Feb 8, 2024

@seabbs sounds good to me. Then I suggest merging it if you approve. I created an issue to address this further: #620

Contributor

@seabbs seabbs left a comment


LGTM. Thanks for this, @toshiakiasakura!! I suggest we also open another issue to think about whether we could make this even prettier using {cli}.

@nikosbosse
Contributor

Perfect, merging now. Thanks a lot @toshiakiasakura! I think this is a really visible and important improvement to users.

Development

Successfully merging this pull request may close these issues.

Create a print method for forecast_binary, forecast_quantile etc.
3 participants