Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #404: Use na.omit() to remove NA values before scoring #465

Merged
merged 17 commits into from
Jan 3, 2024

Conversation

nikosbosse
Copy link
Contributor

@nikosbosse nikosbosse commented Nov 18, 2023

Closes #404

Update:
We had an internal function to delete rows in the data where either observed or predicted are NA. After rethinking this a bit it feels like the best solution is to simply replace this function with na.omit():

  • Having NA values anywhere in the data (not just in observed and predicted) is potentially risky and could break things (e.g. mess up grouping according to the forecast unit). Handling this makes things more complicated.
  • na.omit() is a known function. Having (and exporting) a function that just removes NA values in certain columns seems unnecessarily complicated).

The PR

  • replaces remove_NA_observed_predicted() with na.omit()
  • updates the test a bit to check the behaviour of na.omit() is what we expect (e.g. checking classes / attributes are respected)
  • updates tests in multiple places to avoid random messages during testing
  • updates the function validate_general() to throw a message when NA values are encountered (in contrast to the previous convoluted mechanism where a message attribute was created)
  • removes a random linebreak in summarise_scores()
  • updates the NEWS file

@nikosbosse nikosbosse added the blocked Something needs to happen before this can proceed label Nov 18, 2023
@nikosbosse nikosbosse changed the base branch from dev to expose-functions November 18, 2023 13:51
@seabbs

This comment was marked as outdated.

seabbs

This comment was marked as outdated.

Base automatically changed from expose-functions to develop November 20, 2023 11:57
@nikosbosse nikosbosse removed the blocked Something needs to happen before this can proceed label Nov 20, 2023
@nikosbosse nikosbosse marked this pull request as draft November 20, 2023 16:13
@nikosbosse nikosbosse marked this pull request as ready for review November 20, 2023 20:28
@nikosbosse nikosbosse marked this pull request as draft November 20, 2023 20:37
Copy link

codecov bot commented Nov 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d04a82c) 82.50% compared to head (81a1506) 81.87%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #465      +/-   ##
===========================================
- Coverage    82.50%   81.87%   -0.63%     
===========================================
  Files           20       20              
  Lines         1680     1677       -3     
===========================================
- Hits          1386     1373      -13     
- Misses         294      304      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nikosbosse nikosbosse changed the title Issue #404: Expose get_complete_forecasts() to users Issue #404: Use na.omit() to remove NA values before scoring Nov 21, 2023
@nikosbosse
Copy link
Contributor Author

Phew. In theory, this should be a very simple PR, in practice it turned out not to be.
The fact that I had to update so many tests highlights the issue that we get a lot of messages. Each of these functions generates a message, for example:

get_forecast_counts(example_binary)
score(example_binary)
add_coverage(example_quantile)
score(example_quantile)

and probably more, as they all run as_forecast() internally.

Some options (likely for future PRs):

  • live with it
  • force users to call as_forecast() before they do anything else
  • have an option that as_forecast() removes NA values instead of just warning about them (the reason it doesn't do that at the moment is for plotting). This would reduce duplications.

@nikosbosse nikosbosse marked this pull request as ready for review January 2, 2024 15:38
@nikosbosse nikosbosse requested a review from seabbs January 2, 2024 15:38
@seabbs
Copy link
Contributor

seabbs commented Jan 2, 2024

force users to call as_forecast() before they do anything else

I feel like this might be the least painful option even if it does add an additional user step?

R/validate.R Show resolved Hide resolved
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. oof yes, I see what you mean!

@nikosbosse nikosbosse merged commit a0f23ed into develop Jan 3, 2024
10 of 11 checks passed
@nikosbosse nikosbosse deleted the expose-functions2 branch January 3, 2024 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants