Support for "Mixture of Parameters" scoring #898

Open
damonbayer opened this issue Sep 6, 2024 · 10 comments

Comments

@damonbayer
Contributor

damonbayer commented Sep 6, 2024

I am borrowing the phrasing of this concept from Krüger 2021. Often, forecasts are issued as mixtures of closed-form forecasts (e.g. each MCMC sample corresponds to a predictive distribution for the data). In practice, we often sample from each of these closed-form forecasts to produce a sample-based forecast, but we could use the mixture of distributions directly. This is particularly useful for scoring forecasts for which the realized value is very unlikely (such that it is not included in the sampled values) but still in the support of the mixture forecast.
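To make the idea concrete, here is a minimal Python sketch (not scoringutils or scoringRules code) of a log score evaluated directly on an equal-weight mixture of normal predictive distributions, one component per MCMC draw. The function names and the choice of normal components are assumptions made purely for illustration.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_log_score(y, params):
    """Negative log density of y under an equal-weight mixture of normals.

    `params` is a list of (mu, sigma) pairs, e.g. one pair per MCMC draw
    (hypothetical layout, chosen for this sketch).
    """
    log_dens = [normal_logpdf(y, mu, sigma) for mu, sigma in params]
    # Log-sum-exp keeps the calculation stable when y is far in the tails.
    top = max(log_dens)
    log_sum = top + math.log(sum(math.exp(ld - top) for ld in log_dens))
    return -(log_sum - math.log(len(log_dens)))


# An observation far from every component still receives a finite score.
draws = [(0.0, 1.0), (1.0, 2.0)]
extreme_score = mixture_log_score(50.0, draws)
```

Because the mixture density is evaluated in closed form, the score does not depend on whether any sampled value happens to fall near the observation.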

This feature is alluded to on pg 23 of the manuscript:

> forecasts represented in a closed-form distribution (as can be scored for example using scoringRules are not supported.

(Note the missing closing parenthesis)

Could this feature be supported?

@nikosbosse
Contributor

Hi Damon, you mean scoring closed-form distributions? In principle it should be possible. I think the major part of the work would be designing appropriate input and output formats.

I.e. something like this:

[image]

or this:

[image]

Would you be interested in helping with this?

@damonbayer
Contributor Author

damonbayer commented Sep 6, 2024

> Hi Damon, you mean scoring closed-form distributions? In principle it should be possible. I think the major part of the work would be designing appropriate input and output formats.

Yes, I think the input format would be something like

| column | type |
| --- | --- |
| observed | numeric |
| distribution | character (but has to be something supported by scoringRules) |
| parameters | named numeric vector (arguments to distribution), or named list if the arguments to the distribution are not all numeric |
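As a rough illustration of this input format, the sketch below models one forecast row in Python and dispatches on the `distribution` column. Everything here is hypothetical: the class name, the registry, and the two toy scoring functions stand in for whatever scoringRules would provide, and the parameter names are not taken from that package.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ClosedFormForecast:
    """One row of the proposed format (hypothetical Python analogue)."""
    observed: float
    distribution: str             # name of the distribution family
    parameters: Dict[str, float]  # named arguments for that distribution


def log_score_normal(y, mean, sd):
    """Negative log density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return 0.5 * z * z + math.log(sd) + 0.5 * math.log(2.0 * math.pi)


def log_score_exponential(y, rate):
    """Negative log density of Exp(rate) at y (assumes y >= 0)."""
    return rate * y - math.log(rate)


# Illustrative registry; a real implementation would defer to scoringRules.
SCORERS: Dict[str, Callable[..., float]] = {
    "normal": log_score_normal,
    "exponential": log_score_exponential,
}


def score_row(row: ClosedFormForecast) -> float:
    """Look up the scorer by name and pass the named parameters through."""
    scorer = SCORERS[row.distribution]
    return scorer(row.observed, **row.parameters)


normal_row = ClosedFormForecast(0.0, "normal", {"mean": 0.0, "sd": 1.0})
normal_score = score_row(normal_row)
```

Storing the parameters as a named mapping means each distribution can take a different set of arguments without changing the column layout.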

> Would you be interested in helping with this?

Sure.

@nikosbosse
Contributor

nikosbosse commented Sep 9, 2024

The simple version would be to check that the distribution is one of the supported ones, but not check the parameters (i.e. leave that to scoringRules). I think it shouldn't be too complicated. But then again it's always more complicated than I think initially :)
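A minimal sketch of that "simple version", written in Python for illustration only: it checks the distribution names against an allow-list and deliberately leaves the parameters untouched. The set of names and the function name are placeholders, not the families scoringRules actually accepts.

```python
# Placeholder allow-list; the real list would come from scoringRules.
SUPPORTED_DISTRIBUTIONS = frozenset({"normal", "poisson", "gamma"})


def check_distributions(rows):
    """Validate only the `distribution` field of each row.

    Parameters are intentionally not inspected here; checking them is
    left to the downstream scoring code.
    """
    seen = {row["distribution"] for row in rows}
    unsupported = sorted(seen - SUPPORTED_DISTRIBUTIONS)
    if unsupported:
        raise ValueError(
            "Unsupported distribution(s): " + ", ".join(unsupported)
        )
    return True


rows = [
    {"observed": 3.0, "distribution": "poisson", "parameters": {"lambda": 2.5}},
    {"observed": 0.4, "distribution": "normal", "parameters": {"mean": 0.0, "sd": 1.0}},
]
all_supported = check_distributions(rows)
```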
What is your timeline/level of urgency with this? I.e. do you have a specific project in mind that you would like to support?

@damonbayer
Contributor Author

No particular urgency. I hacked this together with the old (non-dev) version of scoringutils for a previous project and would prefer for it to be better supported. Would you be able to put together an outline/checklist of requirements for this contribution?

@seabbs
Contributor

seabbs commented Sep 9, 2024

This sounds like a good idea and happy to support.

> version of scoringutils for a previous project and would prefer for it to be more supported. Would you be able to put together an outline/checklist of requirements for this contribution?

@damonbayer do you have some version of this scratch code you can share to give some hints into any gotchas here?

@nikosbosse happy to lead on building on the actions that would need to be taken here to go from idea to fully implemented? I am just wondering if there is any reason to wait for any changes to requirements for a new forecasting type?

@nikosbosse
Contributor

nice! so my current plan was to address #832 next - this should make it a bit easier to create a new forecast class.

Then I think it would be good to reorganise the files such that all functions related to a forecast type are in one file - this should give us more clarity about what we actually have to do to create a new forecast type.

We have another request for a new forecast type here: #846. I think this could be a nice test bed to test the flow for creating a new forecast class. Ideally we should document what to do more clearly such that others know what they have to do.

My personal preference would be to implement the ordinal forecast one first, but it's of course also possible to do it the other way round and use the distribution scoring as a test bed.

@seabbs help on all of these appreciated if you like. Otherwise that would be the rough order in which I would tackle things.

@seabbs
Contributor

seabbs commented Sep 10, 2024

This all sounds like a sensible plan to me and I agree that ordinal might be easier. If @damonbayer is up for it that PR could be used as a bit of a guide for implementing this new class?

@nikosbosse
Contributor

Flagging some (potentially related) PRs: #888 and #889. TLDR is: all of this is not very hard, but we still have some work to do to make everything truly modular.

@nickreich
Collaborator

If I'm understanding the issue correctly, then I think it is possible that some of the ideas outlined in this preprint may be relevant to a possible solution here. The key idea of this paper is that closed-form mixture distributions could be used to approximate complex posterior/predictive distributions that perhaps only exist as samples from that distribution. There are some specific ideas (that could be adapted to work with scoringutils) about how to represent such a mixture.

For context: we have run into some related issues about this trade-off between file size and scoreability in the SARS-CoV-2 Variant Nowcasting Hub.
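As an illustration of why a compact closed-form mixture remains scoreable, the Python sketch below computes the CRPS of a weighted mixture of normals from the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, using the closed-form absolute moment of a normal variable. The function names and argument layout are assumptions for this sketch and do not reflect the representation proposed in the preprint or any scoringutils interface.

```python
import math


def _std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _abs_moment(mu, var):
    """E|X| for X ~ N(mu, var)."""
    sd = math.sqrt(var)
    z = mu / sd
    return mu * (2.0 * _std_normal_cdf(z) - 1.0) + 2.0 * sd * _std_normal_pdf(z)


def crps_normal_mixture(y, means, sds, weights):
    """CRPS of a weighted mixture of normals; weights must sum to 1."""
    comps = list(zip(means, sds, weights))
    # E|X - y|: within component i, X - y ~ N(m_i - y, s_i^2).
    term1 = sum(w * _abs_moment(y - m, s * s) for m, s, w in comps)
    # E|X - X'|: for components i and j, X - X' ~ N(m_i - m_j, s_i^2 + s_j^2).
    term2 = sum(
        wi * wj * _abs_moment(mi - mj, si * si + sj * sj)
        for mi, si, wi in comps
        for mj, sj, wj in comps
    )
    return term1 - 0.5 * term2


single = crps_normal_mixture(0.0, [0.0], [1.0], [1.0])
```

A mixture with a handful of components needs only its means, standard deviations, and weights to be stored, yet it can still be scored exactly.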

@seabbs
Contributor

seabbs commented Dec 10, 2024

I think we have cleared out all the blockers so this is possible to work on now.
