Add k-band and iid energy score estimator #25

Open
simon-hirsch opened this issue May 26, 2024 · 7 comments

Comments

@simon-hirsch
Contributor

simon-hirsch commented May 26, 2024

If you have a lot of samples/scenarios/ensembles but little time (or little computational power):

Equations 15 and 16 in Berk & Ziel, 2019

@sallen12
Collaborator

sallen12 commented May 29, 2024

The iid energy score estimator has been used in a few studies and would be a nice, straightforward addition to the package. I would avoid including the k-band estimator for now - there are some errors in Equation 16 of Berk & Ziel, 2019, and this approximation isn't used elsewhere. The fraction in Equation 15 should also be 2/M rather than M/2, see e.g. Moller et al., 2013, which is important to remember when we implement it.

@simon-hirsch
Contributor Author

All right, @sallen12. We've used the $k$-band estimator, e.g. here and here. The fraction for the $k$-band estimator should read $1 / (MK)$; there is even a typo in our paper, where a "-1" sneaked in. I agree, though, that it's not the most widely used method for approximating the ES.
I'll add only the iid estimator, then.

@sallen12
Collaborator

Thanks for the link to the paper. It's probably just a notational misunderstanding then, but in your Eq 26, if K < M, then for any m > K, the lower bound on the second summation, k = m, will be higher than the upper bound, K. In this case, is it assumed that the summation is zero?

@simon-hirsch
Contributor Author

Hi, what we did is essentially take the differences $\hat{y}^{[m]} - \hat{y}^{[m+k]}$ for $k = 1, \dots, K$, where $\hat{y}^{[m]}$ is the $m$-th ensemble member. I.e. for $K=1$, you just have

$$\frac{1}{M} \sum^M_{m=1} | \hat{y}^{[m]} - \hat{y}^{[m+1]} |,$$

for $K=2$ you get

$$\frac{1}{2M} \left( \sum^M_{m=1} | \hat{y}^{[m]} - \hat{y}^{[m+1]} | + \sum^M_{m=1} | \hat{y}^{[m]} - \hat{y}^{[m+2]} | \right),$$

and so on. Does that make sense to you?
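
For readers following along, the sum described in this comment can be sketched in NumPy roughly as follows. This is only an illustration, not the package's implementation: the function names `kband_spread` and `energy_score_kband` are hypothetical, the ensemble is assumed to have shape `(M, D)`, and indices past $M$ are assumed to wrap around cyclically, which the thread does not spell out. The factor of 0.5 comes from the standard energy score definition $E\|X - y\| - \tfrac{1}{2} E\|X - X'\|$.

```python
import numpy as np


def kband_spread(ens: np.ndarray, K: int) -> float:
    """K-band estimate of E||X - X'|| from an ensemble of shape (M, D).

    Averages ||x[m] - x[m+k]|| over m = 1..M and k = 1..K, using the
    1 / (M * K) normalisation. Indices past M wrap around cyclically
    (an assumption of this sketch).
    """
    M = ens.shape[0]
    if not 1 <= K < M:
        raise ValueError("K must satisfy 1 <= K < M")
    total = 0.0
    for k in range(1, K + 1):
        # np.roll(ens, -k, axis=0)[m] is ens[(m + k) % M]
        shifted = np.roll(ens, -k, axis=0)
        total += np.linalg.norm(ens - shifted, axis=-1).sum()
    return total / (M * K)


def energy_score_kband(obs: np.ndarray, ens: np.ndarray, K: int) -> float:
    """Energy score E||X - y|| - 0.5 * E||X - X'|| with the K-band spread term."""
    accuracy = np.linalg.norm(ens - obs, axis=-1).mean()
    return accuracy - 0.5 * kband_spread(ens, K)


ens = np.array([[0.0], [1.0], [3.0]])  # M = 3 members, D = 1
obs = np.array([1.0])
print(kband_spread(ens, K=1))  # (|0-1| + |1-3| + |3-0|) / 3 = 2.0
print(energy_score_kband(obs, ens, K=1))  # (1 + 0 + 2) / 3 - 0.5 * 2.0 = 0.0
```

Each value of $k$ costs one pass over the ensemble, so the spread term scales as $O(MK)$ rather than the $O(M^2)$ of the full pairwise sum.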

@sallen12
Collaborator

Thanks for clarifying. So, just to check I've understood, the second summation in Eq 26 starts from $k = 1$ rather than $k=m$? In this case everything makes sense to me, and I agree it could be a useful estimator to include in the package 👍

As an aside, for the implementation, it would be useful to randomise the ensemble members before applying this formula (also for the iid formula). Otherwise, if there is some sort of ordering among the ensemble members, e.g. because of how the ensembles are sampled from the predictive distribution, then only taking differences between nearby members (as is the case when $K=1$) will generally underestimate the underlying expectation.
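
A small sketch of why the ordering matters, assuming a hypothetical ensemble of sorted standard normal draws and a cyclic $K=1$ band. The helper names `neighbour_spread` and `shuffle_members` are made up for this illustration and are not part of the package.

```python
import numpy as np


def neighbour_spread(ens: np.ndarray) -> float:
    """K = 1 band: mean of ||x[m] - x[m+1]||, wrapping the last member to the first."""
    return float(np.linalg.norm(ens - np.roll(ens, -1, axis=0), axis=-1).mean())


def shuffle_members(ens: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return a copy of the (M, D) ensemble with its members in random order."""
    rng = np.random.default_rng(seed)
    return rng.permutation(ens, axis=0)


# Hypothetical ordered ensemble: sorted draws from a standard normal.
rng = np.random.default_rng(0)
ordered = np.sort(rng.standard_normal(1000)).reshape(-1, 1)

spread_ordered = neighbour_spread(ordered)
spread_shuffled = neighbour_spread(shuffle_members(ordered, seed=1))
print(spread_ordered, spread_shuffled)
```

On the sorted ensemble, neighbouring members are nearly identical, so the spread estimate is far smaller than on the shuffled copy. In this sketch the shuffle happens once, with a fixed seed, before the estimator is applied, so the scoring function itself does not need to change.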

@simon-hirsch
Contributor Author

Yeah, or it ends at $M+K$ if you want to start at $k=m$. Starting at 1 is less confusing though.

Essentially shuffle the ensembles once? Not sure how this will work with the gufuncs, but generally agree that it makes sense. Will check and also put a seed in the estimator 👍

@sallen12
Collaborator

Great, all clear now, thanks a lot!
