-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding BetaBernoulli distribution with LogScore #132
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @guyko81,
Thanks for the PR. I've left some comments.
The main issue is I'm not sure how you derived the Fisher information. In my derivation the Fisher Information is not diagonal, and moreover its entries involve the trigamma function.
I don't have 100% confidence in my derivation though, so if you could paste your derivation it'd be very helpful to double check.
) | ||
return D | ||
|
||
def metric(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake not including the other diagonal. My calculation was based on the definition of the FI matrix as the variance of the score. Therefore I just simply squared the gradient (but forgot that it's actually a vector and the square should be S*S.T).
Can we use the double derivative? Are we using this:
Claim: The negative expected Hessian of log likelihood is equal to the Fisher Information Matrix F.
https://wiseodd.github.io/techblog/2018/03/11/fisher-information/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my calculation the last row is different, it's
However when I use that formula the model doesn't work - it drops the singular matrix error again. And when I include the full FI matrix from the variance definition it also drops the singular matrix error. When I use the diagonal matrix from the Hessian definition it simply doesn't learn.
So the only working solution is the diagonal matrix from the variance definition. I have no clue why.
If you want to try the different approaches I have updated the code. All you need to do is to comment out the other diagonal or to comment out the other metric definition. For now I keep the working version as my pull request, but please check my code as I'm not sure I didn't miss something or didn't do a typo again.
ngboost/distns/betabernoulli.py
Outdated
|
||
def fit(Y): | ||
|
||
def fit_alpha_beta_py(impressions, clicks, alpha0=1.5, beta0=5, niter=1000): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clean this function up a bit? In particular it's not clear what impressions / clicks are supposed to be. If impressions
is going to be a vector of ones in all cases maybe we can remove it as an argument?
Also it'd be great if we could apply black
formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleared the function
@tonyduan are you planning on doing a re-review here? I see some relevant changes have been made. |
There's an identifiability issue so I'm recommending against implementing the BetaBernoulli. We should implement the BetaBinomial instead. Pasting my earlier comment below:
|
I didn't express it clearly but I proposed this branch as an alternative for logistic regression or any binary classifier. Therefore creating the Beta-Binomial would add no benefit for that specific problem. Although I'm happy to implement the Beta-Binomial too, but I would still like to keep the Beta-Bernoulli as well, probably in this form. Can I ask whether it's possible that the diagonal version of the FI matrix still helps finding the true Beta-Bernoulli distribution (strictly from mathematical point of view)? I mean when I ran the model it gave reasonable results. I tested it on Kaggle and the model was at an acceptable level (around the random forest result), but obviously with an extra info of the uncertainty of the predicted probabilities. Or I'm in a false confidence just based on the expected value, and the uncertainty was just a random value? |
Unfortunately I think it's the latter :( You can tell this just by doing some math with the probability mass function of the beta-bernoulli, which is
where
Call these values For any given |
Just a suggestion but I think it would be nice as part of this PR to add to the distns test here for the new dist: https://github.com/stanfordmlgroup/ngboost/blob/master/ngboost/tests/test_distns.py |
@guyko81 should we close this PR in light of the issues with this model or are you interested in extending it to the n>1 case that @tonyduan described?
|
@alejandroschuler I had many things on my plate, but I will find the time to do it in the upcoming days. Please keep it open for few days and I come back with an update. |
@guyko81 just checking in on this pr |
I struggle to figure out how to handle n (trials) in the metric/d_score. The number of trials in BetaBinomial should come from the dataset, therefore it's not a parameter, but the equations expect them in the FI. Can someone help me out? this is the metric function: it has no n in the parameters and this is the FI[:, 0, 0], but there is n: |
That means
this is the way I've implemented the categorical distribution so you can look there for inspiration. If you need the value of |
BetaBernoulli distribution implemented with LogScore, Fisher Information included.