
Run multicalibration on pre-computed scores w/o access to initial predictor #42

Open
flinder opened this issue Jan 12, 2024 · 4 comments



flinder commented Jan 12, 2024

I'm trying to multicalibrate scores precomputed by a black-box model (assume we don't have access to the model itself), but I'm getting nonsensical results.

I'm wondering whether this should work in theory (and there's some other bug in my code), or whether there's a more fundamental reason it doesn't work.

Here's an example to illustrate what I'm trying to do:

library(mcboost)
library(data.table)  # needed for data.table() below

# simulate some random data
n = 100
scores = runif(n)
labels = rbinom(n, 1, scores)
is_test = as.logical(rbinom(n, 1, 0.1))
segmentation_features = data.table(
    cbind(
        rbinom(n, 1, 0.1),
        rbinom(n, 1, 0.5)
    )
)

init_predictor = function(data) {
    # Hack: we can't call the model, so return the precomputed scores,
    # using the row count to guess whether we got the train or test split
    if (nrow(data) > 50) {
        scores[!is_test]
    } else {
        scores[is_test]
    }
}

mc = MCBoost$new(
    auditor_fitter="TreeAuditorFitter", 
    init_predictor=init_predictor
)

mc$multicalibrate(
    segmentation_features[!is_test],
    labels[!is_test]
)
mc

prs = mc$predict_probs(segmentation_features[is_test])
Member

pfistfl commented Jan 21, 2024

Hi @flinder,

sorry for the long silence, I've been travelling.
I'll need to think about this a little and get back to you; I can't tell off the top of my head why this shouldn't work.
I'll try to get back to you within the next week.

Member

pfistfl commented Jan 21, 2024

Can you maybe provide a little more detail regarding your problem?
In theory, your approach should mostly work fine.

My test: instead of the second feature, I added the true labels as a feature, to check whether there is a general problem.

library(mcboost)
library(data.table)

# simulate some random data
n = 100
scores = runif(n)
labels = rbinom(n, 1, scores)
is_test = as.logical(rbinom(n, 1, 0.2))
segmentation_features = data.table(
    cbind(
        rbinom(n, 1, scores),
        labels
    )
)

The default hyperparameters are not always helpful; I used the following:

mc = MCBoost$new(
    auditor_fitter="TreeAuditorFitter", 
    init_predictor=init_predictor,
    eta=0.5,
    alpha=1e-7,
    max_iter=20, 
    multiplicative = TRUE
)

After fitting and predicting as in your snippet, computing the Brier score (MSE) now yields a strong improvement:

mc$multicalibrate(segmentation_features[!is_test], labels[!is_test])
prs = mc$predict_probs(segmentation_features[is_test])

mse = function(x, y) mean((x - y)^2)
mse(scores[is_test], labels[is_test])
mse(prs, labels[is_test])

[1] 0.153977
[1] 0.0592801

NOTE: If you use an iter_sampling strategy other than none (the default), the row order in your data might be shuffled, or the data might be subsetted. In that case you might have to adjust the initial predictor so that it still returns the scores for the correct rows.
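One way to make such an initial predictor robust to shuffling or subsetting is to look scores up by a row identifier instead of guessing the split from the row count. This is only a sketch under the assumption that you can carry a `row_id` column in the features passed to `multicalibrate()` (the `row_id` column and `score_lookup` table are my own illustrative names, not part of the mcboost API; whether the auditor should also split on `row_id` is a separate question — here it is only used for the lookup):

```r
library(data.table)

# Reusing the simulated setup from above
n = 100
scores = runif(n)

# Hypothetical lookup table mapping each row id to its precomputed score
score_lookup = data.table(row_id = seq_len(n), score = scores)

init_predictor = function(data) {
  # Join on row_id, so reordering or subsetting by iter_sampling
  # cannot misalign the returned scores with the incoming rows
  score_lookup[data, on = "row_id"]$score
}
```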

Author

flinder commented Jan 22, 2024

@pfistfl thanks for looking into it so quickly. Verifying that the function should work this way is already very helpful! I'll look a bit more into your suggestions re: hyperparameters and double-check my code.

Can you maybe provide a little more detail wrt. your problem?

Here's a plot of the calibrated vs. original scores that I get for my problem:

[figure: calibrated scores plotted against the original scores]

Member

pfistfl commented Jan 22, 2024

The discontinuity stems from the fact that probabilities are bucketed into [0, 0.5] and (0.5, 1], and predictions are then adapted within each bucket. From the figure I would assume that your problem is slightly imbalanced, with more labels at 0, which pushes the overall predictions down?
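A toy sketch of why per-bucket updates can look discontinuous (heavily simplified, not mcboost's actual update rule; the offsets are made up for illustration):

```r
# Correcting each probability bucket separately can create a jump
# at the bucket boundary, even if the input scores are smooth.
p = seq(0.01, 0.99, by = 0.01)
low = p <= 0.5

# Suppose auditing finds the low bucket over-predicts on average and
# the high bucket under-predicts: shifting each bucket by its own
# offset leaves a visible discontinuity at p = 0.5.
adjusted = ifelse(low, pmax(p - 0.05, 0), pmin(p + 0.05, 1))

# plot(p, adjusted)  # jump at the boundary between the two buckets
```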
