`shap_values` for tree-based models doesn't set `check_additivity=False` as expected #866
Comments
Thanks for the feedback. I agree that option 4 seems like the best approach, and we'd be happy to take a PR fixing it. I am surprised that our existing tests in `test_shap.py` don't catch this, because they do cover `CausalForestDML` and they run against `shap==0.43`. Please add a test that fails without your fix (or modify at least one of the existing tests to do so). Also, we'd be happy to allow a higher upper bound (say
@kbattocchi I'm not able to consistently reproduce this error. See #445 for another valiant but ultimately futile effort to reproduce it consistently. For posterity, here is (essentially) the code I was running when I (occasionally) ran into the error:

Versions:
```python
import numpy as np
import sklearn.ensemble as ens
from econml import dml

seed = 1337
rng = np.random.default_rng(seed)

# Simulated data: binary treatment A and binary outcome Y, both depending on X[:, 0]
n = 1_000
n_x = 2
X = rng.uniform(-1.0, 1.0, size=(n, n_x))
A = rng.binomial(1, 0.5 + 0.5 * X[:, 0])
Y = rng.binomial(1, 0.3 + 0.5 * A + 0.2 * X[:, 0])
X_ = np.hstack([X, A[:, None]])
A_ = rng.binomial(1, 0.5, size=(n,))

model = dml.CausalForestDML(
    model_y=ens.RandomForestClassifier(random_state=seed),
    model_t=ens.RandomForestClassifier(random_state=seed),
    discrete_treatment=True,
    discrete_outcome=True,
    random_state=seed,
)
model.fit(Y, A_, X=X_)
shap_values = model.shap_values(X_)  # occasionally fails with the additivity-check error
```

Part of the problem with reproducing this error is that EconML doesn't currently allow you to pass a seed to the … I'd be happy to add that keyword argument to the … This change isn't strictly necessary for this issue (although I believe it's generally desirable), and even if I do find a configuration that triggers the error on my machine, I doubt it'll be reproducible across machines. Any alternative ideas? I can quickly submit a PR with just the change proposed in option 4, and we could revisit adding the
I agree that allowing the seed to be passed as an optional argument seems beneficial, not just for testing but for reproducibility more broadly. If you don't mind adding these changes to your PR, that would be great, but if that's too much work I'm also happy to merge it as-is.
For models such as `CausalForestDML`, the desired behavior (#458) of EconML is to set `check_additivity=False` when computing `shap_values` (otherwise, the computation will frequently fail for reasons unbeknownst to me).

This kwarg is set here, where, if the `Explainer` class is `Tree`, `check_additivity` is set to `False`. This works with `shap<=0.42.1`; however, in `shap>=0.43.0` the name of the class was changed to `TreeExplainer`. Given that EconML states it's compatible with `shap >= 0.38.1, < 0.44.0`, this represents a bug.

Some potential solutions:

… `shap` to 0.44.0 to 0.42.1. (This doesn't seem desirable.)

… (It seems that this parameter is only used in the `TreeExplainer` and the `DeepExplainer`, so the signature inspection would probably be fine, and we probably want to set this kwarg to `False` all the time regardless, at least in this package.)

Happy to submit a PR with whatever approach, though.