How to use LinearDML and causalforestDML to get CATE #926

Open
GladysKao opened this issue Nov 5, 2024 · 5 comments

@GladysKao

Hi, I'm new to this.
I want to know whether the coef_ I got from LinearDML is the CATE. If not, what do these coef_ values mean?
Where can I get the CATE when I use CausalForestDML?
I'm so confused... I have read the docs many times but I still can't get CATE values.
As I understand it, the CATE is the effect when X_i = 1 and the other features all equal 1 or 0. Is anything wrong with my definition of CATE?
Thank you~

(Or, if I set marginal_effect['CATE'] = est.marginal_effect(T, X) and then call marginal_effect.groupby(X_i)['CATE'].mean(), can I get the CATE this way?)

@kbattocchi
Collaborator

In general, the CATE is the Conditional Average Treatment Effect, of going from some treatment T0 to some other treatment T1, given some set of features X.

For all of our estimators, this can be obtained by calling est.effect(X, T0=T0, T1=T1). In the case of LinearDML, this is derived from coef_ (and intercept_, if fit_cate_intercept is True), but they are not identical: the coef_ entries are the coefficients of the CATE on the feature/treatment interactions.

Most of our estimators are linear in the treatment, and often there is a single treatment, in which case the marginal effect is just a scalar function of X independent of T, and so est.const_marginal_effect(X) will provide you with the marginal effect (basically, the effect of going from treatment=0 to treatment=1, which you can then scale to other treatment differences accordingly).
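To make the relationship between coef_, intercept_, est.const_marginal_effect(X) and est.effect(X, T0=..., T1=...) concrete, here is a minimal library-free sketch of the arithmetic for a single treatment and a CATE that is linear in X. It does not call EconML: the coefficient values, the feature rows, and the two helper functions are hypothetical stand-ins chosen only for illustration, and it assumes no featurizer is applied to X.

```python
# Hypothetical numbers standing in for a fitted LinearDML's coef_ and intercept_.
coef = [0.5, -0.2]   # one coefficient per column of X (assumed values)
intercept = 0.1      # only present when fit_cate_intercept is True (assumed value)


def const_marginal_effect(x):
    """Effect of a one-unit treatment increase for a unit with features x."""
    return intercept + sum(c * xi for c, xi in zip(coef, x))


def effect(x, t0=0.0, t1=1.0):
    """Effect of moving the treatment from t0 to t1.

    Because the model is linear in the treatment, this is the
    per-unit marginal effect scaled by the treatment difference.
    """
    return const_marginal_effect(x) * (t1 - t0)


units = [[1, 0], [0, 1], [1, 1]]  # three hypothetical feature rows
cates = [const_marginal_effect(x) for x in units]

for x, cate in zip(units, cates):
    print(x, round(cate, 3), round(effect(x, 0, 2), 3))
```

The coefficients by themselves are not a CATE. A CATE value only appears once they are combined with a specific feature vector, which is why two units with different X get different effects from the same coef_.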

@GladysKao
Author

Thank you for your reply.
I think I misunderstood CATE: I thought it was conditional on just ONE feature, not on a set of features. That leaves me with a new problem: if I have five features and want the average treatment effect for each feature, what can I do? Would SingleTreeCateInterpreter help?
My code is:

    results_cf = X.copy()
    cate_cf = est_cf.effect(X)
    results_cf['CATE'] = cate_cf
    for x in est_lr.cate_feature_names():
        average_cate_lr = results_lr.groupby(x)['CATE'].mean()
        print(average_cate_lr)

The output is:

    ...
    peak
    0    0.252617
    1    0.103567
    Name: CATE, dtype: float64
    ...

and then I called SingleTreeCateInterpreter:
[Image: WeChat image 20241107182336]

I found that the CATE for the feature used in the first split is the same as my output. So I really want to know whether my output "average_cate_lr" has a meaning of its own, or at least how to interpret the CATE in each node. Does it represent the causal effect of T on Y conditional on X_i, or on a combination of X_i, ..., X_j?

To be concrete, suppose I have five features: passenger, violate, distract, weekend and peak. I get a tree interpreter with a CATE for each node, and the splits are on "peak, weekend, distract, ..." in that order.
Can I then say "when peak, the CATE is 0.103567; when not peak, the CATE is 0.252617; when peak and weekend, the CATE is ...; when peak and not weekend, the CATE is ...; when not peak and distract, ..."?
I want to know the meaning of each CATE, since I think each one has its own meaning.
I look forward to your reply. Thank you~

@GladysKao
Author

CORRECTION:

    results_cf = X.copy()
    cate_cf = est_cf.effect(X)
    results_cf['CATE'] = cate_cf
    for x in est_cf.cate_feature_names():
        average_cate_cf = results_cf.groupby(x)['CATE'].mean()
        print(average_cate_cf)

Sorry for my mistake; the model I used is CausalForestDML.

@GladysKao
Author

Or, in short, @kbattocchi, how should I interpret the results of the tree interpreter?

I can even use the tree interpreter with LinearDML. I know GRF creates the split structure by maximizing the heterogeneity at each split, but how is the split structure created for LinearDML?

Big thanks in advance.

@kbattocchi
Collaborator

If you have multiple columns in X, the CATE is giving you an estimate of the effect of T on Y conditional on all of the features in X simultaneously. In general, for CausalForestDML, this will be some complicated function of X, while for LinearDML this will just be a linear function of X (which coef_ gives you the coefficients for).
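Since each per-unit CATE is conditional on all features at once, averaging those values within the levels of one feature (as the groupby code earlier in the thread does) gives a subgroup average over whatever values the other features take in the sample. The sketch below illustrates that with plain Python. The four rows and their CATE values are made up for illustration and are not the numbers from this thread, and `group_average` is a hypothetical helper, not an EconML function.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-unit CATEs, as est.effect(X) might return for four units.
rows = [
    {"peak": 0, "weekend": 0, "cate": 0.30},
    {"peak": 0, "weekend": 1, "cate": 0.20},
    {"peak": 1, "weekend": 0, "cate": 0.15},
    {"peak": 1, "weekend": 1, "cate": 0.05},
]


def group_average(rows, feature):
    """Average the per-unit CATE within each level of one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row["cate"])
    return {level: mean(values) for level, values in sorted(groups.items())}


by_peak = group_average(rows, "peak")
by_weekend = group_average(rows, "weekend")
print(by_peak)
print(by_weekend)
```

Read this way, the "peak = 1" average is the mean effect among peak units in the sample, with weekend and the other features left at whatever mix they happen to have in that subgroup. It is not the effect of peak itself, and it is not the effect with the other features held fixed.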

SingleTreeCateInterpreter is one way to try to get a simplified view of any learned CATE model - it will give you a tree that has just a handful of nodes, making it easy to interpret, but the tradeoff is that it will give you rougher estimates of the CATE because it averages together units that the underlying model would assign different individual CATEs. However, this tradeoff might be worthwhile if you need a very high level understanding of which units have very different effects from each other, rather than more precise estimates for each unit.
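The averaging tradeoff can be illustrated with a toy one-split summary of per-unit CATEs. This is only a sketch of the idea and not EconML's implementation: the rows, the squared-error split criterion, and the `best_binary_split` helper are assumptions made for illustration. It picks the binary feature whose split leaves the least within-leaf variation in the CATE and reports each leaf as the mean CATE of its units, which is consistent with the observation earlier in the thread that the first split's node values matched the groupby averages.

```python
from statistics import mean

# Hypothetical per-unit CATEs from some underlying CATE model.
rows = [
    {"peak": 0, "weekend": 0, "cate": 0.30},
    {"peak": 0, "weekend": 1, "cate": 0.20},
    {"peak": 1, "weekend": 0, "cate": 0.15},
    {"peak": 1, "weekend": 1, "cate": 0.05},
]


def sse(values):
    """Sum of squared deviations from the mean (within-leaf error)."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


def best_binary_split(rows, features, target="cate"):
    """Return (feature, mean for feature == 0, mean for feature == 1)."""
    best = None
    for feature in features:
        left = [r[target] for r in rows if r[feature] == 0]
        right = [r[target] for r in rows if r[feature] == 1]
        if not left or not right:
            continue  # this feature cannot separate the units
        loss = sse(left) + sse(right)
        if best is None or loss < best[0]:
            best = (loss, feature, mean(left), mean(right))
    if best is None:
        raise ValueError("no feature splits the rows into two non-empty groups")
    return best[1:]


feature, cate_when_0, cate_when_1 = best_binary_split(rows, ["peak", "weekend"])
print(feature, round(cate_when_0, 3), round(cate_when_1, 3))
```

Each leaf value hides the spread inside the leaf: in this toy data the "peak = 0" leaf averages units whose individual CATEs are 0.30 and 0.20. Deeper splits recover more of that detail at the cost of a less readable tree.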
