Why Shape of Y in Causal Forest notebook is 1000*1000 #888

Open
silulyu opened this issue Jun 5, 2024 · 2 comments
silulyu commented Jun 5, 2024

I was running the "Example Usage with Binary Treatment Synthetic Data" section of the Causal Forest notebook (https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Forest%20and%20Orthogonal%20Random%20Forest%20Examples.ipynb). After running the following code, the shape of Y looks wrong: it is a 1000*1000 matrix in which the 2nd through 999th columns are all the same. I believe we should reshape Y and use only the first column for modeling; however, that makes the ATE result totally different (0.97 vs 3.1). How should I understand the shape of Y? Should it be a 1000*1000 matrix or a 1000*1 matrix? Thank you!

# Imports this snippet relies on
import numpy as np
from itertools import product

# DGP constants
np.random.seed(1234)
n = 1000
n_w = 30
support_size = 5
n_x = 1
# Outcome support
support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
coefs_Y = np.random.uniform(0, 1, size=support_size)
epsilon_sample = lambda n: np.random.uniform(-1, 1, size=n)
# Treatment support
support_T = support_Y
coefs_T = np.random.uniform(0, 1, size=support_size)
eta_sample = lambda n: np.random.uniform(-1, 1, size=n)

# Generate controls, covariates, treatments and outcomes
W = np.random.normal(0, 1, size=(n, n_w))
X = np.random.uniform(0, 1, size=(n, n_x))
# Heterogeneous treatment effects
TE = np.array([exp_te(x_i) for x_i in X])
# Define treatment
log_odds = np.dot(W[:, support_T], coefs_T) + eta_sample(n)
T_sigmoid = 1/(1 + np.exp(-log_odds))
T = np.array([np.random.binomial(1, p) for p in T_sigmoid])
# Define the outcome
Y = TE * T + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)

# ORF parameters and test data
subsample_ratio = 0.4
X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_x)))

[Attached image: Screenshot 2024-06-05 at 12 46 03 PM]

silulyu changed the title from "Shape of Y in Causal Forest notebook is weird" to "Why Shape of Y in Causal Forest notebook is 1000*1000" on Jun 5, 2024

kbattocchi (Collaborator) commented

I'm unable to reproduce this - I see (1000,) as the shape of Y. Is it possible that you've redefined exp_te to return something other than a scalar? What is the shape of TE?
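
The broadcasting behavior behind this question can be sketched as follows. This is a minimal illustration, not the notebook's code: `exp_te_scalar` is an assumed scalar-returning definition, `exp_te_array` is a hypothetical redefinition that returns a length-1 array, and `T` is a stand-in binary treatment drawn with a fixed probability.

```python
import numpy as np

np.random.seed(1234)
n = 1000
X = np.random.uniform(0, 1, size=(n, 1))   # same shape as X in the issue: (n, n_x) with n_x = 1
T = np.random.binomial(1, 0.5, size=n)     # stand-in binary treatment, shape (n,)

# Assumed scalar-returning definition: indexes into the row, so the result is a scalar.
def exp_te_scalar(x):
    return np.exp(2 * x[0])

# Hypothetical redefinition: no indexing, so each call returns an array of shape (1,).
def exp_te_array(x):
    return np.exp(2 * x)

TE_scalar = np.array([exp_te_scalar(x_i) for x_i in X])  # shape (n,)
TE_array = np.array([exp_te_array(x_i) for x_i in X])    # shape (n, 1)

Y_ok = TE_scalar * T    # (n,) * (n,)   -> (n,)
Y_bad = TE_array * T    # (n, 1) * (n,) -> broadcast to (n, n)

print(TE_scalar.shape, Y_ok.shape)
print(TE_array.shape, Y_bad.shape)
```

In the broadcast case, entry `[i, j]` of the product is `TE[i] * T[j]`, so each column pairs every unit's effect with a single unit's treatment. Taking one column therefore does not recover the intended outcome, which would explain a different ATE; only the diagonal matches the elementwise product.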

silulyu (Author) commented Jun 12, 2024

Thanks so much for your reply. That is a great catch! I added the definition of exp_te to Section 2, using the same function as in Section 1, and the shape is now (1000,).
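
One way to catch this kind of problem early is a shape check on each array before it goes into the outcome equation or a model fit. The helper below, `check_1d`, is a hypothetical name introduced for illustration and is not part of EconML or the notebook.

```python
import numpy as np

def check_1d(name, arr, n):
    """Hypothetical helper: raise if arr is not a 1-D array of length n."""
    arr = np.asarray(arr)
    if arr.shape != (n,):
        raise ValueError(f"{name} has shape {arr.shape}, expected ({n},)")
    return arr

n = 5
TE = np.ones((n, 1))  # a column vector, as produced when the effect function returns an array

try:
    check_1d("TE", TE, n)
except ValueError as err:
    message = str(err)
    print(message)

# Flattening the column vector gives the 1-D shape the check expects.
TE_fixed = check_1d("TE", TE.ravel(), n)
print(TE_fixed.shape)
```

Failing loudly at this point is cheaper than debugging a silently broadcast (n, n) outcome matrix later.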
