Hi,
I am estimating the effect of high levels of particulate matter (PM2.5) on excess deaths, using daily panel data for 25 municipalities. My treatment is binary: T=1 when the PM2.5 level is high and T=0 when it is low. The outcome is also binary: Y=1 for excess deaths and Y=0 otherwise.
I am using the DynamicDML class to fit my model, but I get this error message: "AttributeError: Provided crossfit folds contain training splits that don't contain all treatments". However, 50% of the observations have T=1, which I thought would be enough to obtain balanced crossfit folds.
Here is my code, run with econml version 0.15 and dowhy version 0.10.1 (dataset: dataset_pm_deaths.csv):
```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from sklearn.preprocessing import PolynomialFeatures
from econml.panel.dml import DynamicDML

data_all = pd.read_csv("D:/dataset_pm_deaths.csv")
# .copy() avoids pandas' SettingWithCopyWarning on the assignments below
data = data_all[data_all['Year'] >= 2009].copy()

# Binarize the treatment at the median: 1 = high PM2.5, 0 = low PM2.5
median_pm25 = data['PM25'].median()
data['PM25'] = (data['PM25'] >= median_pm25).astype(int)

# Standardize the pollutant controls (NaNs are skipped when computing z-scores)
pollutants = ['BC', 'DMS', 'PM', 'OC', 'SO2', 'SO4']
for col in pollutants:
    data[col] = stats.zscore(data[col], nan_policy='omit')

# Keep only the modelling columns and drop rows with missing values
data0 = data[['excess', 'PM25', 'cod_munici'] + pollutants
             + ['Temperature', 'lead1_PM25']].dropna()

Y = data0['excess'].to_numpy()
T = data0['PM25'].to_numpy()
percentage_high_PM25 = np.mean(T == 1) * 100  # share of rows with T=1 (about 50%)
W = data0[pollutants + ['Temperature']].to_numpy()
X = data0[['Temperature', 'lead1_PM25']].to_numpy()
groups = data0['cod_munici'].to_numpy()  # one group per municipality

estimate0 = DynamicDML(discrete_treatment=True,
                       featurizer=PolynomialFeatures(degree=3),
                       linear_first_stages=False, cv=3, random_state=123)
estimate0.fit(Y=Y, T=T, X=X, W=W, inference='auto', groups=groups)  # the AttributeError is raised here
```
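An overall 50% share of T=1 does not by itself guarantee that every municipality, or every group-wise training split, contains both treatment values. The sketch below is one hedged way to inspect this. It runs on synthetic stand-in data rather than the real dataset, `treatment_coverage` is a hypothetical helper (not part of econml), and scikit-learn's `GroupKFold` is used only as an assumed approximation of group-wise splitting, not as a reproduction of how `DynamicDML` builds its folds internally. It also reports the number of rows per group, on the assumption that panel estimators commonly expect the same number of periods for every group, which `dropna()` can break; that assumption is worth checking against the econml documentation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold


def treatment_coverage(T, groups, n_splits=3):
    """Hypothetical diagnostic helper (not an econml function).

    Returns the share of T=1 per group, the number of rows per group, and,
    for each group-wise fold, the set of treatment values in its training split.
    """
    T = np.asarray(T)
    groups = np.asarray(groups)
    by_group = pd.Series(T).groupby(groups)
    shares = by_group.mean()   # fraction of rows with T=1 in each group
    sizes = by_group.size()    # number of rows (periods) in each group
    train_levels = [
        set(np.unique(T[train_idx]).tolist())
        for train_idx, _ in GroupKFold(n_splits=n_splits).split(T, groups=groups)
    ]
    return shares, sizes, train_levels


# Synthetic stand-in data (an assumption, not the real dataset):
# 4 municipalities x 10 days; two are always treated, two are never treated.
groups_demo = np.repeat([1, 2, 3, 4], 10)
T_demo = np.repeat([1, 1, 0, 0], 10)

shares, sizes, train_levels = treatment_coverage(T_demo, groups_demo, n_splits=2)
print(f"overall share of T=1: {T_demo.mean():.2f}")
print("share of T=1 per group:", shares.to_dict())
print("rows per group:", sizes.to_dict())
print("treatment values in each training split:", train_levels)
```

On the real data, the same call would be `treatment_coverage(T, groups, n_splits=3)`, using the `T` and `groups` arrays built above. A group whose share is exactly 0 or 1, or groups with different row counts, would be the first things to look at.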