It would be more generic if you could pass in a Grouper to perform the grouping. At the moment I have to group my data and then create a flat column from the multi-index (i.e. a column of tuples):
# group by id and day
grouper = [pd.Grouper(key='id'), pd.Grouper(key='time', freq='1D')]
grouped_data = df.groupby(grouper, group_keys=True)
# join groups, use grouper key as new index
grouped_data = grouped_data.apply(lambda x: x.drop(columns=['id']))
grouped_data = grouped_data.droplevel(-1)
# flatten index to tuples
grouped_data.index = grouped_data.index.to_flat_index()
grouped_data.index.name = 'id'
grouped_data = grouped_data.reset_index()
The problem with this approach is that I've been experimenting with Dask, and data formats like parquet don't seem to support this column type (you can create a Dask dataframe from a pandas dataframe that contains tuple columns, but so far I've been unable to persist them). I know tsfeatures doesn't support Dask at this stage, but I guess it might be on the roadmap?
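One possible workaround for the tuple column, sketched here under the assumption that the time bucket is daily: build a plain string key from the id and the day, so the result is an ordinary string column that parquet writers can store. The helper name add_flat_group_key and its parameters are made up for illustration; only the id and time column names come from the snippet above.

```python
import pandas as pd


def add_flat_group_key(df, id_col="id", time_col="time", key_col="unique_id"):
    """Return a copy of df with a string key of the form '<id>_<YYYY-MM-DD>'.

    Hypothetical helper (not part of tsfeatures). Assumes time_col is a
    datetime column and that daily buckets are wanted.
    """
    out = df.copy()
    # Floor each timestamp to midnight so rows from the same day share a bucket
    day = out[time_col].dt.floor("D")
    # A string key avoids the tuple-valued column produced by to_flat_index()
    out[key_col] = out[id_col].astype(str) + "_" + day.dt.strftime("%Y-%m-%d")
    return out


# Made-up sample data
df = pd.DataFrame(
    {
        "id": ["a", "b", "a", "a"],
        "time": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 03:00", "2021-01-01 12:00", "2021-01-02 06:00"]
        ),
        "y": [1.0, 4.0, 2.0, 3.0],
    }
)
keyed = add_flat_group_key(df)
print(keyed[["unique_id", "y"]])
```

The original id and time columns are left in place, so the key can always be rebuilt or split again later.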
Also, for reasons I don't understand, when I used a multi-level grouper I had to create a generator.
grouper = [pd.Grouper(key='fav_id'), pd.Grouper(key='ts', freq='1D')]
groups = ((index, ts) for index, ts in df.groupby(grouper))
with Pool(threads) as pool:
    ts_features = pool.starmap(partial_get_feats, groups)
    pool.close()
    pool.join()
For some reason len(df.groupby(grouper)) is different to df.groupby(grouper).ngroups
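To see where the counts diverge, a small diagnostic along these lines might help. It is only a sketch on made-up data (the fav_id and ts column names follow the snippet above), and I have not confirmed the cause of the mismatch; the values printed may also differ between pandas versions.

```python
import pandas as pd

# Made-up sample: id 'a' skips 2021-01-02, so a daily bin with no rows exists
df = pd.DataFrame(
    {
        "fav_id": ["a", "b", "a", "a"],
        "ts": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 03:00", "2021-01-01 12:00", "2021-01-03 06:00"]
        ),
        "y": [1.0, 4.0, 2.0, 3.0],
    }
)

grouper = [pd.Grouper(key="fav_id"), pd.Grouper(key="ts", freq="1D")]
gb = df.groupby(grouper)

# Materialise the (key, frame) pairs once so they can be counted and reused
groups = [(key, frame) for key, frame in gb]

# Compare the three ways of counting groups side by side
counts = {"len": len(gb), "ngroups": gb.ngroups, "iterated": len(groups)}
print(counts)

# Sanity check: every input row should land in exactly one iterated group
rows_seen = sum(len(frame) for _, frame in groups)
print(rows_seen, len(df))
```

If the iterated count is the one that matches the data, passing the materialised list (or a generator, as above) to the pool sidesteps the question of which count is used internally.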
In the main feature extraction loop, tsfeatures groups by the hard-coded unique_id column and then applies transforms to the grouped data (tsfeatures/tsfeatures/tsfeatures.py, line 916 in 5ce2ba7).
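As a rough illustration of what a caller-supplied grouping could look like, here is a minimal sketch. It is not the tsfeatures implementation: features_by_group, simple_features, and the key_names and value_col parameters are all hypothetical, and the y value column is an assumption. The point is that the group key is written out as separate columns rather than as one tuple-valued column.

```python
import pandas as pd


def features_by_group(df, by, key_names, feature_fn, value_col="y"):
    """Apply feature_fn to each group and return one row per group.

    Hypothetical helper, not part of tsfeatures. `by` is anything accepted by
    DataFrame.groupby, e.g. a column name or a list of pd.Grouper objects.
    """
    records = []
    for key, frame in df.groupby(by):
        # Normalise scalar keys so single and multi-level grouping look alike
        key = key if isinstance(key, tuple) else (key,)
        record = dict(zip(key_names, key))  # one plain column per group level
        record.update(feature_fn(frame[value_col]))
        records.append(record)
    return pd.DataFrame.from_records(records)


def simple_features(series):
    """Stand-in for a real feature function; returns a dict of scalars."""
    return {"n": len(series), "mean": series.mean()}


# Made-up sample data
df = pd.DataFrame(
    {
        "id": ["a", "b", "a", "a"],
        "time": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 03:00", "2021-01-01 12:00", "2021-01-02 06:00"]
        ),
        "y": [1.0, 4.0, 2.0, 3.0],
    }
)

by = [pd.Grouper(key="id"), pd.Grouper(key="time", freq="1D")]
features = features_by_group(df, by, key_names=["id", "day"], feature_fn=simple_features)
print(features)
```

Because id and day end up as ordinary string and datetime columns, the result should not run into the tuple-column problem when written to parquet.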