Use a grouper instead of `unique_id` #23

david-waterworth · 2023-02-15T22:08:29Z

In the main feature extraction loop, tsfeatures groups by the hard-coded `unique_id` column and then applies the transforms to the grouped data.

tsfeatures/tsfeatures/tsfeatures.py

Line 916 in 5ce2ba7

ts_features = pool.starmap(partial_get_feats, ts.groupby('unique_id'))

It would be more generic if you could pass in a Grouper to perform the grouping. At the moment I have to group my data myself and then create a flat column from the multi-index (i.e. a column of tuples):

# group by id and day
grouper = [pd.Grouper(key='id'), pd.Grouper(key='time', freq='1D')]
grouped_data = df.groupby(grouper, group_keys=True)

# join groups, use grouper key as new index
grouped_data = grouped_data.apply(lambda x: x.drop(columns=['id']))
grouped_data = grouped_data.droplevel(-1)

# flatten index to tuples
grouped_data.index = grouped_data.index.to_flat_index()
grouped_data.index.name = 'id'
grouped_data = grouped_data.reset_index()

The issue I've had with that is that I've been experimenting with Dask, and data formats like Parquet don't seem to support this column type. You can create a Dask dataframe from a pandas dataframe that contains tuple columns, but so far I've been unable to persist them. I know tsfeatures doesn't support Dask at this stage, but I guess it might be on the roadmap?
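One possible workaround for the tuple column, sketched here under assumptions: build a flat string key from the id and the calendar day, so that `unique_id` is a plain string column. The column names `id`, `time` and `y` and the toy data are made up for illustration, and flooring to the day is assumed to match the `freq='1D'` bins only for timezone-naive timestamps with the default bin origin.

```python
import pandas as pd

# Toy data; the column names 'id', 'time' and 'y' are assumptions for illustration.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b"],
    "time": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 12:00",
        "2023-01-02 06:00", "2023-01-01 03:00",
    ]),
    "y": [1.0, 2.0, 3.0, 4.0],
})

# Floor each timestamp to its calendar day and format it as text.
day = df["time"].dt.floor("D").dt.strftime("%Y-%m-%d")

# Combine id and day into one flat string key instead of a tuple.
df["unique_id"] = df["id"].astype(str) + "_" + day

print(df[["unique_id", "time", "y"]])
```

A string key like this loses the structure of the original multi-index, so the id and day would have to be split back out of `unique_id` afterwards if they are needed separately.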

david-waterworth · 2023-02-16T00:14:17Z

Also, for reasons I don't understand, when I used a multi-level grouper I had to create a generator.

from multiprocessing import Pool
import pandas as pd

grouper = [pd.Grouper(key='fav_id'), pd.Grouper(key='ts', freq='1D')]
# wrap the groupby in a generator of (key, group) pairs
groups = ((index, ts) for index, ts in df.groupby(grouper))
with Pool(threads) as pool:
    ts_features = pool.starmap(partial_get_feats, groups)
    pool.close()
    pool.join()

For some reason `len(df.groupby(grouper))` is different from `df.groupby(grouper).ngroups`.
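To make the request concrete, here is a rough sketch of what a grouper-accepting entry point could look like. `features_by_group` and `simple_feats` are hypothetical names, not part of tsfeatures, and the loop runs serially rather than through a `Pool`. It iterates over the `(key, group)` pairs directly, so it does not depend on `len()` of the groupby object.

```python
import pandas as pd

def features_by_group(df, by, feature_func):
    """Hypothetical helper: apply feature_func to each group defined by `by`.

    `by` can be anything DataFrame.groupby accepts, e.g. a column name
    or a list of pd.Grouper objects.
    """
    rows = []
    for key, group in df.groupby(by):
        feats = dict(feature_func(group))
        feats["group_key"] = key  # a tuple when `by` is a list of keys
        rows.append(feats)
    return pd.DataFrame(rows)

def simple_feats(group):
    # Stand-in for the real feature functions; assumes a numeric 'y' column.
    return {"mean": group["y"].mean(), "n_obs": len(group)}

# Toy data; column names are assumptions for illustration.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b"],
    "time": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 12:00",
        "2023-01-02 06:00", "2023-01-01 03:00",
    ]),
    "y": [1.0, 2.0, 3.0, 4.0],
})

grouper = [pd.Grouper(key="id"), pd.Grouper(key="time", freq="1D")]
result = features_by_group(df, grouper, simple_feats)
print(result)
```

Passing `'unique_id'` as `by` would reproduce the current hard-coded behaviour, so a parameter like this could default to it.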
