Q: so for App flow dataset, the only feature is time? #25

mw66 · 2023-04-14T05:02:51Z

https://github.com/ant-research/Pyraformer/blob/master/preprocess_flow.py#L19-L22

These lines extract time, weekday, hour, and month,

which are then used here:

https://github.com/ant-research/Pyraformer/blob/master/preprocess_flow.py#L54-L57

I'm just wondering:

Why not, for example, use zone (converted to some integer) as an extra feature, and how does the model perform in that case?
Or: if the training data contains only the single time feature (without weekday, hour, month), will the model still perform well?

Sorry for the silly questions; I'd like to hear your insight.

Thanks.

Zhazhan · 2023-04-14T06:29:33Z

Hi,

The 'zone' and 'app_name' information is actually used; see https://github.com/ant-research/Pyraformer/blob/master/preprocess_flow.py#L13 and https://github.com/ant-research/Pyraformer/blob/master/preprocess_flow.py#L57. Each 'app_name' in each 'zone' corresponds to one time series, so we convert the 'app_name' and 'zone' information into an integer, namely the 'seq_id'.
It is also possible to make predictions based solely on the historical time series. Following previous work, our implementation includes these covariates.
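
As a rough illustration of the 'seq_id' idea, each distinct ('app_name', 'zone') pair can be mapped to one integer. This is only a sketch, not the repository's code: the helper name `assign_seq_ids`, the toy rows, and the first-appearance ordering are all assumptions made for the example.

```python
def assign_seq_ids(rows):
    """Map each distinct (app_name, zone) pair to an integer id.

    Ids are assigned in order of first appearance (an assumption for
    this sketch; the repository may order its groups differently).
    """
    seq_ids = {}
    for app_name, zone in rows:
        # len(seq_ids) is evaluated before insertion, so ids run 0, 1, 2, ...
        seq_ids.setdefault((app_name, zone), len(seq_ids))
    return seq_ids


# Hypothetical rows: the same app in two zones yields two separate series.
rows = [("app_a", "z1"), ("app_a", "z2"), ("app_b", "z1"), ("app_a", "z1")]
seq_ids = assign_seq_ids(rows)
```

With an id like this, the model can tell series apart without needing 'zone' or 'app_name' as separate columns.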

mw66 · 2023-04-15T18:32:37Z

OK, so app_name and zone are there, but what about the previous values of the raw input sequence (inside the window)?

Let's check the raw input sequence data, in:
https://github.com/ant-research/Pyraformer/blob/master/preprocess_flow.py#L17-L26

        single_df = grouped_data[i][1].drop(labels=['app_name', 'zone'], axis=1).sort_values(by="time", ascending=True)
        times = pd.to_datetime(single_df.time)
        single_df['weekday'] = times.dt.dayofweek / 6
        single_df['hour'] = times.dt.hour / 23
        single_df['month'] = times.dt.month / 12
        temp_data = single_df.values[:, 1:]    # L22, 'time' column is dropped here
        if (temp_data[:, 0] == 0).sum() / len(temp_data) > 0.2:
            continue

        all_data.append(temp_data)

We can see that temp_data[:, 0] is the raw input sequence: 'app_name' and 'zone' are dropped on L17 and 'time' is dropped on L22, so temp_data[:, 0] is the 'value' column of the original csv file.
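
The column layout described above can be reproduced with a small standard-library sketch. It is not the repository's code: the helper names `build_rows` and `mostly_zero`, the timestamp format, and the toy records are assumptions; only the scaling (weekday / 6, hour / 23, month / 12) and the 20% zero filter follow the quoted lines.

```python
from datetime import datetime


def build_rows(records):
    """Turn (time_string, value) pairs into [value, weekday, hour, month] rows.

    The 'time' column itself is not kept, so column 0 is the raw value.
    """
    rows = []
    # ISO-style strings sort chronologically, like sort_values(by="time").
    for time_str, value in sorted(records):
        t = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        # weekday() is 0 for Monday, matching pandas' dt.dayofweek.
        rows.append([value, t.weekday() / 6, t.hour / 23, t.month / 12])
    return rows


def mostly_zero(rows, threshold=0.2):
    """True if more than `threshold` of the values (column 0) are zero."""
    zeros = sum(1 for row in rows if row[0] == 0)
    return zeros / len(rows) > threshold


# Hypothetical records: a Friday at 05:00 and a Sunday at 23:00.
records = [("2023-04-14 05:00:00", 12.0), ("2023-04-16 23:00:00", 0.0)]
rows = build_rows(records)
skip_series = mostly_zero(rows)
```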

Then, in
https://github.com/ant-research/Pyraformer/blob/master/preprocess_flow.py#L55

  single_data[:, 0] = seq_data.copy()

is the real raw input sequence data,

but in https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L513-L518

        cov = all_data[:, :, 1:]   # the real raw input sequence data 'value' (all_data[:, :, 0]) dropped?

        split_start = len(label[0]) - self.pred_length + 1
        data, label = split(split_start, label, cov, self.pred_length)

        return data, label

Is it dropped from the training data?

That's my question: are the previous values of the raw input sequence not used at all in training?
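
To make the question concrete, here is a small NumPy sketch of what the slice does on a toy array. It shows only the slicing: `label` and `split` from data_loader.py are not reproduced, so whether the values reach the model through `label` depends on code outside the quoted lines. The array contents and the re-attachment step are assumptions for illustration.

```python
import numpy as np

# Toy batch: 2 sequences, 3 time steps, 4 channels laid out as
# [value, weekday, hour, month]; the numbers are made up.
all_data = np.arange(24, dtype=float).reshape(2, 3, 4)

cov = all_data[:, :, 1:]    # covariates only: channel 0 is excluded
values = all_data[:, :, 0]  # the 'value' channel, if it is kept separately

# Hypothetical: re-attach the values as the first input channel. Whether
# `split` does something like this is not visible in the quoted lines.
inputs = np.concatenate([values[:, :, None], cov], axis=-1)
```

So `cov` alone does not contain the 'value' channel; the history is only available to the model if it is passed in by another route, such as `label`.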

mw66 mentioned this issue Apr 15, 2023

some questions about preprocess_elect.py and data_loader.py #26

Open
