
some questions about preprocess_elect.py and data_loader.py #26

Open
mw66 opened this issue Apr 15, 2023 · 2 comments

Comments


mw66 commented Apr 15, 2023

Hi,

I have a few questions about preprocess_elect.py:

  1. In prep_data(), v_input[:, 1] is never used (neither read nor written), so why is this 2nd column needed?
    https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L35

  2. about x_input:
    https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L58
    From index 1 onward, x_input[count, 1:, 0] contains the real raw input data, but x_input[count, 0, 0] is never assigned, so it remains all 0s, which means it does not contain any real raw input data.
    (On https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L67, the line for x_input[count, 0, 0] also leaves it zero.)
    Why not just drop all such x_input[:, 0, :], since they are wrong training data? And why save them in the final train npy file?
    i.e. change https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L72-L74 to

    np.save(prefix+'data_'+save_name, x_input[:, 1:, :])
    np.save(prefix+'v_'+save_name, v_input[1:, :])
    np.save(prefix+'label_'+save_name, label[1:, :])

I also inspected the saved train data, and it confirms that these entries are all 0s:

>>> import numpy as np
>>> t = np.load("data/elect/train_data_elect.npy")
>>> np.max(t[:, 0, 0])
0.0
>>> np.min(t[:, 0, 0])
0.0
>>>
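The zero first timestep can be reproduced with a minimal numpy sketch (toy shapes and hypothetical sizes, not the repo's real ones), mirroring the assignment pattern at preprocess_elect.py#L58:

```python
import numpy as np

# Toy stand-ins for the real arrays (hypothetical sizes).
window_size, num_covariates = 5, 3
data = np.arange(1.0, 1.0 + window_size)  # pretend raw series, all non-zero

# Mirrors: x_input = np.zeros((windows, window_size, 1 + num_covariates))
x_input = np.zeros((1, window_size, 1 + num_covariates))

# Mirrors: x_input[count, 1:, 0] = data[window_start:window_end-1, series]
x_input[0, 1:, 0] = data[:window_size - 1]

# x_input[0, 0, 0] was never assigned, so it keeps its zero initialization.
print(x_input[0, :, 0])  # first entry is 0.0, the rest hold real data
```

Only the slice starting at index 1 is ever written, so the first timestep of channel 0 stays at its np.zeros initialization, exactly as the np.load inspection above shows.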

mw66 commented Apr 15, 2023

@Zhazhan

my 3rd question:

https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L58

            x_input[count, 1:, 0] = data[window_start:window_end-1, series]

so x_input[:, :, 0] holds the raw input sequence data,

but in:
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L440-L445

        cov = all_data[:, :, 2:]   # the raw input sequence data is dropped here?

        split_start = len(label[0]) - self.pred_length + 1
        data, label = split(split_start, label, cov, self.pred_length)

        return data, label

Is the raw input sequence dropped from the training data here?

This is the same question I have here: #25 (comment)

So the previous values of the raw input sequence are not used at all in training?
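The slicing in question can be illustrated with a toy array (shapes and the channel layout here are made up for illustration): all_data[:, :, 2:] keeps only the later channels, so whatever sits in channels 0 and 1 never reaches cov.

```python
import numpy as np

# Toy tensor: (batch, time, channels); channel 0 plays the role of the
# raw series, later channels the covariates (hypothetical layout).
batch, time, channels = 2, 4, 5
all_data = np.random.rand(batch, time, channels)
all_data[:, :, 0] = 99.0  # mark the "raw series" channel with a sentinel

cov = all_data[:, :, 2:]  # same slice as data_loader.py#L440

# The marked channel is gone: cov only carries channels 2..4.
print(cov.shape)            # (2, 4, 3)
print((cov == 99.0).any())  # False: the sentinel channel was sliced away
```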


mw66 commented Apr 16, 2023

ok, for my question 3), I found:

https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L443

        data, label = split(split_start, label, cov, self.pred_length)

which, at
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L398-L403

            single_data = batch_label[i:(split_start+i)].clone().unsqueeze(1)
            single_data[-1] = -1
            single_cov = cov[batch_idx, i:(split_start+i), :].clone()
            temp_data = [single_data, single_cov]
            single_data = torch.cat(temp_data, dim=1)
            all_data.append(single_data)

inserts the label (as previous values in the window) back into all_data. This is confusing; why did you choose to do it this way?
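A numpy analogue of what those lines appear to do (shapes and values here are illustrative, not the repo's real ones): the label window is cloned, its last step is masked to -1, and it is concatenated in front of the covariates, so the previous raw values do re-enter the model input as channel 0.

```python
import numpy as np

split_start, num_covariates = 4, 3
label = np.array([10., 11., 12., 13., 14.])       # toy per-series labels
cov = np.random.rand(len(label), num_covariates)  # toy covariates

i = 0
# Mirrors: single_data = batch_label[i:(split_start+i)].clone().unsqueeze(1)
single_data = label[i:split_start + i].copy()[:, None]
single_data[-1] = -1                      # last step masked, as in the repo
single_cov = cov[i:split_start + i, :].copy()
# Mirrors: single_data = torch.cat([single_data, single_cov], dim=1)
window = np.concatenate([single_data, single_cov], axis=1)

print(window[:, 0])  # [10. 11. 12. -1.] : past labels back in channel 0
```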

Also, the implementation of electTrainDataset.__getitem__
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L432

is so different from electTestDataset.__getitem__
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L460

in particular
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L473-L477

            single_data = data[i:(split_start+i)].clone().unsqueeze(1)
            single_data[-1] = -1
            single_cov = cov[i:(split_start+i), :].clone()
            single_data = torch.cat([single_data, single_cov], dim=1)
            all_data.append(single_data)

Here, you don't do the same insertion of the label (as previous values in the window) back into all_data. Why is there such a difference?

@mw66 mw66 changed the title some questions about preprocess_elect.py some questions about preprocess_elect.py and data_loader.py Apr 16, 2023