
Some question about loss function #8

Open
LeeRiking opened this issue Jun 10, 2022 · 4 comments
Comments

@LeeRiking

In your code, batch_x is the same as batch_y. When calculating the MSE, pred and true correspond to the 168 steps shifted by one step and the original 168 steps, respectively, so there seems to be a data-misalignment problem: the MSE is computed between the original data and predictions that are offset by one step.
Looking forward to your answer!

@LeeRiking
Author

This issue occurs in the single-step forecast.

@Zhazhan

Zhazhan commented Jun 10, 2022

The batch_x and batch_y are not the same.
Let's denote the history length as L_H and the prediction length as L_P. To perform rolling prediction on an L_H + L_P window, we extract sub-sequences of length L_H from front to back and set the end of each sub-sequence to -1 to prevent information leakage. These sub-sequences are stacked along the batch dimension to form batch_x, and batch_y takes only the last L_P points of the original window. Therefore, there is a one-to-one correspondence between the predictions and batch_y.
For more details, please read the 'split' function in 'dataloader.py'.
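A minimal sketch of this windowing, assuming a 1-D series (the helper name rolling_split and the exact indexing are illustrative only, not the actual 'split' function from 'dataloader.py'):

```python
import numpy as np

def rolling_split(window, history_len, pred_len):
    # window: 1-D array of length history_len + pred_len (the rolling window described above)
    assert len(window) == history_len + pred_len
    xs = []
    for i in range(pred_len):
        # length-L_H sub-sequence whose last position is the i-th prediction target
        sub = window[i + 1 : i + 1 + history_len].copy()
        sub[-1] = -1  # mask the end of the sub-sequence to prevent information leakage
        xs.append(sub)
    batch_x = np.stack(xs, axis=0)   # (L_P, L_H), stacked along the batch dimension
    batch_y = window[-pred_len:]     # the last L_P points of the original window
    return batch_x, batch_y
```

Under this reading, the i-th row of batch_x produces the prediction for the i-th element of batch_y, which is the one-to-one correspondence mentioned above.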

@LeeRiking
Author

Thank you for your explanation! While debugging the code, I ran into another question about the encoder layer: the default value of use_tvm is False. Does that mean the model is trained with ordinary multi-head attention? I thought TVM was used to compile models.

@Zhazhan

Zhazhan commented Jun 22, 2022

When use_tvm=False, we implement Pyraformer by adding an attention mask to the ordinary multi-head attention. Therefore, the setting of use_tvm does not affect the results, but does affect speed and memory usage.
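A minimal sketch of this idea (illustrative only, not the repository's code): the sparse attention pattern is expressed as a boolean mask applied to ordinary full multi-head attention, so the masked version and a dedicated sparse (TVM) kernel produce the same output and differ only in speed and memory usage.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, allowed):
    # q, k, v: (batch, heads, L, d); allowed: boolean (L, L) attention pattern
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # full L x L attention scores
    scores = scores.masked_fill(~allowed, float('-inf'))  # disallowed query-key pairs get -inf
    weights = F.softmax(scores, dim=-1)                    # masked positions receive zero weight
    return weights @ v
```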
