The token input to the model decoder at each time step is the same as the output token #4

Open
LiuZeJie97 opened this issue Feb 1, 2023 · 1 comment

LiuZeJie97 commented Feb 1, 2023

Thanks for the code, it has been very helpful to me! However, I think it may have the following problems:

  1. At each time step, the token fed to the decoder is the same as the token it is expected to output, which is why the model's accuracy is so high. The correct approach is to feed the token from time t-1 (not the token from time t) and have the decoder output the token at time t.
  2. Teacher forcing should only be used during training, but the code still uses it when evaluating the model.
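
To make the two points concrete, here is a minimal, framework-free sketch. It is not taken from the repository: `shift_right`, `teacher_forced_predictions`, `greedy_decode`, `echo_model`, and the `BOS`/`EOS` ids are all hypothetical names chosen for illustration. The toy `echo_model` simply repeats the last token it was given, so it scores perfectly when the decoder input equals the target and fails once the input is shifted by one step.

```python
from typing import Callable, List, Sequence

BOS, EOS = 1, 2  # hypothetical special-token ids, chosen for this sketch


def shift_right(target: Sequence[int], bos: int = BOS) -> List[int]:
    """Build the teacher-forcing input [BOS, y_0, ..., y_{T-2}] from the target."""
    return [bos] + list(target[:-1])


def teacher_forced_predictions(
    next_token: Callable[[List[int]], int], decoder_input: Sequence[int]
) -> List[int]:
    """Predict step t from the ground-truth prefix decoder_input[: t + 1]."""
    return [next_token(list(decoder_input[: t + 1])) for t in range(len(decoder_input))]


def greedy_decode(
    next_token: Callable[[List[int]], int],
    bos: int = BOS,
    eos: int = EOS,
    max_len: int = 10,
) -> List[int]:
    """Evaluation-time decoding: each step is fed the model's own previous output."""
    prefix, output = [bos], []
    for _ in range(max_len):
        token = next_token(prefix)
        if token == eos:
            break
        output.append(token)
        prefix.append(token)
    return output


def echo_model(prefix: List[int]) -> int:
    """Toy 'model' that has learned nothing: it repeats its most recent input token."""
    return prefix[-1]


target = [5, 7, 9, EOS]

# Problem 1: input at step t equals the target at step t, so copying looks perfect.
leaky = teacher_forced_predictions(echo_model, target)
# With the input shifted to t-1, the same model no longer matches the target.
shifted = teacher_forced_predictions(echo_model, shift_right(target))
# Problem 2: at evaluation time the model only ever sees its own predictions.
free_running = greedy_decode(echo_model, max_len=4)

print(leaky == target)    # True
print(shifted == target)  # False
print(free_running)       # [1, 1, 1, 1]
```

In a real training loop the shifted sequence would be the decoder input and the unshifted sequence the loss target, while evaluation would use a loop like `greedy_decode` (or beam search) instead of feeding ground-truth tokens.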

alexandru-dinu (Member) commented

Thank you for spotting this! If you wish and have the time, feel free to open a pull request.
