xiaoda99 changed the title from "How to deal with logits from position embeddings in the output layer?" to "How to deal with logits from position indices in the output layer?" on Aug 24, 2018
Dear guys,
I found that the position embeddings are concatenated with the word embeddings in the embedding layer (finetune-transformer-lm/train.py, line 411 at commit bd1cf7d), and the output layer also shares weights with this embedding layer, so it outputs logits for both word indices and position indices (finetune-transformer-lm/train.py, line 176 at commit bd1cf7d).
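To make the setup concrete, here is how I picture the shared embedding / output weights in the PyTorch port (just a sketch with made-up sizes, not the exact values or names from the repo):

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not the exact values from train.py).
n_vocab, n_special, n_ctx, n_embd = 40478, 3, 512, 768

# One embedding table whose rows cover word ids, special-token ids, and position ids,
# i.e. the position embeddings are appended after the word embeddings.
embed = nn.Embedding(n_vocab + n_special + n_ctx, n_embd)

# Each input token carries a word id and a position id; their embeddings are summed.
word_ids = torch.randint(0, n_vocab, (1, 5))
pos_ids = torch.arange(n_vocab + n_special, n_vocab + n_special + 5).unsqueeze(0)
h = embed(word_ids) + embed(pos_ids)        # (1, 5, n_embd); a real model would run
                                            # transformer blocks on h before the output layer

# Tied output projection: hidden states are multiplied by the same embedding matrix,
# so the logits have one column per row of the table, including the n_ctx position slots.
logits = h @ embed.weight.t()               # (1, 5, n_vocab + n_special + n_ctx)
```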
My questions are:

1. During LM pretraining, did you mask out the logits from those position indices when computing the loss?
2. If I use the pretrained model as an LM to generate text, do I need to mask out these position indices' logits before the softmax when sampling the next word?
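To illustrate what I mean by masking in question 2, I have something like this in mind (again just a sketch; the sizes, and the choice to also mask the special-token slots, are my own assumptions):

```python
import torch
import torch.nn.functional as F

n_vocab, n_special, n_ctx = 40478, 3, 512   # illustrative sizes

def sample_next_word(logits):
    """Mask every logit that is not a real word (special-token and position slots)
    by setting it to -inf, then softmax and sample the next token id."""
    masked = logits.clone()
    masked[..., n_vocab:] = float("-inf")
    probs = F.softmax(masked, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Last-step logits for a batch of one sequence.
logits = torch.randn(1, n_vocab + n_special + n_ctx)
next_id = sample_next_word(logits)          # always a word index < n_vocab
```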
BTW, I used the PyTorch code ported by huggingface:
https://github.com/huggingface/pytorch-openai-transformer-lm
FYI, I also posted an issue there describing some details of my experiments:
huggingface/pytorch-openai-transformer-lm#36