How to deal with logits from position indices in the output layer? #22

xiaoda99 opened this issue Aug 24, 2018 · 1 comment


xiaoda99 commented Aug 24, 2018

Dear guys,

I found that the position embeddings are appended to the word embeddings along the vocabulary axis of the shared embedding matrix:

```python
init_params[0] = np.concatenate([init_params[1], (np.random.randn(n_special, n_embd)*0.02).astype(np.float32), init_params[0]], 0)
```

The output layer also shares its weights with this embedding matrix, so it produces logits for the position indices as well as for the word indices:

```python
lm_logits = tf.matmul(lm_h, we, transpose_b=True)
```
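To make the index layout concrete, here is a small numpy-only sketch (toy sizes, and assuming `init_params[1]` holds the word embeddings and `init_params[0]` the position embeddings, so the stacking order is words, then specials, then positions):

```python
import numpy as np

# Toy sizes for illustration only; the released model is much larger.
n_vocab, n_special, n_ctx, n_embd = 100, 3, 16, 8

word_emb    = np.random.randn(n_vocab, n_embd).astype(np.float32)      # rows 0 .. n_vocab-1
special_emb = (np.random.randn(n_special, n_embd) * 0.02).astype(np.float32)
pos_emb     = np.random.randn(n_ctx, n_embd).astype(np.float32)        # last n_ctx rows

# Same stacking order as the concatenation above: words, specials, positions.
we = np.concatenate([word_emb, special_emb, pos_emb], 0)

lm_h = np.random.randn(1, n_embd).astype(np.float32)                   # one hidden state
lm_logits = lm_h @ we.T
print(lm_logits.shape)   # (1, 119) = (1, n_vocab + n_special + n_ctx)
# The last n_ctx columns are scores for position indices, not for words.
```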

My questions are:

  1. During LM pretraining, did you mask out the logits at those position indices when computing the loss?
  2. If I use the pretrained model as an LM to generate text, do I need to mask out the logits at these position indices before the softmax when sampling the next word?

BTW, I used the PyTorch port by Hugging Face:
https://github.com/huggingface/pytorch-openai-transformer-lm
FYI, I also posted an issue there describing some details of my experiments:
huggingface/pytorch-openai-transformer-lm#36

Da Xiao

madisonmay commented Aug 30, 2018

@xiaoda99 In https://github.com/IndicoDataSolutions/finetune/blob/development/finetune/base.py#L544, masking the positional-embedding logits helped us produce more reasonable generated text.
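For reference, here is a minimal sketch of that kind of masking at sampling time. It is plain numpy rather than the repo's TF graph, it assumes the [words | specials | positions] layout described above, and the function name and signature are made up for illustration:

```python
import numpy as np

def sample_next_token(lm_logits, n_vocab, n_special, temperature=1.0, rng=None):
    """Sample a word index from a logit vector of length n_vocab + n_special + n_ctx,
    masking out the trailing position-index logits first."""
    rng = rng or np.random.default_rng()
    logits = lm_logits.astype(np.float64)         # work on a copy of the raw logits
    logits[n_vocab + n_special:] = -np.inf        # positions can never be sampled as "next words"
    logits /= temperature
    logits -= logits.max()                        # numerical stability; exp(-inf) gives exactly 0
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```

Setting the unwanted logits to -inf means those indices get exactly zero probability after the softmax, so sampling is restricted to word and special-token indices; the special tokens could be masked out the same way if needed.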
