
About layer norm dimension parameter #113

Open
vcvycy opened this issue Jun 7, 2023 · 1 comment
Comments


vcvycy commented Jun 7, 2023

In model.py, the layer norm is implemented as:
self.ln_1 = nn.LayerNorm(config.n_embd)
If batch_size = 64, block_size = 6, and embedding_size = 48, then the input shape is [64, 6, 48] and the layer norm parameter shape is [48].

But in my opinion, the layer norm dimension should be [6, 48]. Am I wrong?
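
For concreteness, a minimal sketch of the shapes involved (assuming PyTorch and the example values above; the variable names here are illustrative, not taken from model.py):

```python
import torch
import torch.nn as nn

# Example values from the question (assumptions, not model.py defaults)
batch_size, block_size, n_embd = 64, 6, 48

x = torch.randn(batch_size, block_size, n_embd)  # input of shape [64, 6, 48]

ln = nn.LayerNorm(n_embd)      # normalized_shape = 48, as in model.py
print(ln.weight.shape)         # torch.Size([48])  -> the [48] parameter
print(ln(x).shape)             # torch.Size([64, 6, 48])
```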


rjarun8 commented Jun 28, 2023

In a model like GPT, normalization should be applied to the vector corresponding to each individual token within the block/sequence, because we want to normalize and center each token's vector within the scope of that token.

What we don't want is to normalize across (positions, embedding) at once, as that would center the entire sequence. Each token would then lose its individual characteristics and relative importance, and all the words would be pushed toward looking the same or similar.

That could blur the attention mechanism and produce gibberish, hallucinated output.
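
For illustration, a minimal sketch of the difference between the two choices (assuming PyTorch; elementwise_affine is turned off only so the raw normalization statistics can be compared):

```python
import torch
import torch.nn as nn

batch_size, block_size, n_embd = 64, 6, 48
x = torch.randn(batch_size, block_size, n_embd)

# Per-token norm (what model.py does): mean/var over the last dim only,
# so each token's 48-dim vector is normalized independently.
per_token = nn.LayerNorm(n_embd, elementwise_affine=False)(x)
ref = (x - x.mean(-1, keepdim=True)) / torch.sqrt(
    x.var(-1, keepdim=True, unbiased=False) + 1e-5
)
print(torch.allclose(per_token, ref, atol=1e-5))  # True

# Normalizing over [block_size, n_embd] instead: statistics are shared
# across all positions, so tokens are no longer normalized independently.
per_seq = nn.LayerNorm([block_size, n_embd], elementwise_affine=False)(x)
print(torch.allclose(per_seq, ref, atol=1e-5))    # False in general
```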
