In model.py, the implementation of layer norm is:
self.ln_1 = nn.LayerNorm(config.n_embd)
If batch_size = 64, block_size = 6, and embedding_size = 48,
then the shape of the input is [64, 6, 48] and the layer norm parameter has shape [48].
But in my opinion the layer norm shape should be [6, 48]. Am I wrong?
In a model like GPT, normalization should be applied to the vector belonging to each individual token within the block/sequence, because we want to normalize and center each token's vector within the scope of that token alone.
What we don't want is to normalize across the token and embedding dimensions together, as that would center the entire sequence at once. Each token would lose its individual characteristics and importance, and all the tokens would end up being treated as the same or similar words.
That can blur the attention mechanism and lead to gibberish, hallucinated output.
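For concreteness, here is a minimal sketch contrasting the two choices, using the shapes from the question (the tensor and variable names are just for illustration):

```python
import torch
import torch.nn as nn

# Shapes from the question: batch_size=64, block_size=6, n_embd=48.
B, T, C = 64, 6, 48
x = torch.randn(B, T, C)

# What model.py does: normalized_shape = [48], so each token's
# 48-dim embedding is normalized independently of the other tokens.
ln_token = nn.LayerNorm(C)
y = ln_token(x)
print(y.mean(dim=-1).abs().max())  # ~0: every token vector is centered on its own

# What the question proposes: normalized_shape = [6, 48], so the mean/std
# are computed over the whole (tokens, embedding) block. The statistics are
# shared across the sequence, and individual tokens are no longer centered
# within their own scope.
ln_seq = nn.LayerNorm([T, C])
z = ln_seq(x)
```

Note also that `nn.LayerNorm([T, C])` would tie the learnable scale/shift to absolute token positions and to a fixed block size, whereas `nn.LayerNorm(C)` works for any sequence length.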