increase dropout value for gpt 126m
Signed-off-by: dimapihtar <[email protected]>
dimapihtar committed Aug 30, 2024
1 parent 48f0049 commit 884065c
Showing 1 changed file with 2 additions and 2 deletions: launcher_scripts/conf/training/gpt3/126m.yaml
@@ -68,8 +68,8 @@ model:
   ffn_hidden_size: ${multiply:4, ${.hidden_size}} # Transformer FFN hidden size. 4 * hidden_size.
   num_attention_heads: 12
   init_method_std: 0.023 # Standard deviation of the zero mean normal distribution used for weight initialization.')
-  hidden_dropout: 0.1 # Dropout probability for hidden state transformer.
-  attention_dropout: 0.1 # Dropout probability for attention
+  hidden_dropout: 0.2 # Dropout probability for hidden state transformer.
+  attention_dropout: 0.2 # Dropout probability for attention
   kv_channels: null # Projection weights dimension in multi-head attention. Set to hidden_size // num_attention_heads if null
   apply_query_key_layer_scaling: True # scale Q * K^T by 1 / layer-number.
   layernorm_epsilon: 1e-5
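For context, the two values changed here act at different points in each transformer layer: attention_dropout is applied to the attention probabilities (the softmax of Q * K^T), while hidden_dropout is applied to the hidden states coming out of the attention and MLP sublayers before the residual add. The block below is a minimal PyTorch sketch of those two sites, not NeMo's actual implementation; the class name ToyTransformerBlock and the hidden_size=768 default are illustrative assumptions (this hunk only shows num_attention_heads: 12 and ffn_hidden_size = 4 * hidden_size).

```python
import torch
import torch.nn as nn


class ToyTransformerBlock(nn.Module):
    """Minimal sketch of where hidden_dropout and attention_dropout apply."""

    def __init__(self, hidden_size=768, num_heads=12,
                 hidden_dropout=0.2, attention_dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(hidden_size, eps=1e-5)
        self.ln2 = nn.LayerNorm(hidden_size, eps=1e-5)
        # attention_dropout: dropout on the attention weights inside the
        # attention module (here, PyTorch's built-in MultiheadAttention).
        self.attn = nn.MultiheadAttention(hidden_size, num_heads,
                                          dropout=attention_dropout,
                                          batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),  # ffn = 4 * hidden_size
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        # hidden_dropout: dropout on each sublayer's output hidden states
        # before they are added back to the residual stream.
        self.hidden_drop = nn.Dropout(hidden_dropout)

    def forward(self, x):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + self.hidden_drop(attn_out)
        x = x + self.hidden_drop(self.mlp(self.ln2(x)))
        return x


# Usage sketch: batch of 2 sequences, length 16, hidden size 768.
out = ToyTransformerBlock()(torch.randn(2, 16, 768))
```

Raising both probabilities from 0.1 to 0.2 increases regularization for this small 126M configuration; the dropout layers are only active in training mode and are no-ops under model.eval().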

