Loss goes to -inf #1
Comments
Thank you for your attention. I haven't encountered this issue during my runs, so it would be helpful if you could share more details. Could you please provide the training logs and any specific settings or modifications you made? This will help me diagnose the problem more accurately.
Can you share a screenshot of your loss trend? For me the training loss curve is not stable. @JingyangOu I am using 128 tokens, a 10k vocabulary size, and a model with 130M parameters.
How large is your batch size? 512x16? For me, the batch size is only 32.
The batch size in the config refers to the equivalent batch size after combining all GPUs and applying gradient accumulation. Therefore, the batch size I used is 512. I also tried training with a batch size of 32, and in this case the resulting curve was similar to the one with a batch size of 512, with no signs of training instability.
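For readers unfamiliar with this convention, the "equivalent" batch size is simply the per-GPU batch size times the number of GPUs times the gradient-accumulation steps. A minimal sketch follows; the variable names and the 8-GPU / 2-step split are illustrative assumptions, not values taken from this repository's config:

```python
# Hypothetical breakdown of the "equivalent" batch size described above.
# The config value is the product of per-GPU batch size, number of GPUs,
# and gradient-accumulation steps; the particular split below is assumed.
per_gpu_batch_size = 32   # micro-batch processed on each GPU per step (assumed)
num_gpus = 8              # data-parallel workers (assumed)
grad_accum_steps = 2      # micro-batches accumulated before each optimizer step (assumed)

effective_batch_size = per_gpu_batch_size * num_gpus * grad_accum_steps
print(effective_batch_size)  # 512, the value quoted in the comment above
```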
Hello,

I am trying to run your code to reproduce your results, but when using either the lambda_DCE or t_DCE loss, the loss quickly goes to -infinity. As such, I am wondering if there is a sign missing somewhere. Could you have a look at the code? Simply negating the loss from get_loss_fn does not solve the issue.

Thanks in advance.
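One way to catch this kind of divergence early, independent of which loss is selected, is a small sanity check on the logged loss value. This is only a hypothetical debugging sketch, assuming a standard PyTorch-style training loop; check_loss and its threshold are made up for illustration and are not part of the repository's code:

```python
import math

def check_loss(step, loss_value, lower_bound=-1e4):
    """Flag a non-finite loss or one that has dropped below a sanity threshold.

    The bound is arbitrary and only meant to catch the run-away behaviour
    described in this issue early, before the loss reaches -inf.
    """
    if not math.isfinite(loss_value):
        raise RuntimeError(f"step {step}: loss is non-finite ({loss_value})")
    if loss_value < lower_bound:
        raise RuntimeError(
            f"step {step}: loss {loss_value:.3e} fell below {lower_bound:.1e}; "
            "possible sign error or divergence"
        )

# Sketch of where this would sit in a training loop (pseudocode, not the repo's API):
# loss = loss_fn(batch)          # e.g. the lambda_DCE or t_DCE loss from get_loss_fn
# check_loss(step, loss.item())
```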