When using the pre-trained MT-DNN ckpt, the training loss does not converge #212
Comments
Is the learning rate too big?
It seems like a random seed problem. With the default random seed, the training loss increases (even with a smaller learning rate). Choosing another random seed fixes it.
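As a minimal sketch of what "choosing another seed" amounts to in a standard PyTorch training setup like MT-DNN's (the helper name and the seed value here are illustrative, not the repo's actual code):

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs so the run is reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# If the loss diverges under the default seed, try a different one.
set_seed(2019)
```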
Another issue: Line 398 in 471f717.
This line seems to overwrite the training params when using a pre-trained MT-DNN model.
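For illustration, a minimal sketch of one way to avoid the clobbering, assuming the checkpoint stores its saved training config under a 'config' key (as MT-DNN checkpoints appear to); the function name and the default `preserve` set are hypothetical, not the repo's actual code:

```python
import torch

def load_checkpoint_config(args, ckpt_path, preserve=('learning_rate', 'seed')):
    """Copy the checkpoint's saved config onto `args`, but keep the
    run-specific params listed in `preserve` from being overwritten."""
    state = torch.load(ckpt_path, map_location='cpu')
    for key, value in state.get('config', {}).items():
        if key not in preserve:
            setattr(args, key, value)
    return args
```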
@heshenghuan, I reran the script with different random seeds and did not hit the bug you mentioned. Yes, the config in MT-DNN should be the same as the pretrained config. If I remember correctly, I removed the other unrelated args.
The init_checkpoint was trained using scripts/run_mt_dnn.sh, with only the random seed changed. The training loss: [loss curve attached in the original comment]