How to resume training with correct learning rate #326
-
I'm training a FastPitch model from scratch on custom datasets. The initial learning rate (lr) set in the config before training was 0.0001.
But the learning rate that got printed at the start of training was quite different: 2.5000000000000002e-08.

`> EPOCH: 0/1500 > TRAINING (2025-03-03 16:52:55) --> TIME: 2025-03-03 16:53:00 -- STEP: 0/686 -- GLOBAL_STEP: 0`

I assume this is happening because of the warmup steps, but I'm not sure. Later, the training stopped after 100 epochs due to an out-of-memory exception. I would like to resume training with the updated learning rate that was in use at epoch 100, i.e. 6.1e-6. However, when I tried executing this command, the learning rate printed in the latest epoch was again 2.5000000000000002e-08. Can I get any help on how to resume training with the updated learning rate?
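For what it's worth, the printed value matches what a Noam-style warmup schedule would produce at step 1. A minimal sketch of the formula, assuming the default `warmup_steps=4000` in `lr_scheduler_params` (check your config; the exact default is an assumption here):

```python
# Noam-style warmup as used by Coqui TTS's NoamLR scheduler (sketch;
# warmup_steps=4000 is an assumption -- check lr_scheduler_params).
base_lr = 1e-4       # the lr set in the config
warmup_steps = 4000

def noam_lr(step: int) -> float:
    step = max(step, 1)  # guard against step 0
    scale = warmup_steps**0.5 * min(step * warmup_steps**-1.5, step**-0.5)
    return base_lr * scale

print(noam_lr(1))     # ~2.5e-08, matching the value in the training log
print(noam_lr(4000))  # reaches base_lr once warmup completes
```

At step 1 the factor reduces to `1 / warmup_steps`, so `1e-4 / 4000 = 2.5e-8`, which is exactly the logged value.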
-
To continue a training run, you should use `--continue_path path/` instead of `--restore_path path/model.pth`; that should correctly restore the parameters.
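For reference, a minimal sketch of the same distinction through the Trainer API (the `trainer` package that 🐸TTS training scripts build on; the paths below are placeholders, and the described scheduler behavior is an assumption consistent with the log above):

```python
from trainer import TrainerArgs

# continue_path resumes the run in place: model weights, optimizer and
# scheduler state, and the global step are all restored, so the learning
# rate picks up where it left off (e.g. ~6.1e-6 at epoch 100).
args = TrainerArgs(continue_path="path/to/previous_run/")

# restore_path, by contrast, starts a *new* run from the given weights:
# the step counter resets to 0, so the warmup begins again -- which is
# consistent with 2.5e-08 being printed after resuming.
# args = TrainerArgs(restore_path="path/to/previous_run/model.pth")
```

The same fields are exposed as the `--continue_path` / `--restore_path` CLI flags mentioned above.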