Fix wrong initialization of lr scheduler #256
The initialization of the learning rate scheduler does not correctly resume. This is because the optimizer state is loaded first, and the lambda in the LR scheduler is then partialed to the LR that loaded state implies, rather than to the configured base LR. In this PR I fix(?) it by first initializing both the optimizer and the LR scheduler, and then loading the state dict for each.
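For reference, here is a minimal sketch of that order in plain PyTorch terms; the model, schedule, LR value, and checkpoint keys below are placeholders rather than the repo's actual code:

```python
import torch

def lr_lambda(step: int) -> float:
    # placeholder schedule: 10-step linear warmup, then constant
    return min(1.0, (step + 1) / 10)

def build(model):
    # both objects are built from the configured base LR
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, lr_scheduler

model = torch.nn.Linear(8, 8)

# pretend training: take a few steps mid-warmup and save a checkpoint
optimizer, lr_scheduler = build(model)
for _ in range(3):
    model(torch.randn(2, 8)).sum().backward()
    optimizer.step()
    optimizer.zero_grad()
    lr_scheduler.step()
checkpoint = {
    "optimizer": optimizer.state_dict(),
    "lr_scheduler": lr_scheduler.state_dict(),
}

# resume with the order this PR moves to: build both objects first,
# then load both state dicts. The old order (load the optimizer state,
# then build the scheduler around the LR that state implies) is what
# anchors the schedule to the wrong LR.
optimizer, lr_scheduler = build(model)
optimizer.load_state_dict(checkpoint["optimizer"])
lr_scheduler.load_state_dict(checkpoint["lr_scheduler"])

print(lr_scheduler.get_last_lr())  # should match the LR in use when the checkpoint was saved
```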
So, for example, the LR you see when you save the state dict at a step is not the LR you get back when you reload it to resume. Obviously, using a totally wrong LR schedule makes resumption useless.
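A quick way to see the mismatch is to log the LR on both sides of the round trip; a sketch, assuming an `optimizer`/`lr_scheduler` pair like the one above (`log_lr` is a hypothetical helper, not something in the repo):

```python
def log_lr(tag: str, optimizer, lr_scheduler) -> None:
    # compare the LR the optimizer will actually apply with the scheduler's view of it
    print(
        f"{tag}: param_group lr={optimizer.param_groups[0]['lr']:.3e}, "
        f"scheduler last lr={lr_scheduler.get_last_lr()[0]:.3e}"
    )

log_lr("at save", optimizer, lr_scheduler)       # just before writing the checkpoint
# ... restart, rebuild, load the checkpoint ...
log_lr("after resume", optimizer, lr_scheduler)  # with the old order these two reports disagree
```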
It seems to work, except for the 15-step example in `examples/config_tiny_llama.yaml`, where it's close to working but not exact. I'm assuming this is due to some sort of warm-up or something; if it's not clear to you I can look into it further. Plus a small fix for the code simply not running in `serialize/main.py`.