Fix wrong initialization of lr scheduler #256
The initialization of the learning rate scheduler does not correctly resume. This is because the optimizer state is loaded first, and the lambda in the LR scheduler is then partialed to the LR that loaded state implies, rather than to the configured base LR. In this PR I fix(?) it by first initializing both the optimizer and the LR scheduler, and then loading the state dict for each.
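For reference, here is a minimal sketch of that order in plain PyTorch terms; the model, schedule, LR value, and checkpoint keys below are placeholders rather than the repo's actual code:

```python
import torch

def lr_lambda(step: int) -> float:
    # placeholder schedule: 10-step linear warmup, then constant
    return min(1.0, (step + 1) / 10)

def build(model):
    # both objects are built from the configured base LR
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, lr_scheduler

model = torch.nn.Linear(8, 8)

# pretend training: take a few steps mid-warmup and save a checkpoint
optimizer, lr_scheduler = build(model)
for _ in range(3):
    model(torch.randn(2, 8)).sum().backward()
    optimizer.step()
    optimizer.zero_grad()
    lr_scheduler.step()
checkpoint = {
    "optimizer": optimizer.state_dict(),
    "lr_scheduler": lr_scheduler.state_dict(),
}

# resume with the order this PR moves to: build both objects first,
# then load both state dicts. The old order (load the optimizer state,
# then build the scheduler around the LR that state implies) is what
# anchors the schedule to the wrong LR.
optimizer, lr_scheduler = build(model)
optimizer.load_state_dict(checkpoint["optimizer"])
lr_scheduler.load_state_dict(checkpoint["lr_scheduler"])

print(lr_scheduler.get_last_lr())  # should match the LR in use when the checkpoint was saved
```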
So, for example, the LR you see when you save the state dict at a step is not the LR you get back when you reload it to resume. Obviously, using a totally wrong LR schedule makes resumption useless.
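A quick way to see the mismatch is to log the LR on both sides of the round trip; a sketch, assuming an `optimizer`/`lr_scheduler` pair like the one above (`log_lr` is a hypothetical helper, not something in the repo):

```python
def log_lr(tag: str, optimizer, lr_scheduler) -> None:
    # compare the LR the optimizer will actually apply with the scheduler's view of it
    print(
        f"{tag}: param_group lr={optimizer.param_groups[0]['lr']:.3e}, "
        f"scheduler last lr={lr_scheduler.get_last_lr()[0]:.3e}"
    )

log_lr("at save", optimizer, lr_scheduler)       # just before writing the checkpoint
# ... restart, rebuild, load the checkpoint ...
log_lr("after resume", optimizer, lr_scheduler)  # with the old order these two reports disagree
```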
It seems to work, except for the 15-step example in `examples/config_tiny_llama.yaml`, where it's close to working but not exact. I'm assuming this is due to some sort of warm-up or something; if it's not clear to you I can look into it further. Plus a small fix for the code simply not running in `serialize/main.py`.