Use master weights for bfloat16 FusedAdam when master_weights=True #1731

cbcase · 2023-09-22T17:00:48Z

As mentioned in #1728, the FusedAdam optimizer ignores master_weights=True for bfloat16 parameters. This PR fixes that oversight. I have confirmed that the behavior now matches a "by hand" implementation of master weights: copying the parameters to fp32 by hand and running vanilla torch.optim.AdamW on the fp32 copy.
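For readers unfamiliar with why master weights matter for bfloat16, here is a minimal standard-library sketch. It is not apex's kernel and does not use torch. It emulates bfloat16 and fp32 rounding with struct and applies a constant update of 1e-3 per step, which is a hypothetical stand-in for an Adam step. The helper names to_bf16, to_fp32, and simulate are made up for this illustration.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    Illustration only: NaN and infinity are not handled specially.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def simulate(steps: int = 100, update: float = 1e-3):
    """Compare a bf16-only parameter with a bf16 parameter backed by an fp32 master.

    The constant `update` is an assumption standing in for a real optimizer step.
    """
    p_bf16_only = to_bf16(1.0)       # parameter updated directly in bfloat16
    master = to_fp32(1.0)            # fp32 master copy of the same parameter
    p_from_master = to_bf16(master)  # bfloat16 view refreshed from the master

    for _ in range(steps):
        # Without master weights, the update is rounded away at every step.
        p_bf16_only = to_bf16(p_bf16_only - update)
        # With master weights, the update accumulates in fp32,
        # and the bfloat16 parameter is re-derived from that copy.
        master = to_fp32(master - update)
        p_from_master = to_bf16(master)

    return p_bf16_only, master, p_from_master


if __name__ == "__main__":
    bf16_only, master, from_master = simulate()
    print("bf16 only:       ", bf16_only)
    print("fp32 master:     ", master)
    print("bf16 from master:", from_master)
```

In this sketch the bf16-only parameter never leaves 1.0, because 1e-3 is smaller than half the spacing between neighbouring bfloat16 values just below 1.0, while the copy derived from the fp32 master moves to about 0.9. This is the kind of divergence that ignoring master_weights=True for bfloat16 parameters can produce.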

cbcase · 2023-10-16T23:51:44Z

Ping @minitu, looks like you added this support originally -- could you take a look? Thanks

minitu · 2023-10-17T17:28:04Z

LGTM. We only looked at adding master weights for FP16 AMP at the time of the original PR.
@crcrpar Could you review this as well?

crcrpar

Looks good, but could you add a test case for a bfloat16 model with fp32 master weights to apex/tests/L0/run_optimizers/test_adam.py? For reference, line 151 in 19cc873:

def testGradScalerCapturableMaster(self):

Use master weights for bfloat16 FusedAdam when master_weights=True

db567bb

crcrpar reviewed Oct 18, 2023
