Using AIMV2 as the encoder: unfreezing it with a learning rate of 1e-6 makes the LLaVA model reach a loss of 0 and a grad_norm of NaN #30

shengyuwoo · 2025-01-07T05:57:20Z

When I use AIMV2 as the encoder, unfreeze it, and set its learning rate to 1e-6, the LLaVA model's loss drops to 0 after 5000 steps. The original paper kept the encoder frozen. Why is unfreezing it not recommended for training? If I do decide to unfreeze it, what should I do?
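A loss of 0 together with a NaN grad_norm usually means some update produced non-finite values. One generic way to catch this when unfreezing an encoder is to compute the global (total) gradient norm at every step, skip the update if that norm is NaN or inf, and clip it otherwise. The sketch below is an assumption, not something from the AIMV2 or LLaVA code. The helper names `global_grad_norm` and `clip_or_skip` are hypothetical, and plain Python lists stand in for framework tensors. In PyTorch, `torch.nn.utils.clip_grad_norm_` returns the same total norm. The sketch does not explain why the original authors kept the encoder frozen.

```python
import math
from typing import List, Optional


def global_grad_norm(grads: List[List[float]]) -> float:
    """L2 norm over every gradient value of every parameter (the 'total norm')."""
    # A single NaN makes the sum NaN, and an inf makes it inf, so the result
    # is non-finite whenever any gradient value is non-finite.
    return math.sqrt(sum(g * g for param in grads for g in param))


def clip_or_skip(
    grads: List[List[float]], max_norm: float = 1.0
) -> Optional[List[List[float]]]:
    """Return clipped gradients, or None when the step should be skipped.

    Hypothetical policy: a non-finite norm drops the update entirely instead
    of writing NaN into the (now trainable) encoder weights.
    """
    norm = global_grad_norm(grads)
    if not math.isfinite(norm):
        return None
    # Scale down only when the norm exceeds max_norm; the epsilon avoids
    # division by zero for all-zero gradients.
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [[g * scale for g in param] for param in grads]


if __name__ == "__main__":
    print(global_grad_norm([[3.0, 4.0]]))           # 5.0
    print(clip_or_skip([[float("nan"), 1.0]]))      # None: skip this step
    print(clip_or_skip([[3.0, 4.0]], max_norm=1.0)) # rescaled to norm ~1.0
```

In a real training loop, a logged counter of skipped steps would show whether the NaNs start right after the encoder is unfrozen or only appear later in training.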
