## What does this PR do?

This PR introduces support for embedding scaling in the `MistralModel`. The feature is inspired by the Scaled Embed method, which demonstrates that applying a scaling factor to the embeddings significantly improves the stability of large language model (LLM) training, effectively mitigating gradient spikes.

**Key Changes:**
- Adds a new configuration parameter, `embedding_scale` (`float`, optional, defaults to `1.0`): a scaling factor applied to the model's embeddings.
- Updates the `MistralModel` implementation to apply the scaling factor to the embeddings during training and inference (a minimal sketch of the idea is shown below).

This implementation currently supports the PyTorch backend. Support for TensorFlow and Flax backends can be added in the future.
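To make the intent concrete, here is a minimal, self-contained sketch of the approach. It is not the actual diff: `ScaledEmbedding` is an illustrative stand-alone module (the PR instead applies the factor to the output of `MistralModel`'s existing token embedding), and only `embedding_scale` corresponds to the parameter proposed here.

```python
import torch
import torch.nn as nn


class ScaledEmbedding(nn.Module):
    """Token embedding with a constant scaling factor (illustrative only).

    With embedding_scale=1.0 (the default) this reduces to a plain
    embedding lookup, which is what keeps the change backward compatible.
    """

    def __init__(self, vocab_size: int, hidden_size: int, embedding_scale: float = 1.0):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.embedding_scale = embedding_scale

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        # Scale the embedding output before it enters the decoder stack;
        # the Scaled Embed method uses this to damp gradient spikes early
        # in pre-training.
        return self.embed_tokens(input_ids) * self.embedding_scale


# Usage sketch (sizes are placeholders, not taken from the PR):
embed = ScaledEmbedding(vocab_size=32000, hidden_size=4096, embedding_scale=2.0)
hidden_states = embed(torch.tensor([[1, 2, 3]]))  # shape: (1, 3, 4096)
```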
## Motivation

The Scaled Embed method improves training stability and helps mitigate gradient spikes, as shown in the referenced paper. By implementing this feature, we aim to bring these benefits to the `MistralModel` while maintaining backward compatibility.

## Open Questions for Discussion
- **Relevance:** Do you see this feature as relevant for integration into the library? Would it make sense to extend this functionality to other models, such as `LlamaModel`?
- **Implementation Scope:** Should embedding scaling be integrated more broadly across models in the library, or should it remain model-specific?
Let me know if further adjustments or refinements are needed!