You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
right now our implementation of RoPE assumes the rotation matrix is created and used in the HuggingFace model code way, i.e. instead of the rotation matrix described in original RoPE paper https://arxiv.org/pdf/2104.09864, we assume it looks something like this instead:
🚀 The feature, motivation and pitch
right now our implementation of RoPE assumes the rotation matrix is created and used in the HuggingFace model code way, i.e. instead of the rotation matrix described in original RoPE paper https://arxiv.org/pdf/2104.09864, we assume it looks something like this instead:
We should also support use cases where people create their RoPE cos & sin buffers following the original formula.
Alternatives
We may need to consider the complex form too (i.e. what official meta llama code is doing https://github.com/meta-llama/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/model.py#L64-L74)
Additional context
No response
The text was updated successfully, but these errors were encountered: