
Implement Variants of RoPE #176

Open · wants to merge 34 commits into base: main
Conversation

@ruanjm (Contributor) commented Mar 5, 2025

Includes:

  • Rotate style: supports both NEOX and GPT-J styles; the backward pass is implemented for both.
  • The hidden-dim size of freqs/sin/cos may be half that of the input/output tensor.
  • NoPE first: rotates only the latter half of the tensor.
  • Two channels: handles two inputs at once.
  • In-place mode: input and output share the same tensor, with optimizations specific to this mode.
  • Computes rotation angles or cos/sin from positions and offsets.

Compared with the legacy implementation, average fp16/bf16 latency is reduced to about 73.5% of baseline (ranging from 37.1% in the best case to 142.5% in the worst). More functionality is supported as well.
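To clarify the two rotate styles the PR supports, here is a minimal pure-Python reference sketch (function names are illustrative, not this kernel's API): GPT-J style rotates adjacent interleaved pairs `(x[2i], x[2i+1])`, while NEOX style pairs each element with its counterpart in the other half of the hidden dim, `(x[i], x[i + d/2])`. In both styles the cos/sin tables cover only half the hidden dim, which is the "freqs can be half of input" point above.

```python
import math

def rope_gptj(x, cos, sin):
    """GPT-J (interleaved) style: rotate adjacent pairs (x[2i], x[2i+1])."""
    out = list(x)
    for i in range(len(x) // 2):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos[i] - b * sin[i]
        out[2 * i + 1] = a * sin[i] + b * cos[i]
    return out

def rope_neox(x, cos, sin):
    """NEOX (half-rotated) style: rotate pairs (x[i], x[i + d/2])."""
    d2 = len(x) // 2
    out = list(x)
    for i in range(d2):
        a, b = x[i], x[i + d2]
        out[i] = a * cos[i] - b * sin[i]
        out[i + d2] = a * sin[i] + b * cos[i]
    return out

def angles(pos, dim, base=10000.0):
    """theta_i = pos / base**(2i/dim); cos/sin tables span half the hidden dim."""
    freqs = [pos / base ** (2 * i / dim) for i in range(dim // 2)]
    return [math.cos(t) for t in freqs], [math.sin(t) for t in freqs]
```

Both styles apply the same 2-D rotations, only the pairing of lanes differs, so at position 0 each reduces to the identity; the real kernel fuses these loops (and the in-place and two-channel variants) on the GPU rather than iterating element-wise.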

ruanjm added 30 commits March 5, 2025 08:27
fix script bug
@ruanjm ruanjm force-pushed the amd/dev/jruan/rope_support_vllm branch from 5a560f6 to d04b656 Compare March 5, 2025 08:27
@ruanjm ruanjm requested a review from valarLip March 5, 2025 08:49