
Update moe_align kernel #125

Open · wants to merge 1 commit into base: main
Conversation

@charlifu (Contributor) commented Feb 19, 2025

Update the moe_align kernel to the latest vLLM version.

This PR is for #107

@charlifu charlifu requested a review from valarLip February 19, 2025 03:36
@yiakwy-xpu-ml-framework-team commented Feb 20, 2025

@charlifu Could you kindly take a look at this pending PR? It is an AMD-friendly kernel and very fast.

As for the copyright issue, note that the original kernel in vLLM was contributed by the SGLang authors under the MIT license, so keep that in mind for compliance requirements.

As far as I understand (correct me if I am wrong), the SGLang team will use more and more utilities from FlashInfer, for example vec_t for vectorization across different dtypes (uint4 128-bit loads).

The problem is that vec_t does not handle unaligned cases (by design, handling them is left to the user):

```cpp
offset = load<srcDtype, 4>(src, dst, /*offset=*/0);
// handle the remainder with narrower loads
offset = load<srcDtype, 2>(src, dst, offset);
offset = load<srcDtype, 1>(src, dst, offset);
```

Luckily, there is no need to dispatch fp8/fp16/bf16 for this kind of kernel, but we will still need that capability later.

Considering the status of flashinfer-rocm, I would prefer to use CK's vectorization capability for this later.
