
Update moe_align kernel #125

Open · wants to merge 1 commit into base: main
Conversation

@charlifu (Contributor) commented Feb 19, 2025

Update the moe_align kernel to the latest vLLM version.

This PR is for #107

@charlifu charlifu requested a review from valarLip February 19, 2025 03:36
@yiakwy-xpu-ml-framework-team commented Feb 20, 2025

@charlifu Could you kindly take a look at this pending PR? It is an AMD-friendly kernel and very fast.

As for the copyright issue, note that the original kernel in vLLM was contributed by the SGLang authors under the MIT license, so keep that in mind for compliance requirements.

As far as I understand (correct me if I am wrong), the SGLang team will use more and more utilities from FlashInfer, for example vec_t for vectorization across different dtypes (uint4 128-bit loads).

The problem is that vec_t does not handle unaligned cases (by design, handling them is left to the user):

```cpp
offset = load<srcDtype, 4>(src, dst, /*offset=*/0);
// handle the remainder with narrower loads
offset = load<srcDtype, 2>(src, dst, offset);
offset = load<srcDtype, 1>(src, dst, offset);
```

Luckily, there is no need to dispatch fp8/fp16/bf16 for this kind of kernel, but we will still need that capability later.

Considering the status of flashinfer-rocm, I would prefer to use CK's vectorization capability for this later.
