[Misc]: Feature overview #3

Open
6 of 18 tasks
wangshuai09 opened this issue Oct 25, 2024 · 2 comments

wangshuai09 (Owner) commented Oct 25, 2024

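| Feature | Analysis | Adaptation status |
| --- | --- | --- |
| PageAttention | Attention for the decoding stage | |
| FlashAttention | Attention for the prefill stage | |
| TP | Tensor parallelism across multiple cards | |
| Multi-Host tensor-parallel | Multi-node TP; requires adapting vLLM's ray_executor | |
| FlashDecoding | Optimizes the attention operator for certain scenarios. The page attention interface in torch_npu already supports it but it has not been adapted yet; the flash attention interface does not support it | |
| MoE | Mixture-of-Experts models; two operators need to be adapted | |
| Compilation cache | Compilation cache; depends on torch.compiler | |
| Prefix caching | Requires partial adaptation of slot_indices | |
| Vision language model | Must be adapted model by model; partially supported | |
| Speculative decoding | Speculative decoding | |
| Chunked prefills | Requires partial debugging of slot_indices | |
| Lora | Requires implementing 6 Triton operators | |
| MultiLora | Requires implementing 6 Triton operators | |
| Int8/AWQ/GPTQ quantization | Requires implementing quantization-related operators | |
| BeamSearch | An optimized search method | |
| Custom_op fused operators | Requires implementing Ascend operators or calling the corresponding torch_npu operators | |
| 310P adaptation | All supported features above were tested and adapted on 910B. 310P adaptation requires debugging the torch_npu attention interface, and support for future features must also be adapted to 310P | |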
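Both prefix caching and chunked prefill hinge on slot_indices, i.e. which KV-cache slot each token's keys and values are written to. The sketch below assumes slot_indices plays the same role as vLLM's slot_mapping in a paged KV cache (slot = physical block number × block size + offset within the block). The helper `compute_slot_indices` and the example block table are hypothetical illustrations, not this project's code or a torch_npu API.

```python
from typing import List


def compute_slot_indices(block_table: List[int], block_size: int,
                         start_pos: int, num_tokens: int) -> List[int]:
    """Map token positions [start_pos, start_pos + num_tokens) to flat KV-cache slots.

    Hypothetical helper. Assumes logical block i of a sequence lives in physical
    block block_table[i], and each physical block holds block_size contiguous slots.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    slots = []
    for pos in range(start_pos, start_pos + num_tokens):
        logical_block = pos // block_size  # which logical block the token falls in
        offset = pos % block_size          # position of the token inside that block
        if logical_block >= len(block_table):
            raise IndexError(f"position {pos} has no allocated block")
        physical_block = block_table[logical_block]
        slots.append(physical_block * block_size + offset)
    return slots


# A 10-token sequence, block_size 4, stored in physical blocks 7, 2 and 5.
table = [7, 2, 5]

# Full prefill: every position gets a slot.
print(compute_slot_indices(table, 4, 0, 10))  # [28, 29, 30, 31, 8, 9, 10, 11, 20, 21]

# Chunked prefill: a second chunk covering positions 6..9 must use absolute positions.
print(compute_slot_indices(table, 4, 6, 4))   # [10, 11, 20, 21]

# Prefix caching: the first 8 tokens (two full blocks) are already cached,
# so only positions 8..9 need new KV entries.
print(compute_slot_indices(table, 4, 8, 2))   # [20, 21]
```

The key point for chunked prefill and prefix caching is that a chunk starting mid-sequence must compute slots from absolute token positions rather than restarting at 0; getting this offset wrong is a typical source of the slot_indices issues listed above.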

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

@Shenggao190

Hi, I'd like to ask: how much longer will it take until the quantization-related features are adapted?


github-actions bot commented Mar 6, 2025

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Mar 6, 2025