This repository has been archived by the owner on Jul 20, 2024. It is now read-only.

Model quantization #5

Open

AACengineer opened this issue Apr 2, 2024 · 1 comment

Comments

@AACengineer

How did you quantize the models deployed on the RDK X3? Can everything be quantized to 8 bits? (As far as I know, it is hard to fully quantize an LLM to 8 bits; for example, RMSNorm and RoPE are usually still computed in floating point.)
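
For context, a minimal sketch of the mixed-precision pattern this question refers to: int8 weights for the matmul-heavy linear layers, with RMSNorm kept in fp32. This is generic PyTorch for illustration only, not the RDK X3 toolchain; the function names and shapes are made up for the example.

```python
import torch

def quantize_per_channel_int8(w: torch.Tensor):
    # Symmetric per-output-channel int8 quantization of a weight matrix.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def rmsnorm_fp32(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # RMSNorm kept in float: the 1/sqrt(mean(x^2)) term is awkward to map
    # onto a pure fixed-point pipeline, which is why it often stays fp.
    var = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(var + eps) * weight

# int8 weights for the matmul, float everywhere else
w = torch.randn(256, 256)                      # [out_features, in_features]
q_w, w_scale = quantize_per_channel_int8(w)
x = torch.randn(4, 256)
x_n = rmsnorm_fp32(x, torch.ones(256))
y = x_n @ (q_w.float() * w_scale).t()          # dequantize-then-matmul for simplicity
```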

@zixi01chen
Contributor

Hello, we did not achieve full quantization. No documentation has been released at the moment; we will open up the relevant details later depending on the situation. The general approach is:

  1. Operators that are hard to quantize fall back to CPU computation.
  2. The quantized model then goes through QAT fine-tuning to preserve overall quantized accuracy (see the sketch after this list).
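
For item 2, here is a minimal eager-mode QAT sketch using PyTorch's `torch.ao.quantization` APIs: insert fake-quant observers, fine-tune briefly, then convert to real int8 modules. It is a generic illustration of the fake-quantize → fine-tune → convert flow, not the actual RDK X3 / Horizon toolchain; the `TinyBlock` model and the dummy training loop are placeholders.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class TinyBlock(nn.Module):
    """Toy float model: the Linear layers are what we want in int8."""
    def __init__(self, dim=64):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(dim, dim)
        # Stubs mark the quantized region; anything outside stays float.
        self.quant = tq.QuantStub()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc2(self.relu(self.fc1(x)))
        return self.dequant(x)

model = TinyBlock().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)        # insert fake-quant observers

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):                        # short fine-tuning loop on dummy data
    x = torch.randn(8, 64)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

int8_model = tq.convert(model.eval())      # fold fake-quant into real int8 ops
```

The point of the QAT pass is that the weights adapt to the quantization noise during fine-tuning, so the converted int8 model loses less accuracy than post-training quantization alone.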
