Neural Speed supports the following models:

| Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT4 RTN | INT4 GPTQ | INT4 AWQ | Transformer Version |
|---|---|---|---|---|---|---|---|
| LLaMA2-7B, LLaMA2-13B, LLaMA2-70B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
| LLaMA-7B, LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
| Solar-10.7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
| GPT-J-6B | ✅ | | | ✅ | | | Latest |
| GPT-NeoX-20B | ✅ | | | ✅ | | | Latest |
| Dolly-v2-3B | ✅ | | | ✅ | | | 4.28.1 or newer |
| MPT-7B, MPT-30B | ✅ | | | ✅ | | | Latest |
| Falcon-7B, Falcon-40B | ✅ | | | ✅ | | | Latest |
| BLOOM-7B | ✅ | | | ✅ | | | Latest |
| OPT-125m, OPT-1.3B, OPT-13B | ✅ | | | ✅ | | | Latest |
| Neural-Chat-7B-v3-1, Neural-Chat-7B-v3-2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Latest |
| ChatGLM-6B, ChatGLM2-6B | ✅ | | | ✅ | | | 4.33.1 |
| Baichuan-13B-Chat, Baichuan2-13B-Chat | ✅ | | | ✅ | | | 4.33.1 |
| Mistral-7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 4.34.0 or newer |
| Qwen-7B, Qwen-14B | ✅ | | | ✅ | | | Latest |
| phi-2, phi-1_5, phi-1 | ✅ | | | ✅ | | | Latest |
| Whisper-tiny, Whisper-base, Whisper-small, Whisper-medium, Whisper-large | ✅ | | | ✅ | | | Latest |

| Model Name | INT8 RTN | INT8 GPTQ | INT4 RTN | INT4 GPTQ | Transformer Version |
|---|---|---|---|---|---|
| Code-LLaMA-7B, Code-LLaMA-13B | ✅ | ✅ | ✅ | ✅ | Latest |
| Magicoder-6.7B | ✅ | ✅ | ✅ | ✅ | Latest |
| StarCoder-1B, StarCoder-3B, StarCoder-15.5B | ✅ | | ✅ | | Latest |

| Model Name | F32 | F16 | Q4_0 | Q8_0 | BTLA |
|---|---|---|---|---|---|
| TheBloke/Llama-2-7B-Chat-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/Mistral-7B-v0.1-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/CodeLlama-7B-GGUF | ✅ | ✅ | ✅ | ✅ | |
| TheBloke/CodeLlama-13B-GGUF | ✅ | ✅ | ✅ | ✅ | |
| Code-LLaMA-7B, Code-LLaMA-13B | ✅ | ✅ | ✅ | ✅ | ✅ |
| meta-llama/Llama-2-7b-chat-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
| upstage/SOLAR-10.7B-Instruct-v1.0 | ✅ | ✅ | ✅ | ✅ | ✅ |
| tiiuae/falcon-7 | ✅ | ✅ | ✅ | ✅ | ✅ |
| tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
| mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
| mpt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
| bloomz-7b1 | ✅ | ✅ | ✅ | ✅ | ✅ |