Supported Models

Neural Speed supports the following models:

Text Generation

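| Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT4 RTN | INT4 GPTQ | INT4 AWQ | Transformers Version |
|---|---|---|---|---|---|---|---|
| LLaMA2-7B, LLaMA2-13B, LLaMA2-70B | | | | | | | Latest |
| LLaMA-7B, LLaMA-13B | | | | | | | Latest |
| Solar-10.7B | | | | | | | Latest |
| GPT-J-6B | | | | | | | Latest |
| GPT-NeoX-20B | | | | | | | Latest |
| Dolly-v2-3B | | | | | | | 4.28.1 or newer |
| MPT-7B, MPT-30B | | | | | | | Latest |
| Falcon-7B, Falcon-40B | | | | | | | Latest |
| BLOOM-7B | | | | | | | Latest |
| OPT-125m, OPT-1.3B, OPT-13B | | | | | | | Latest |
| Neural-Chat-7B-v3-1, Neural-Chat-7B-v3-2 | | | | | | | Latest |
| ChatGLM-6B, ChatGLM2-6B | | | | | | | 4.33.1 |
| Baichuan-13B-Chat, Baichuan2-13B-Chat | | | | | | | 4.33.1 |
| Mistral-7B | | | | | | | 4.34.0 or newer |
| Qwen-7B, Qwen-14B | | | | | | | Latest |
| phi-2, phi-1_5, phi-1 | | | | | | | Latest |
| Whisper-tiny, Whisper-base, Whisper-small, Whisper-medium, Whisper-large | | | | | | | Latest |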
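
The version column above mixes three kinds of requirement: `Latest`, a bare version such as `4.33.1`, and a minimum such as `4.28.1 or newer`. The following is a minimal sketch of how an installed version string could be checked against those entries. The helper names `parse_version` and `satisfies` are hypothetical and not part of Neural Speed. Treating a bare version as an exact pin and `Latest` as "no fixed pin" are assumptions made for illustration.

```python
# Hypothetical helpers, not part of Neural Speed: they compare a
# transformers version string with a requirement copied from the table.

def parse_version(text):
    """Turn '4.33.1' into (4, 33, 1); stops at non-numeric parts like 'dev0'."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '4.34' compares equal to '4.34.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies(installed, requirement):
    """Return True if `installed` meets a requirement string from the table."""
    requirement = requirement.strip()
    if requirement.lower() == "latest":
        # "Latest" cannot be verified offline; assumed to mean no fixed pin.
        return True
    suffix = "or newer"
    if requirement.endswith(suffix):
        minimum = parse_version(requirement[: -len(suffix)])
        return parse_version(installed) >= minimum
    # Assumption: a bare version such as "4.33.1" is an exact pin.
    return parse_version(installed) == parse_version(requirement)


# A subset of the requirements listed in the table.
REQUIREMENTS = {
    "Dolly-v2-3B": "4.28.1 or newer",
    "ChatGLM2-6B": "4.33.1",
    "Mistral-7B": "4.34.0 or newer",
    "GPT-J-6B": "Latest",
}

if __name__ == "__main__":
    installed = "4.34.0"  # example value; in practice use transformers.__version__
    for model, requirement in REQUIREMENTS.items():
        print(model, "|", requirement, "|", satisfies(installed, requirement))
```

In practice the `installed` value would come from `transformers.__version__`. A full PEP 440 comparison (for example via the `packaging` library) would be more robust than this tuple comparison for pre-release versions.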

Code Generation

| Model Name | INT8 RTN | INT8 GPTQ | INT4 RTN | INT4 GPTQ | Transformers Version |
|---|---|---|---|---|---|
| Code-LLaMA-7B, Code-LLaMA-13B | | | | | Latest |
| Magicoder-6.7B | | | | | Latest |
| StarCoder-1B, StarCoder-3B, StarCoder-15.5B | | | | | Latest |

Validated GGUF Models

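| Model Name | F32 | F16 | Q4_0 | Q8_0 | BTLA |
|---|---|---|---|---|---|
| TheBloke/Llama-2-7B-Chat-GGUF | | | | | |
| TheBloke/Mistral-7B-v0.1-GGUF | | | | | |
| TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF | | | | | |
| TheBloke/CodeLlama-7B-GGUF | | | | | |
| TheBloke/CodeLlama-13B-GGUF | | | | | |
| Code-LLaMA-7B, Code-LLaMA-13B | | | | | |
| meta-llama/Llama-2-7b-chat-hf | | | | | |
| upstage/SOLAR-10.7B-Instruct-v1.0 | | | | | |
| tiiuae/falcon-7b | | | | | |
| tiiuae/falcon-40b | | | | | |
| mpt-7b | | | | | |
| mpt-30b | | | | | |
| bloomz-7b1 | | | | | |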