Supported Models

Neural Speed supports the following models:

Model Name	INT8 RTN	INT8 GPTQ	INT8 AWQ	INT4 RTN	INT4 GPTQ	INT4 AWQ	Transformer Version
LLaMA2-7B, LLaMA2-13B, LLaMA2-70B	✅	✅	✅	✅	✅	✅	Latest
LLaMA-7B, LLaMA-13B	✅	✅	✅	✅	✅	✅	Latest
Solar-10.7B	✅	✅	✅	✅	✅	✅	Latest
GPT-J-6B	✅			✅			Latest
GPT-NeoX-20B	✅			✅			Latest
Dolly-v2-3B	✅			✅			4.28.1 or newer
MPT-7B, MPT-30B	✅			✅			Latest
Falcon-7B, Falcon-40B	✅			✅			Latest
BLOOM-7B	✅			✅			Latest
OPT-125m, OPT-1.3B, OPT-13B	✅			✅			Latest
Neural-Chat-7B-v3-1, Neural-Chat-7B-v3-2	✅	✅	✅	✅	✅	✅	Latest
ChatGLM-6B, ChatGLM2-6B	✅			✅			4.33.1
Baichuan-13B-Chat, Baichuan2-13B-Chat	✅			✅			4.33.1
Mistral-7B	✅	✅	✅	✅	✅	✅	4.34.0 or newer
Qwen-7B, Qwen-14B	✅			✅			Latest
phi-2, phi-1_5, phi-1	✅			✅			Latest
Whisper-tiny, Whisper-base, Whisper-small, Whisper-medium, Whisper-large	✅			✅			Latest

Model Name	INT8 RTN	INT8 GPTQ	INT4 RTN	INT4 GPTQ	Transformer Version
Code-LLaMA-7B, Code-LLaMA-13B	✅	✅	✅	✅	Latest
Magicoder-6.7B	✅	✅	✅	✅	Latest
StarCoder-1B, StarCoder-3B, StarCoder-15.5B	✅		✅		Latest
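
The two quantization tables above can be read as a lookup from model name to the algorithms marked for it. The sketch below encodes a small, illustrative subset of those rows in a plain dictionary. `SUPPORTED_ALGORITHMS` and `is_supported` are hypothetical names chosen for this example and are not part of the Neural Speed API. It relies on the observation that every row in the tables carries the same marks under INT8 and INT4, so a single set per model is enough.

```python
# Illustrative subset of the support tables above.
# These names are hypothetical helpers, not Neural Speed functions.
FULL = frozenset({"RTN", "GPTQ", "AWQ"})
RTN_ONLY = frozenset({"RTN"})

SUPPORTED_ALGORITHMS = {
    "LLaMA2-7B": FULL,
    "Mistral-7B": FULL,
    "GPT-J-6B": RTN_ONLY,
    "Falcon-7B": RTN_ONLY,
    "Code-LLaMA-7B": frozenset({"RTN", "GPTQ"}),
    "StarCoder-1B": RTN_ONLY,
}


def is_supported(model: str, algorithm: str) -> bool:
    """Return True if the table marks `algorithm` for `model`.

    The same answer applies to INT8 and INT4, since the tables mark
    both bit widths identically for every row.
    """
    return algorithm.upper() in SUPPORTED_ALGORITHMS.get(model, frozenset())


if __name__ == "__main__":
    for model, algorithm in [("Mistral-7B", "AWQ"), ("GPT-J-6B", "GPTQ")]:
        print(model, algorithm, is_supported(model, algorithm))
```

Models missing from the dictionary return `False`, which here only means the row was left out of this sketch, not that the model is unsupported.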

Model Name	F32	F16	Q4_0	Q8_0	BTLA
TheBloke/Llama-2-7B-Chat-GGUF	✅	✅	✅	✅
TheBloke/Mistral-7B-v0.1-GGUF	✅	✅	✅	✅
TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF	✅	✅	✅	✅
TheBloke/CodeLlama-7B-GGUF	✅	✅	✅	✅
TheBloke/CodeLlama-13B-GGUF	✅	✅	✅	✅
Code-LLaMA-7B, Code-LLaMA-13B	✅	✅	✅	✅	✅
meta-llama/Llama-2-7b-chat-hf	✅	✅	✅	✅	✅
upstage/SOLAR-10.7B-Instruct-v1.0	✅	✅	✅	✅	✅
tiiuae/falcon-7	✅	✅	✅	✅	✅
tiiuae/falcon-40b	✅	✅	✅	✅	✅
mpt-7b	✅	✅	✅	✅	✅
mpt-30b	✅	✅	✅	✅	✅
bloomz-7b1	✅	✅	✅	✅	✅
