Hi, I'm trying to use GPTQ or GGUF quantized models with lightllm, but I'm a bit confused. Should I provide --mode int8weight as an argument?

Replies: 1 comment
-
In particular, I'm curious how I might run https://huggingface.co/TheBloke/Spicyboros-70B-2.2-AWQ?not-for-all-audiences=true. Or is the expectation that, instead of loading the pre-quantized AWQ model, lightllm will quantize the model on the fly when --mode is provided? I think the docs could be clearer!
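For reference, this is the kind of launch command I've been experimenting with. It is only a sketch based on my reading of the lightllm README: the model path and --tp value are placeholders, and I'm assuming triton_int8weight is the mode name for on-the-fly weight-only int8 quantization (check python -m lightllm.server.api_server --help for the actual values).

```bash
# Hypothetical launch; flag names follow the lightllm README,
# but the --mode value and tensor-parallel size are my guesses.
python -m lightllm.server.api_server \
    --model_dir /path/to/Spicyboros-70B-2.2-AWQ \
    --host 0.0.0.0 \
    --port 8080 \
    --tp 2 \
    --mode triton_int8weight
```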