Hello!
I followed the D+S packing instructions and stored the packed .pt file in "~/models/${model_name}-squeezellm/packed_weight", where model_name="Llama-2-7b-chat-hf". When I load this model in vLLM, it complains that it cannot find the parameters "sparse_threshold.model.layers.*". Any idea why? I have repeated the quantization from scratch several times, but every run ends with the same error.
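For context, here is a diagnostic sketch (not part of my original steps) to confirm whether the sparse_threshold tensors exist in the packed checkpoint at all. It assumes the .pt file is a flat dict keyed by parameter name; "packed.pt" below is a placeholder, since I am not sure what file name the packing script actually produces:

```python
import os
import torch

# Assumption: the packed checkpoint is a flat dict of name -> tensor.
# "packed.pt" is a placeholder file name, not the script's real output name.
ckpt_path = os.path.expanduser(
    "~/models/Llama-2-7b-chat-hf-squeezellm/packed_weight/packed.pt"
)
state_dict = torch.load(ckpt_path, map_location="cpu")

# List every key related to the sparse thresholds of the D+S scheme.
sparse_keys = sorted(k for k in state_dict if "sparse_threshold" in k)
print(f"found {len(sparse_keys)} sparse_threshold entries")
for key in sparse_keys[:5]:
    value = state_dict[key]
    print(key, getattr(value, "shape", type(value)))
```

If this prints zero entries, the packing step is dropping the thresholds; if the entries are there, the mismatch is on the vLLM loading side (e.g. a naming-convention mismatch between the checkpoint keys and the model's parameter names).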
As a quick fix, I manually skip this error in vLLM's model-loading step in llama.py whenever the missing parameter cannot be found. However, with that change the model no longer generates meaningful output, so I believe these parameters are indeed not being loaded correctly.
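For clarity, this is the logic of my workaround written as a standalone helper. It is a restatement of the edit I made to the load loop in llama.py, not the actual vLLM code, and all names here are my own:

```python
from typing import Dict, List
import torch
import torch.nn as nn


def load_weights_skipping_missing(model: nn.Module,
                                  state_dict: Dict[str, torch.Tensor]) -> List[str]:
    """Copy checkpoint tensors into the model, silently skipping keys
    (e.g. sparse_threshold.model.layers.*) that have no matching model
    parameter, instead of raising an error. This mirrors the skip I added
    to vLLM's llama.py load loop; it is a sketch, not vLLM's API."""
    params = dict(model.named_parameters())
    skipped = []
    for name, tensor in state_dict.items():
        if name not in params:
            skipped.append(name)  # these tensors are never loaded anywhere
            continue
        with torch.no_grad():
            params[name].copy_(tensor)
    return skipped
```

If `skipped` comes back containing every sparse_threshold key, that would be consistent with the garbage output: those tensors never reach the model, so the sparse half of the D+S scheme is effectively disabled.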