vLLM error with OmniQuant Llama-3-8b #256
Comments
Can you provide the configuration file?
Thank you for your reply. Here is the configuration file (I also tried Qwen2 and hit the same error):
base:
Okay, I'll try to reproduce the problem you encountered.
Thank you very much!
You can try the latest code, which has already fixed the mentioned issue.
Thank you for your help. However, it seems the added code affected the block_forward function and caused the following error. May I know your configuration?
[rank0]: Traceback (most recent call last):
The latest code should not affect the block_forward function. You can clone it again, update the running environment (especially the transformers version), and try running it again.
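As a generic sanity check (not project-specific), the installed versions can be printed before re-running, to confirm the environment really picked up the update:

```python
# Print the versions of the packages the pipeline depends on, so a stale
# environment is caught before the next run. Standard attributes only.
import torch
import transformers
import vllm

print("torch:       ", torch.__version__)
print("transformers:", transformers.__version__)
print("vllm:        ", vllm.__version__)
```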
Hi developer,
File ~/delta_p/lib/python3.10/site-packages/vllm/model_executor/models/utils.py:175, in AutoWeightsLoader._load_module(self, base_prefix, module, weights)
    173 module_load_weights = getattr(module, "load_weights", None)
    174 if callable(module_load_weights):
--> 175     module_load_weights(weights)
    176     return
    178 child_modules = dict(module.named_children())

File ~/delta_p/lib/python3.10/site-packages/vllm/model_executor/models/llama.py:407, in LlamaModel.load_weights(self, weights)
    404 if is_pp_missing_parameter(name, self):
    405     continue
--> 407 param = params_dict[name]
    408 weight_loader = getattr(param, "weight_loader",
    409                         default_weight_loader)
    410 weight_loader(param, loaded_weight)

KeyError: 'layers.0.fc1_smooth_scale'
Thank you for your help!
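The KeyError above indicates that the exported checkpoint still contains OmniQuant's auxiliary smoothing-scale tensors (such as layers.0.fc1_smooth_scale), for which vLLM's LlamaModel.load_weights has no matching parameter. Below is a minimal workaround sketch, not the project's official fix: it assumes the smoothing scales were already fused into the weights at export time, that the checkpoint is a single model.safetensors file, and that the path shown is a placeholder.

```python
# Workaround sketch: drop the auxiliary *_smooth_scale tensors from an
# exported checkpoint so vLLM only sees keys it recognizes. ASSUMPTION:
# the scales were already fused into the weights, so these entries are
# redundant; the checkpoint path below is a placeholder.
from safetensors.torch import load_file, save_file

ckpt = "/path/to/exported-llama3-8b/model.safetensors"  # placeholder path

tensors = load_file(ckpt)
kept = {k: v for k, v in tensors.items() if not k.endswith("_smooth_scale")}

# Keep the "format" metadata that Hugging Face loaders expect.
save_file(kept, ckpt, metadata={"format": "pt"})
print(f"dropped {len(tensors) - len(kept)} smooth-scale tensors")
```

If the scales have not been fused into the weights, dropping them silently changes the model, so re-exporting with the updated code (as suggested above) is the safer route.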