Bugfix: Download index file for cases with multiple safetensors #35
Conversation
The index file is necessary for loading the model with vLLM.
Extreme apologies on the delay - I'll definitely review the PR tomorrow - again sorry!
@Erland366 Could you take a look at this PR? Thanks :)
I still got a shape error, which, as you said, is resolved by removing the `quantization_config` field.
The removal is enforced in `saving_utils.merge_and_overwrite_lora` because the function is intended to export 16-bit weights, meaning that no quantization should be applied.
Thanks for the feedback. I was unsure whether removing that field was correct, or what the best way to do so in the code would be. I added a commit (09fbe3f) to patch the exported config file if necessary, which seems to be the least intrusive approach (i.e., it does not modify existing code). I ran a smoke test with the latest changes, and vLLM can now load the model without manual intervention. Please take a look. Thank you.
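For concreteness, here is a minimal sketch of the kind of post-export patch described above, assuming the merged model directory contains a `config.json`; the helper name and the example path are hypothetical, not the actual code from commit 09fbe3f:

```python
import json
from pathlib import Path

def strip_quantization_config(save_dir: str) -> None:
    """Drop the quantization_config field from an exported config.json,
    since the merged export is full 16-bit and carries no quantization."""
    config_path = Path(save_dir) / "config.json"
    config = json.loads(config_path.read_text())
    # Only rewrite the file if the field was actually present.
    if config.pop("quantization_config", None) is not None:
        config_path.write_text(json.dumps(config, indent=2))

# Hypothetical usage:
# strip_quantization_config("./merged-qwen2-vl")
```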
Can you test that when you save just the LoRA, the config is also saved correctly? Then, after merging from upstream/main, it should be good to go. I am sorry; usually I would do it myself, but my CPU is so weak that merging always crashes my PC, and it takes me at least 40 minutes to merge a 2B model, so it's really hard to test :/
@Erland366 No problem. The patch only affects the config file exported for the merged model; the LoRA saving path is untouched. It's reasonable to be cautious, though. I will provide the smoke test results and merge the main branch soon.
Please see the screenshots below for the test results. I have also merged the upstream changes into this branch.
I think it's all good now @danielhanchen
Please excuse me for presumptuously calling this patch a "bugfix." I am unsure if there is another method to correctly export fine-tuned LoRA weights for Qwen-2 VL models into full FP16 model weights that would be loadable by vLLM. However, this patch resolves the issue for me.
In summary, the `save_pretrained_merged` method of Qwen-2 model objects uses the `merge_and_overwrite_lora` function from this repository. If I have understood it correctly, this function downloads the safetensors files from the original FP16 model repository, applies the LoRA weight updates, and then writes the modified weights back into the downloaded safetensors files. However, the model index file `model.safetensors.index.json` is not downloaded, and consequently vLLM refuses to load the model due to the absence of this index file (I suspect the Hugging Face transformers library behaves similarly). The proposed solution is to download the index file alongside the safetensors files; a sketch of the idea follows below.

As I am not deeply familiar with this codebase, I apologize in advance if my understanding is incomplete. I hope this pull request helps anyone encountering the same issue (vLLM refusing to load the merged model weights). Any feedback would be greatly appreciated.
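As a rough illustration of that fix (not the actual diff in this PR), the index file can be fetched the same way as the shards via `huggingface_hub`; the repo id and target directory below are placeholders:

```python
from huggingface_hub import hf_hub_download

# Download the shard index next to the safetensors shards so that loaders
# (vLLM, transformers) can map tensor names to their shard files.
hf_hub_download(
    repo_id="Qwen/Qwen2-VL-7B-Instruct",  # placeholder: original FP16 repo
    filename="model.safetensors.index.json",
    local_dir="./merged-qwen2-vl",        # placeholder: merged output directory
)
```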
Demo
Note: I manually removed the `quantization_config` section from the configuration file to enable vLLM to load the model properly.
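For reference, the smoke test amounts to loading the patched export with vLLM; this is a minimal sketch assuming a recent vLLM build with Qwen2-VL support, and the local path is a placeholder:

```python
from vllm import LLM

# With model.safetensors.index.json present and quantization_config removed,
# vLLM can load the merged 16-bit checkpoint directly from disk.
llm = LLM(model="./merged-qwen2-vl")
outputs = llm.generate("Describe the weather in one sentence.")
print(outputs[0].outputs[0].text)
```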