Bugfix: Download index file for cases with multiple safetensors #35

Merged · 3 commits · Feb 20, 2025

Conversation

@ceshine (Contributor) commented Jan 18, 2025

Please excuse me for presumptuously calling this patch a "bugfix." I am unsure if there is another method to correctly export fine-tuned LoRA weights for Qwen-2 VL models into full FP16 model weights that would be loadable by vLLM. However, this patch resolves the issue for me.

In summary, the save_pretrained_merged method of Qwen-2 VL model objects uses the merge_and_overwrite_lora function from this repository. If I have understood it correctly, this function downloads the safetensors files from the original FP16 model repository, merges the provided LoRA weights into them, and writes the modified weights back into the downloaded safetensors files. However, the model index file (model.safetensors.index.json) is not downloaded, so vLLM refuses to load the model because the index file is missing (I suspect the Hugging Face transformers library behaves similarly). The proposed solution is to download the index file alongside the safetensors files.
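
To illustrate the idea (this is only a sketch, not the exact code in the patch; the repo id and output directory below are placeholders), fetching the index file alongside the shards can be done with huggingface_hub:

```python
from huggingface_hub import hf_hub_download

# Placeholder values for illustration only.
repo_id = "Qwen/Qwen2-VL-7B-Instruct"   # base FP16 model repository
local_dir = "merged-16bit"              # directory where the merged shards are written

# Download the shard index next to the safetensors files so that vLLM /
# transformers can resolve which shard holds each tensor.
hf_hub_download(
    repo_id=repo_id,
    filename="model.safetensors.index.json",
    local_dir=local_dir,
)
```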

As I am not deeply familiar with this codebase, I apologize in advance if my understanding is incomplete. I hope this pull request aids someone encountering the same issue (vLLM refusing to load the merged model weights). Any feedback would be greatly appreciated.

Demo

[screenshots]

Note: I manually removed the quantization_config section from the configuration file to enable vLLM to load the model properly.
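
For reference, that manual step amounts to dropping the quantization_config key from the exported config.json (the path below is a placeholder for my local setup):

```python
import json
from pathlib import Path

# Placeholder path to the merged model directory.
config_path = Path("merged-16bit") / "config.json"

config = json.loads(config_path.read_text())
# Remove the stale quantization section so vLLM does not try to apply it.
config.pop("quantization_config", None)
config_path.write_text(json.dumps(config, indent=2))
```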

[screenshots]

The index file is necessary for loading the model with vLLM.
@danielhanchen (Contributor)

Extreme apologies on the delay - I'll definitely review the PR tomorrow - again sorry!

@danielhanchen (Contributor)

@Erland366 Could you take a look at this PR thanks :)

@Erland366 (Contributor)

I still got a shape error, which, as you said, removing the quantization_config fixes. We might need to remove that field before saving as well.

The removal is enforced in `saving_utils.merge_and_overwrite_lora`
because the function is intended to export 16-bit weights,
meaning that no quantization should be applied.
@ceshine (Contributor, Author) commented Feb 16, 2025

@Erland366

Thanks for the feedback. I was unsure whether it was correct to remove that field, or the best way to do so in the code.

I added a commit (09fbe3f) to patch the exported config file if necessary, which seems to be the least intrusive way (i.e., not modifying existing code) to perform this removal. I ran a smoke test with the latest changes, and vLLM can now load the model without manual intervention. Please take a look. Thank you.
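
For context, the smoke test boils down to loading the merged folder with vLLM; the snippet below is only a sketch with placeholder paths and prompt, not the exact script I ran:

```python
from vllm import LLM, SamplingParams

# Placeholder path to the merged 16-bit export.
llm = LLM(model="merged-16bit", trust_remote_code=True)

# Simple text-only generation just to confirm the weights load and run.
outputs = llm.generate(
    ["Describe this model in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```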

@Erland366 (Contributor)

Can you test whether config.json is also saved when saving LoRA? (I'm worried that we would now save both config.json and adapter_config.json, which would raise an error later on.)

Then, after merging from upstream/main, it's good to go.

I'm sorry, usually I would do this myself, but my CPU is so weak that merging always crashes my PC, and it takes me at least 40 minutes to merge a 2B model. So it's really hard to test :/

@ceshine (Contributor, Author) commented Feb 16, 2025

@Erland366 No problem. The patch only affects the merge_and_overwrite_lora function, so it's unlikely to impact regular LoRA saving. Although the GitHub diff interface may suggest that the upload_items function is involved, it is actually unrelated.

It's reasonable to be cautious, though. I will provide the smoke test results and merge the main branch soon.

[screenshots]

@ceshine (Contributor, Author) commented Feb 16, 2025

@Erland366

Please see the screenshots below for the testing results. I have also merged the upstream changes into this branch.

[screenshots]

@Erland366 (Contributor)

I think it's all good now @danielhanchen

@danielhanchen danielhanchen changed the base branch from main to nightly February 16, 2025 10:08
@danielhanchen danielhanchen changed the base branch from nightly to main February 20, 2025 07:10
@danielhanchen danielhanchen merged commit e839b3d into unslothai:main Feb 20, 2025