Bugfix: Download index file for cases with multiple safetensors #35
Conversation
The index file is necessary for loading the model with vLLM.
Extreme apologies on the delay - I'll definitely review the PR tomorrow - again sorry!
@Erland366 Could you take a look at this PR? Thanks :)
I still got a shape error, which, as you said, is resolved by removing the `quantization_config` field.
The removal is enforced in `saving_utils.merge_and_overwrite_lora` because the function is intended to export 16-bit weights, meaning that no quantization should be applied.
Thanks for the feedback. I was unsure whether removing that field was correct, or what the best way to do so in the code would be. I added a commit (09fbe3f) to patch the exported config file if necessary, which seems to be the least intrusive approach (i.e., it does not modify existing code). I ran a smoke test with the latest changes, and vLLM can now load the model without manual intervention. Please take a look. Thank you.
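For concreteness, here is a minimal sketch of the kind of post-export patch described above, assuming the merged model directory contains a `config.json`; the helper name and the example path are hypothetical, not the actual code from commit 09fbe3f:

```python
import json
from pathlib import Path

def strip_quantization_config(save_dir: str) -> None:
    """Drop the quantization_config field from an exported config.json,
    since the merged export is full 16-bit and carries no quantization."""
    config_path = Path(save_dir) / "config.json"
    config = json.loads(config_path.read_text())
    # Only rewrite the file if the field was actually present.
    if config.pop("quantization_config", None) is not None:
        config_path.write_text(json.dumps(config, indent=2))

# Hypothetical usage:
# strip_quantization_config("./merged-qwen2-vl")
```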
Can you test that when you save just the LoRA, the config is also saved correctly? Then, after merging from upstream/main, it should be good to go. I am sorry; usually I would do it myself, but my CPU is so weak that merging always crashes my PC, and it takes me at least 40 minutes to merge a 2B model, so it's really hard to test :/
@Erland366 No problem. The patch only affects the config file exported for the merged model; the LoRA saving path is untouched. It's reasonable to be cautious, though. I will provide the smoke test results and merge the main branch soon.
Please see the screenshots below for the test results. I have also merged the upstream changes into this branch.
I think it's all good now @danielhanchen
Please excuse me for presumptuously calling this patch a "bugfix." I am unsure if there is another method to correctly export fine-tuned LoRA weights for Qwen-2 VL models into full FP16 model weights that would be loadable by vLLM. However, this patch resolves the issue for me.
In summary, the `save_pretrained_merged` method of Qwen-2 model objects uses the `merge_and_overwrite_lora` function from this repository. If I have understood it correctly, this function downloads the safetensors files from the original FP16 model repository, applies the LoRA weight updates, and then writes the modified weights back into the downloaded safetensors files. However, the model index file `model.safetensors.index.json` is not downloaded, and consequently vLLM refuses to load the model due to the absence of this index file (I suspect the Hugging Face transformers library behaves similarly). The proposed solution is to download the index file alongside the safetensors files; a sketch of the idea follows below.

As I am not deeply familiar with this codebase, I apologize in advance if my understanding is incomplete. I hope this pull request helps anyone encountering the same issue (vLLM refusing to load the merged model weights). Any feedback would be greatly appreciated.
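As a rough illustration of that fix (not the actual diff in this PR), the index file can be fetched the same way as the shards via `huggingface_hub`; the repo id and target directory below are placeholders:

```python
from huggingface_hub import hf_hub_download

# Download the shard index next to the safetensors shards so that loaders
# (vLLM, transformers) can map tensor names to their shard files.
hf_hub_download(
    repo_id="Qwen/Qwen2-VL-7B-Instruct",  # placeholder: original FP16 repo
    filename="model.safetensors.index.json",
    local_dir="./merged-qwen2-vl",        # placeholder: merged output directory
)
```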
Demo
Note: I manually removed the `quantization_config` section from the configuration file to enable vLLM to load the model properly.
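For reference, the smoke test amounts to loading the patched export with vLLM; this is a minimal sketch assuming a recent vLLM build with Qwen2-VL support, and the local path is a placeholder:

```python
from vllm import LLM

# With model.safetensors.index.json present and quantization_config removed,
# vLLM can load the merged 16-bit checkpoint directly from disk.
llm = LLM(model="./merged-qwen2-vl")
outputs = llm.generate("Describe the weather in one sentence.")
print(outputs[0].outputs[0].text)
```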