Bug: Converted HF LoRA adapter on Llama 3.1 not loading. #9114

Closed
Ujjawal-K-Panchal opened this issue Aug 21, 2024 · 4 comments · Fixed by #9117
Labels
bug-unconfirmed, critical severity (used to report critical severity bugs in llama.cpp, e.g. crashing, corruption, data loss)

Comments

@Ujjawal-K-Panchal
Contributor

Ujjawal-K-Panchal commented Aug 21, 2024

What happened?

In short: using the standard procedure from the documentation, I am unable to attach a converted LoRA adapter (HF -> GGUF) to a Llama 3.1 GGUF model.

Procedure:

  1. Fine-tune the Llama 3.1 HF model with a PEFT LoRA adapter, then save the adapter to a directory, say lora-dir/, for later use. (Using trl.SFTTrainer; saved via the output_dir parameter; a rough sketch of this step follows the list.)
  2. Convert the Llama 3.1 model from the HF repo to GGUF via the prescribed method (convert_hf_to_gguf.py).
  3. Quantize the Llama 3.1 GGUF to Q4_K_M following the instructions in examples/quantize/README.md.
  4. Convert the saved LoRA adapter to a bf16 GGUF with the following command:
    python convert_lora_to_gguf.py ../lora-dir/ --outfile ../lora-dir/llama31-lora.gguf --outtype bf16 --base ../models/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16/
  5. Try running the CLI with the above:
    ./llama-cli -m ../modelstore/llama31-Q4_K_M-v2.gguf --lora ./lora-dir/llama31-freedom-lora-v010.gguf
  • Step 5 fails with: llama_lora_adapter_init: failed to apply lora adapter: LoRA tensor 'rope_freqs.weight' has unexpected suffix'. See the full log output below.
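
For context, step 1 was roughly the following (a minimal sketch only; the dataset, target modules, and hyperparameters here are placeholders, not my exact setup):

# Minimal sketch of step 1 (placeholders, not my exact setup): LoRA fine-tune
# of Llama 3.1 with peft + trl.SFTTrainer, saving only the adapter to lora-dir/.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")  # placeholder dataset

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder modules
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    args=TrainingArguments(output_dir="../lora-dir/", num_train_epochs=1),
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()
trainer.save_model("../lora-dir/")  # writes only the LoRA adapter files, not the full model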

Additional notes:

  • Note: I tried running llama-cli on the base model alone (no adapter), and it worked flawlessly.
  • Theory: I recently saw an issue here about the rope scaling changes in Llama 3.1 relative to Llama 3, and a PR that fixed it. Those changes might not have been carried over to convert_lora_to_gguf.py (see the sketch after these notes).
  • Note: snapshot 8c22764a7e3675c50d4c7c9a4edb474456022b16 is the current default for the Llama 3.1 HF repo.
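
To illustrate the theory (a sketch of my guess only, not the actual change in #9117): the error suggests the loader expects every adapter tensor name to carry a LoRA suffix, and the Llama 3.1 rope_freqs.weight tensor is not such a pair, so a converter would presumably have to skip it rather than emit it:

# Sketch of my guess only -- NOT the actual fix in #9117.
# The loader rejects adapter tensors whose names do not end in .lora_a / .lora_b,
# which is what the "has unexpected suffix" error for rope_freqs.weight suggests.
def is_lora_pair_tensor(name: str) -> bool:
    """True only for tensors that belong to a LoRA A/B pair."""
    return name.endswith(".lora_a") or name.endswith(".lora_b")

# hypothetical tensor names, for illustration only
names = ["blk.0.attn_q.weight.lora_a", "blk.0.attn_q.weight.lora_b", "rope_freqs.weight"]
print([n for n in names if is_lora_pair_tensor(n)])
# -> ['blk.0.attn_q.weight.lora_a', 'blk.0.attn_q.weight.lora_b']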

Name and Version

version: 3484 (4730fac)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

........................................................................................
llama_new_context_with_model: n_ctx      = 131072
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size = 16384.00 MiB
llama_new_context_with_model: KV self size  = 16384.00 MiB, K (f16): 8192.00 MiB, V (f16): 8192.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.49 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =  8984.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =   264.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 420
llama_lora_adapter_init_internal: loading lora adapter from '../lora-dir/llama31-lora.gguf' ...
llama_lora_adapter_init: failed to apply lora adapter: LoRA tensor 'rope_freqs.weight' has unexpected suffix'
llama_init_from_gpt_params: error: failed to apply lora adapter '../lora-dir/llama31-lora.gguf'
main: error: unable to load model
@Ujjawal-K-Panchal added the bug-unconfirmed and critical severity labels Aug 21, 2024
@ngxson
Collaborator

ngxson commented Aug 21, 2024

Can you try the conversion script from #9117?

@Ujjawal-K-Panchal
Contributor Author

Thank you so much for the quick response! Testing this.

@Ujjawal-K-Panchal
Contributor Author

The conversion script mentioned above now works perfectly. I also tried different quantizations and see no problems. A snippet of the log output is below:

llama_lora_adapter_init_internal:        CPU LoRA buffer size =    72.00 MiB
llama_lora_adapter_init_internal: loaded 192 tensors from lora file

system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
sampling: 
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 131072, n_batch = 2048, n_predict = -1, n_keep = 1


 [end of text]

llama_print_timings:        load time =   15135.41 ms
llama_print_timings:      sample time =       0.59 ms /     4 runs   (    0.15 ms per token,  6779.66 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =     428.97 ms /     4 runs   (  107.24 ms per token,     9.32 tokens per second)
llama_print_timings:       total time =     430.61 ms /     4 tokens
Log end

Thanks for the help!

@Ujjawal-K-Panchal
Contributor Author

Keeping this open till the PR is merged.
