
Vocab mismatch when converting the original Llama 2 model on MacBook Pro M1 Pro #4045

Closed
PeterWrighten opened this issue Nov 12, 2023 · 9 comments

@PeterWrighten

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

skipping tensor rope_freqs
Writing ../llama/llama-2-7b-chat/ggml-model-f16.gguf, format 1
Traceback (most recent call last):
  File "/Users/apple/OSS/llama.cpp/convert.py", line 1206, in <module>
    main()
  File "/Users/apple/OSS/llama.cpp/convert.py", line 1201, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab, concurrency = args.concurrency, endianess=endianess)
  File "/Users/apple/OSS/llama.cpp/convert.py", line 907, in write_all
    check_vocab_size(params, vocab)
  File "/Users/apple/OSS/llama.cpp/convert.py", line 794, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has -1, but ../llama/tokenizer.model has 32000).

Environment and Context

I tried to convert the original Llama 2 model into GGUF format with python3 convert.py ../llama/llama-2-7b-chat

  • Physical (or virtual) hardware you are using

Apple M1 Pro

  • Operating System

Darwin PeterWrightMacBook-Pro14.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64

  • SDK version
Python 3.9.10

GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

This program built for i386-apple-darwin11.3.0

Failure Information (for bugs)

Same traceback as in Current Behavior above; the full log is under Failure Logs below.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Download the original Llama 2 model from Meta AI.
  2. git clone this repo and build it with:
     LLAMA_METAL=1 make
  3. Run the conversion:
     python convert.py ../llama/llama-2-7b-chat
  4. I have also tried adding 'added_tokens.json', but it still doesn't work.

Failure Logs

python convert.py ../llama/llama-2-7b-chat
Loading model file ../llama/llama-2-7b-chat/consolidated.00.pth
params = Params(n_vocab=-1, n_embd=4096, n_layer=32, n_ctx=2048, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-06, rope_scaling_type=None, f_rope_freq_base=None, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('../llama/llama-2-7b-chat'))
Loading vocab file '../llama/tokenizer.model', type 'spm'
tok_embeddings.weight                            -> token_embd.weight                        | BF16   | [32000, 4096]
norm.weight                                      -> output_norm.weight                       | BF16   | [4096]
output.weight                                    -> output.weight                            | BF16   | [32000, 4096]
layers.0.attention.wq.weight                     -> blk.0.attn_q.weight                      | BF16   | [4096, 4096]
layers.0.attention.wk.weight                     -> blk.0.attn_k.weight                      | BF16   | [4096, 4096]
layers.0.attention.wv.weight                     -> blk.0.attn_v.weight                      | BF16   | [4096, 4096]
layers.0.attention.wo.weight                     -> blk.0.attn_output.weight                 | BF16   | [4096, 4096]
layers.0.feed_forward.w1.weight                  -> blk.0.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.0.feed_forward.w2.weight                  -> blk.0.ffn_down.weight                    | BF16   | [4096, 11008]
layers.0.feed_forward.w3.weight                  -> blk.0.ffn_up.weight                      | BF16   | [11008, 4096]
layers.0.attention_norm.weight                   -> blk.0.attn_norm.weight                   | BF16   | [4096]
layers.0.ffn_norm.weight                         -> blk.0.ffn_norm.weight                    | BF16   | [4096]
layers.1.attention.wq.weight                     -> blk.1.attn_q.weight                      | BF16   | [4096, 4096]
layers.1.attention.wk.weight                     -> blk.1.attn_k.weight                      | BF16   | [4096, 4096]
layers.1.attention.wv.weight                     -> blk.1.attn_v.weight                      | BF16   | [4096, 4096]
layers.1.attention.wo.weight                     -> blk.1.attn_output.weight                 | BF16   | [4096, 4096]
layers.1.feed_forward.w1.weight                  -> blk.1.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.1.feed_forward.w2.weight                  -> blk.1.ffn_down.weight                    | BF16   | [4096, 11008]
layers.1.feed_forward.w3.weight                  -> blk.1.ffn_up.weight                      | BF16   | [11008, 4096]
layers.1.attention_norm.weight                   -> blk.1.attn_norm.weight                   | BF16   | [4096]
layers.1.ffn_norm.weight                         -> blk.1.ffn_norm.weight                    | BF16   | [4096]
layers.2.attention.wq.weight                     -> blk.2.attn_q.weight                      | BF16   | [4096, 4096]
layers.2.attention.wk.weight                     -> blk.2.attn_k.weight                      | BF16   | [4096, 4096]
layers.2.attention.wv.weight                     -> blk.2.attn_v.weight                      | BF16   | [4096, 4096]
layers.2.attention.wo.weight                     -> blk.2.attn_output.weight                 | BF16   | [4096, 4096]
layers.2.feed_forward.w1.weight                  -> blk.2.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.2.feed_forward.w2.weight                  -> blk.2.ffn_down.weight                    | BF16   | [4096, 11008]
layers.2.feed_forward.w3.weight                  -> blk.2.ffn_up.weight                      | BF16   | [11008, 4096]
layers.2.attention_norm.weight                   -> blk.2.attn_norm.weight                   | BF16   | [4096]
layers.2.ffn_norm.weight                         -> blk.2.ffn_norm.weight                    | BF16   | [4096]
layers.3.attention.wq.weight                     -> blk.3.attn_q.weight                      | BF16   | [4096, 4096]
layers.3.attention.wk.weight                     -> blk.3.attn_k.weight                      | BF16   | [4096, 4096]
layers.3.attention.wv.weight                     -> blk.3.attn_v.weight                      | BF16   | [4096, 4096]
layers.3.attention.wo.weight                     -> blk.3.attn_output.weight                 | BF16   | [4096, 4096]
layers.3.feed_forward.w1.weight                  -> blk.3.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.3.feed_forward.w2.weight                  -> blk.3.ffn_down.weight                    | BF16   | [4096, 11008]
layers.3.feed_forward.w3.weight                  -> blk.3.ffn_up.weight                      | BF16   | [11008, 4096]
layers.3.attention_norm.weight                   -> blk.3.attn_norm.weight                   | BF16   | [4096]
layers.3.ffn_norm.weight                         -> blk.3.ffn_norm.weight                    | BF16   | [4096]
layers.4.attention.wq.weight                     -> blk.4.attn_q.weight                      | BF16   | [4096, 4096]
layers.4.attention.wk.weight                     -> blk.4.attn_k.weight                      | BF16   | [4096, 4096]
layers.4.attention.wv.weight                     -> blk.4.attn_v.weight                      | BF16   | [4096, 4096]
layers.4.attention.wo.weight                     -> blk.4.attn_output.weight                 | BF16   | [4096, 4096]
layers.4.feed_forward.w1.weight                  -> blk.4.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.4.feed_forward.w2.weight                  -> blk.4.ffn_down.weight                    | BF16   | [4096, 11008]
layers.4.feed_forward.w3.weight                  -> blk.4.ffn_up.weight                      | BF16   | [11008, 4096]
layers.4.attention_norm.weight                   -> blk.4.attn_norm.weight                   | BF16   | [4096]
layers.4.ffn_norm.weight                         -> blk.4.ffn_norm.weight                    | BF16   | [4096]
layers.5.attention.wq.weight                     -> blk.5.attn_q.weight                      | BF16   | [4096, 4096]
layers.5.attention.wk.weight                     -> blk.5.attn_k.weight                      | BF16   | [4096, 4096]
layers.5.attention.wv.weight                     -> blk.5.attn_v.weight                      | BF16   | [4096, 4096]
layers.5.attention.wo.weight                     -> blk.5.attn_output.weight                 | BF16   | [4096, 4096]
layers.5.feed_forward.w1.weight                  -> blk.5.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.5.feed_forward.w2.weight                  -> blk.5.ffn_down.weight                    | BF16   | [4096, 11008]
layers.5.feed_forward.w3.weight                  -> blk.5.ffn_up.weight                      | BF16   | [11008, 4096]
layers.5.attention_norm.weight                   -> blk.5.attn_norm.weight                   | BF16   | [4096]
layers.5.ffn_norm.weight                         -> blk.5.ffn_norm.weight                    | BF16   | [4096]
layers.6.attention.wq.weight                     -> blk.6.attn_q.weight                      | BF16   | [4096, 4096]
layers.6.attention.wk.weight                     -> blk.6.attn_k.weight                      | BF16   | [4096, 4096]
layers.6.attention.wv.weight                     -> blk.6.attn_v.weight                      | BF16   | [4096, 4096]
layers.6.attention.wo.weight                     -> blk.6.attn_output.weight                 | BF16   | [4096, 4096]
layers.6.feed_forward.w1.weight                  -> blk.6.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.6.feed_forward.w2.weight                  -> blk.6.ffn_down.weight                    | BF16   | [4096, 11008]
layers.6.feed_forward.w3.weight                  -> blk.6.ffn_up.weight                      | BF16   | [11008, 4096]
layers.6.attention_norm.weight                   -> blk.6.attn_norm.weight                   | BF16   | [4096]
layers.6.ffn_norm.weight                         -> blk.6.ffn_norm.weight                    | BF16   | [4096]
layers.7.attention.wq.weight                     -> blk.7.attn_q.weight                      | BF16   | [4096, 4096]
layers.7.attention.wk.weight                     -> blk.7.attn_k.weight                      | BF16   | [4096, 4096]
layers.7.attention.wv.weight                     -> blk.7.attn_v.weight                      | BF16   | [4096, 4096]
layers.7.attention.wo.weight                     -> blk.7.attn_output.weight                 | BF16   | [4096, 4096]
layers.7.feed_forward.w1.weight                  -> blk.7.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.7.feed_forward.w2.weight                  -> blk.7.ffn_down.weight                    | BF16   | [4096, 11008]
layers.7.feed_forward.w3.weight                  -> blk.7.ffn_up.weight                      | BF16   | [11008, 4096]
layers.7.attention_norm.weight                   -> blk.7.attn_norm.weight                   | BF16   | [4096]
layers.7.ffn_norm.weight                         -> blk.7.ffn_norm.weight                    | BF16   | [4096]
layers.8.attention.wq.weight                     -> blk.8.attn_q.weight                      | BF16   | [4096, 4096]
layers.8.attention.wk.weight                     -> blk.8.attn_k.weight                      | BF16   | [4096, 4096]
layers.8.attention.wv.weight                     -> blk.8.attn_v.weight                      | BF16   | [4096, 4096]
layers.8.attention.wo.weight                     -> blk.8.attn_output.weight                 | BF16   | [4096, 4096]
layers.8.feed_forward.w1.weight                  -> blk.8.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.8.feed_forward.w2.weight                  -> blk.8.ffn_down.weight                    | BF16   | [4096, 11008]
layers.8.feed_forward.w3.weight                  -> blk.8.ffn_up.weight                      | BF16   | [11008, 4096]
layers.8.attention_norm.weight                   -> blk.8.attn_norm.weight                   | BF16   | [4096]
layers.8.ffn_norm.weight                         -> blk.8.ffn_norm.weight                    | BF16   | [4096]
layers.9.attention.wq.weight                     -> blk.9.attn_q.weight                      | BF16   | [4096, 4096]
layers.9.attention.wk.weight                     -> blk.9.attn_k.weight                      | BF16   | [4096, 4096]
layers.9.attention.wv.weight                     -> blk.9.attn_v.weight                      | BF16   | [4096, 4096]
layers.9.attention.wo.weight                     -> blk.9.attn_output.weight                 | BF16   | [4096, 4096]
layers.9.feed_forward.w1.weight                  -> blk.9.ffn_gate.weight                    | BF16   | [11008, 4096]
layers.9.feed_forward.w2.weight                  -> blk.9.ffn_down.weight                    | BF16   | [4096, 11008]
layers.9.feed_forward.w3.weight                  -> blk.9.ffn_up.weight                      | BF16   | [11008, 4096]
layers.9.attention_norm.weight                   -> blk.9.attn_norm.weight                   | BF16   | [4096]
layers.9.ffn_norm.weight                         -> blk.9.ffn_norm.weight                    | BF16   | [4096]
layers.10.attention.wq.weight                    -> blk.10.attn_q.weight                     | BF16   | [4096, 4096]
layers.10.attention.wk.weight                    -> blk.10.attn_k.weight                     | BF16   | [4096, 4096]
layers.10.attention.wv.weight                    -> blk.10.attn_v.weight                     | BF16   | [4096, 4096]
layers.10.attention.wo.weight                    -> blk.10.attn_output.weight                | BF16   | [4096, 4096]
layers.10.feed_forward.w1.weight                 -> blk.10.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.10.feed_forward.w2.weight                 -> blk.10.ffn_down.weight                   | BF16   | [4096, 11008]
layers.10.feed_forward.w3.weight                 -> blk.10.ffn_up.weight                     | BF16   | [11008, 4096]
layers.10.attention_norm.weight                  -> blk.10.attn_norm.weight                  | BF16   | [4096]
layers.10.ffn_norm.weight                        -> blk.10.ffn_norm.weight                   | BF16   | [4096]
layers.11.attention.wq.weight                    -> blk.11.attn_q.weight                     | BF16   | [4096, 4096]
layers.11.attention.wk.weight                    -> blk.11.attn_k.weight                     | BF16   | [4096, 4096]
layers.11.attention.wv.weight                    -> blk.11.attn_v.weight                     | BF16   | [4096, 4096]
layers.11.attention.wo.weight                    -> blk.11.attn_output.weight                | BF16   | [4096, 4096]
layers.11.feed_forward.w1.weight                 -> blk.11.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.11.feed_forward.w2.weight                 -> blk.11.ffn_down.weight                   | BF16   | [4096, 11008]
layers.11.feed_forward.w3.weight                 -> blk.11.ffn_up.weight                     | BF16   | [11008, 4096]
layers.11.attention_norm.weight                  -> blk.11.attn_norm.weight                  | BF16   | [4096]
layers.11.ffn_norm.weight                        -> blk.11.ffn_norm.weight                   | BF16   | [4096]
layers.12.attention.wq.weight                    -> blk.12.attn_q.weight                     | BF16   | [4096, 4096]
layers.12.attention.wk.weight                    -> blk.12.attn_k.weight                     | BF16   | [4096, 4096]
layers.12.attention.wv.weight                    -> blk.12.attn_v.weight                     | BF16   | [4096, 4096]
layers.12.attention.wo.weight                    -> blk.12.attn_output.weight                | BF16   | [4096, 4096]
layers.12.feed_forward.w1.weight                 -> blk.12.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.12.feed_forward.w2.weight                 -> blk.12.ffn_down.weight                   | BF16   | [4096, 11008]
layers.12.feed_forward.w3.weight                 -> blk.12.ffn_up.weight                     | BF16   | [11008, 4096]
layers.12.attention_norm.weight                  -> blk.12.attn_norm.weight                  | BF16   | [4096]
layers.12.ffn_norm.weight                        -> blk.12.ffn_norm.weight                   | BF16   | [4096]
layers.13.attention.wq.weight                    -> blk.13.attn_q.weight                     | BF16   | [4096, 4096]
layers.13.attention.wk.weight                    -> blk.13.attn_k.weight                     | BF16   | [4096, 4096]
layers.13.attention.wv.weight                    -> blk.13.attn_v.weight                     | BF16   | [4096, 4096]
layers.13.attention.wo.weight                    -> blk.13.attn_output.weight                | BF16   | [4096, 4096]
layers.13.feed_forward.w1.weight                 -> blk.13.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.13.feed_forward.w2.weight                 -> blk.13.ffn_down.weight                   | BF16   | [4096, 11008]
layers.13.feed_forward.w3.weight                 -> blk.13.ffn_up.weight                     | BF16   | [11008, 4096]
layers.13.attention_norm.weight                  -> blk.13.attn_norm.weight                  | BF16   | [4096]
layers.13.ffn_norm.weight                        -> blk.13.ffn_norm.weight                   | BF16   | [4096]
layers.14.attention.wq.weight                    -> blk.14.attn_q.weight                     | BF16   | [4096, 4096]
layers.14.attention.wk.weight                    -> blk.14.attn_k.weight                     | BF16   | [4096, 4096]
layers.14.attention.wv.weight                    -> blk.14.attn_v.weight                     | BF16   | [4096, 4096]
layers.14.attention.wo.weight                    -> blk.14.attn_output.weight                | BF16   | [4096, 4096]
layers.14.feed_forward.w1.weight                 -> blk.14.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.14.feed_forward.w2.weight                 -> blk.14.ffn_down.weight                   | BF16   | [4096, 11008]
layers.14.feed_forward.w3.weight                 -> blk.14.ffn_up.weight                     | BF16   | [11008, 4096]
layers.14.attention_norm.weight                  -> blk.14.attn_norm.weight                  | BF16   | [4096]
layers.14.ffn_norm.weight                        -> blk.14.ffn_norm.weight                   | BF16   | [4096]
layers.15.attention.wq.weight                    -> blk.15.attn_q.weight                     | BF16   | [4096, 4096]
layers.15.attention.wk.weight                    -> blk.15.attn_k.weight                     | BF16   | [4096, 4096]
layers.15.attention.wv.weight                    -> blk.15.attn_v.weight                     | BF16   | [4096, 4096]
layers.15.attention.wo.weight                    -> blk.15.attn_output.weight                | BF16   | [4096, 4096]
layers.15.feed_forward.w1.weight                 -> blk.15.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.15.feed_forward.w2.weight                 -> blk.15.ffn_down.weight                   | BF16   | [4096, 11008]
layers.15.feed_forward.w3.weight                 -> blk.15.ffn_up.weight                     | BF16   | [11008, 4096]
layers.15.attention_norm.weight                  -> blk.15.attn_norm.weight                  | BF16   | [4096]
layers.15.ffn_norm.weight                        -> blk.15.ffn_norm.weight                   | BF16   | [4096]
layers.16.attention.wq.weight                    -> blk.16.attn_q.weight                     | BF16   | [4096, 4096]
layers.16.attention.wk.weight                    -> blk.16.attn_k.weight                     | BF16   | [4096, 4096]
layers.16.attention.wv.weight                    -> blk.16.attn_v.weight                     | BF16   | [4096, 4096]
layers.16.attention.wo.weight                    -> blk.16.attn_output.weight                | BF16   | [4096, 4096]
layers.16.feed_forward.w1.weight                 -> blk.16.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.16.feed_forward.w2.weight                 -> blk.16.ffn_down.weight                   | BF16   | [4096, 11008]
layers.16.feed_forward.w3.weight                 -> blk.16.ffn_up.weight                     | BF16   | [11008, 4096]
layers.16.attention_norm.weight                  -> blk.16.attn_norm.weight                  | BF16   | [4096]
layers.16.ffn_norm.weight                        -> blk.16.ffn_norm.weight                   | BF16   | [4096]
layers.17.attention.wq.weight                    -> blk.17.attn_q.weight                     | BF16   | [4096, 4096]
layers.17.attention.wk.weight                    -> blk.17.attn_k.weight                     | BF16   | [4096, 4096]
layers.17.attention.wv.weight                    -> blk.17.attn_v.weight                     | BF16   | [4096, 4096]
layers.17.attention.wo.weight                    -> blk.17.attn_output.weight                | BF16   | [4096, 4096]
layers.17.feed_forward.w1.weight                 -> blk.17.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.17.feed_forward.w2.weight                 -> blk.17.ffn_down.weight                   | BF16   | [4096, 11008]
layers.17.feed_forward.w3.weight                 -> blk.17.ffn_up.weight                     | BF16   | [11008, 4096]
layers.17.attention_norm.weight                  -> blk.17.attn_norm.weight                  | BF16   | [4096]
layers.17.ffn_norm.weight                        -> blk.17.ffn_norm.weight                   | BF16   | [4096]
layers.18.attention.wq.weight                    -> blk.18.attn_q.weight                     | BF16   | [4096, 4096]
layers.18.attention.wk.weight                    -> blk.18.attn_k.weight                     | BF16   | [4096, 4096]
layers.18.attention.wv.weight                    -> blk.18.attn_v.weight                     | BF16   | [4096, 4096]
layers.18.attention.wo.weight                    -> blk.18.attn_output.weight                | BF16   | [4096, 4096]
layers.18.feed_forward.w1.weight                 -> blk.18.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.18.feed_forward.w2.weight                 -> blk.18.ffn_down.weight                   | BF16   | [4096, 11008]
layers.18.feed_forward.w3.weight                 -> blk.18.ffn_up.weight                     | BF16   | [11008, 4096]
layers.18.attention_norm.weight                  -> blk.18.attn_norm.weight                  | BF16   | [4096]
layers.18.ffn_norm.weight                        -> blk.18.ffn_norm.weight                   | BF16   | [4096]
layers.19.attention.wq.weight                    -> blk.19.attn_q.weight                     | BF16   | [4096, 4096]
layers.19.attention.wk.weight                    -> blk.19.attn_k.weight                     | BF16   | [4096, 4096]
layers.19.attention.wv.weight                    -> blk.19.attn_v.weight                     | BF16   | [4096, 4096]
layers.19.attention.wo.weight                    -> blk.19.attn_output.weight                | BF16   | [4096, 4096]
layers.19.feed_forward.w1.weight                 -> blk.19.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.19.feed_forward.w2.weight                 -> blk.19.ffn_down.weight                   | BF16   | [4096, 11008]
layers.19.feed_forward.w3.weight                 -> blk.19.ffn_up.weight                     | BF16   | [11008, 4096]
layers.19.attention_norm.weight                  -> blk.19.attn_norm.weight                  | BF16   | [4096]
layers.19.ffn_norm.weight                        -> blk.19.ffn_norm.weight                   | BF16   | [4096]
layers.20.attention.wq.weight                    -> blk.20.attn_q.weight                     | BF16   | [4096, 4096]
layers.20.attention.wk.weight                    -> blk.20.attn_k.weight                     | BF16   | [4096, 4096]
layers.20.attention.wv.weight                    -> blk.20.attn_v.weight                     | BF16   | [4096, 4096]
layers.20.attention.wo.weight                    -> blk.20.attn_output.weight                | BF16   | [4096, 4096]
layers.20.feed_forward.w1.weight                 -> blk.20.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.20.feed_forward.w2.weight                 -> blk.20.ffn_down.weight                   | BF16   | [4096, 11008]
layers.20.feed_forward.w3.weight                 -> blk.20.ffn_up.weight                     | BF16   | [11008, 4096]
layers.20.attention_norm.weight                  -> blk.20.attn_norm.weight                  | BF16   | [4096]
layers.20.ffn_norm.weight                        -> blk.20.ffn_norm.weight                   | BF16   | [4096]
layers.21.attention.wq.weight                    -> blk.21.attn_q.weight                     | BF16   | [4096, 4096]
layers.21.attention.wk.weight                    -> blk.21.attn_k.weight                     | BF16   | [4096, 4096]
layers.21.attention.wv.weight                    -> blk.21.attn_v.weight                     | BF16   | [4096, 4096]
layers.21.attention.wo.weight                    -> blk.21.attn_output.weight                | BF16   | [4096, 4096]
layers.21.feed_forward.w1.weight                 -> blk.21.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.21.feed_forward.w2.weight                 -> blk.21.ffn_down.weight                   | BF16   | [4096, 11008]
layers.21.feed_forward.w3.weight                 -> blk.21.ffn_up.weight                     | BF16   | [11008, 4096]
layers.21.attention_norm.weight                  -> blk.21.attn_norm.weight                  | BF16   | [4096]
layers.21.ffn_norm.weight                        -> blk.21.ffn_norm.weight                   | BF16   | [4096]
layers.22.attention.wq.weight                    -> blk.22.attn_q.weight                     | BF16   | [4096, 4096]
layers.22.attention.wk.weight                    -> blk.22.attn_k.weight                     | BF16   | [4096, 4096]
layers.22.attention.wv.weight                    -> blk.22.attn_v.weight                     | BF16   | [4096, 4096]
layers.22.attention.wo.weight                    -> blk.22.attn_output.weight                | BF16   | [4096, 4096]
layers.22.feed_forward.w1.weight                 -> blk.22.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.22.feed_forward.w2.weight                 -> blk.22.ffn_down.weight                   | BF16   | [4096, 11008]
layers.22.feed_forward.w3.weight                 -> blk.22.ffn_up.weight                     | BF16   | [11008, 4096]
layers.22.attention_norm.weight                  -> blk.22.attn_norm.weight                  | BF16   | [4096]
layers.22.ffn_norm.weight                        -> blk.22.ffn_norm.weight                   | BF16   | [4096]
layers.23.attention.wq.weight                    -> blk.23.attn_q.weight                     | BF16   | [4096, 4096]
layers.23.attention.wk.weight                    -> blk.23.attn_k.weight                     | BF16   | [4096, 4096]
layers.23.attention.wv.weight                    -> blk.23.attn_v.weight                     | BF16   | [4096, 4096]
layers.23.attention.wo.weight                    -> blk.23.attn_output.weight                | BF16   | [4096, 4096]
layers.23.feed_forward.w1.weight                 -> blk.23.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.23.feed_forward.w2.weight                 -> blk.23.ffn_down.weight                   | BF16   | [4096, 11008]
layers.23.feed_forward.w3.weight                 -> blk.23.ffn_up.weight                     | BF16   | [11008, 4096]
layers.23.attention_norm.weight                  -> blk.23.attn_norm.weight                  | BF16   | [4096]
layers.23.ffn_norm.weight                        -> blk.23.ffn_norm.weight                   | BF16   | [4096]
layers.24.attention.wq.weight                    -> blk.24.attn_q.weight                     | BF16   | [4096, 4096]
layers.24.attention.wk.weight                    -> blk.24.attn_k.weight                     | BF16   | [4096, 4096]
layers.24.attention.wv.weight                    -> blk.24.attn_v.weight                     | BF16   | [4096, 4096]
layers.24.attention.wo.weight                    -> blk.24.attn_output.weight                | BF16   | [4096, 4096]
layers.24.feed_forward.w1.weight                 -> blk.24.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.24.feed_forward.w2.weight                 -> blk.24.ffn_down.weight                   | BF16   | [4096, 11008]
layers.24.feed_forward.w3.weight                 -> blk.24.ffn_up.weight                     | BF16   | [11008, 4096]
layers.24.attention_norm.weight                  -> blk.24.attn_norm.weight                  | BF16   | [4096]
layers.24.ffn_norm.weight                        -> blk.24.ffn_norm.weight                   | BF16   | [4096]
layers.25.attention.wq.weight                    -> blk.25.attn_q.weight                     | BF16   | [4096, 4096]
layers.25.attention.wk.weight                    -> blk.25.attn_k.weight                     | BF16   | [4096, 4096]
layers.25.attention.wv.weight                    -> blk.25.attn_v.weight                     | BF16   | [4096, 4096]
layers.25.attention.wo.weight                    -> blk.25.attn_output.weight                | BF16   | [4096, 4096]
layers.25.feed_forward.w1.weight                 -> blk.25.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.25.feed_forward.w2.weight                 -> blk.25.ffn_down.weight                   | BF16   | [4096, 11008]
layers.25.feed_forward.w3.weight                 -> blk.25.ffn_up.weight                     | BF16   | [11008, 4096]
layers.25.attention_norm.weight                  -> blk.25.attn_norm.weight                  | BF16   | [4096]
layers.25.ffn_norm.weight                        -> blk.25.ffn_norm.weight                   | BF16   | [4096]
layers.26.attention.wq.weight                    -> blk.26.attn_q.weight                     | BF16   | [4096, 4096]
layers.26.attention.wk.weight                    -> blk.26.attn_k.weight                     | BF16   | [4096, 4096]
layers.26.attention.wv.weight                    -> blk.26.attn_v.weight                     | BF16   | [4096, 4096]
layers.26.attention.wo.weight                    -> blk.26.attn_output.weight                | BF16   | [4096, 4096]
layers.26.feed_forward.w1.weight                 -> blk.26.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.26.feed_forward.w2.weight                 -> blk.26.ffn_down.weight                   | BF16   | [4096, 11008]
layers.26.feed_forward.w3.weight                 -> blk.26.ffn_up.weight                     | BF16   | [11008, 4096]
layers.26.attention_norm.weight                  -> blk.26.attn_norm.weight                  | BF16   | [4096]
layers.26.ffn_norm.weight                        -> blk.26.ffn_norm.weight                   | BF16   | [4096]
layers.27.attention.wq.weight                    -> blk.27.attn_q.weight                     | BF16   | [4096, 4096]
layers.27.attention.wk.weight                    -> blk.27.attn_k.weight                     | BF16   | [4096, 4096]
layers.27.attention.wv.weight                    -> blk.27.attn_v.weight                     | BF16   | [4096, 4096]
layers.27.attention.wo.weight                    -> blk.27.attn_output.weight                | BF16   | [4096, 4096]
layers.27.feed_forward.w1.weight                 -> blk.27.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.27.feed_forward.w2.weight                 -> blk.27.ffn_down.weight                   | BF16   | [4096, 11008]
layers.27.feed_forward.w3.weight                 -> blk.27.ffn_up.weight                     | BF16   | [11008, 4096]
layers.27.attention_norm.weight                  -> blk.27.attn_norm.weight                  | BF16   | [4096]
layers.27.ffn_norm.weight                        -> blk.27.ffn_norm.weight                   | BF16   | [4096]
layers.28.attention.wq.weight                    -> blk.28.attn_q.weight                     | BF16   | [4096, 4096]
layers.28.attention.wk.weight                    -> blk.28.attn_k.weight                     | BF16   | [4096, 4096]
layers.28.attention.wv.weight                    -> blk.28.attn_v.weight                     | BF16   | [4096, 4096]
layers.28.attention.wo.weight                    -> blk.28.attn_output.weight                | BF16   | [4096, 4096]
layers.28.feed_forward.w1.weight                 -> blk.28.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.28.feed_forward.w2.weight                 -> blk.28.ffn_down.weight                   | BF16   | [4096, 11008]
layers.28.feed_forward.w3.weight                 -> blk.28.ffn_up.weight                     | BF16   | [11008, 4096]
layers.28.attention_norm.weight                  -> blk.28.attn_norm.weight                  | BF16   | [4096]
layers.28.ffn_norm.weight                        -> blk.28.ffn_norm.weight                   | BF16   | [4096]
layers.29.attention.wq.weight                    -> blk.29.attn_q.weight                     | BF16   | [4096, 4096]
layers.29.attention.wk.weight                    -> blk.29.attn_k.weight                     | BF16   | [4096, 4096]
layers.29.attention.wv.weight                    -> blk.29.attn_v.weight                     | BF16   | [4096, 4096]
layers.29.attention.wo.weight                    -> blk.29.attn_output.weight                | BF16   | [4096, 4096]
layers.29.feed_forward.w1.weight                 -> blk.29.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.29.feed_forward.w2.weight                 -> blk.29.ffn_down.weight                   | BF16   | [4096, 11008]
layers.29.feed_forward.w3.weight                 -> blk.29.ffn_up.weight                     | BF16   | [11008, 4096]
layers.29.attention_norm.weight                  -> blk.29.attn_norm.weight                  | BF16   | [4096]
layers.29.ffn_norm.weight                        -> blk.29.ffn_norm.weight                   | BF16   | [4096]
layers.30.attention.wq.weight                    -> blk.30.attn_q.weight                     | BF16   | [4096, 4096]
layers.30.attention.wk.weight                    -> blk.30.attn_k.weight                     | BF16   | [4096, 4096]
layers.30.attention.wv.weight                    -> blk.30.attn_v.weight                     | BF16   | [4096, 4096]
layers.30.attention.wo.weight                    -> blk.30.attn_output.weight                | BF16   | [4096, 4096]
layers.30.feed_forward.w1.weight                 -> blk.30.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.30.feed_forward.w2.weight                 -> blk.30.ffn_down.weight                   | BF16   | [4096, 11008]
layers.30.feed_forward.w3.weight                 -> blk.30.ffn_up.weight                     | BF16   | [11008, 4096]
layers.30.attention_norm.weight                  -> blk.30.attn_norm.weight                  | BF16   | [4096]
layers.30.ffn_norm.weight                        -> blk.30.ffn_norm.weight                   | BF16   | [4096]
layers.31.attention.wq.weight                    -> blk.31.attn_q.weight                     | BF16   | [4096, 4096]
layers.31.attention.wk.weight                    -> blk.31.attn_k.weight                     | BF16   | [4096, 4096]
layers.31.attention.wv.weight                    -> blk.31.attn_v.weight                     | BF16   | [4096, 4096]
layers.31.attention.wo.weight                    -> blk.31.attn_output.weight                | BF16   | [4096, 4096]
layers.31.feed_forward.w1.weight                 -> blk.31.ffn_gate.weight                   | BF16   | [11008, 4096]
layers.31.feed_forward.w2.weight                 -> blk.31.ffn_down.weight                   | BF16   | [4096, 11008]
layers.31.feed_forward.w3.weight                 -> blk.31.ffn_up.weight                     | BF16   | [11008, 4096]
layers.31.attention_norm.weight                  -> blk.31.attn_norm.weight                  | BF16   | [4096]
layers.31.ffn_norm.weight                        -> blk.31.ffn_norm.weight                   | BF16   | [4096]
skipping tensor rope_freqs
Writing ../llama/llama-2-7b-chat/ggml-model-f16.gguf, format 1
Traceback (most recent call last):
  File "/Users/apple/OSS/llama.cpp/convert.py", line 1206, in <module>
    main()
  File "/Users/apple/OSS/llama.cpp/convert.py", line 1201, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab, concurrency = args.concurrency, endianess=endianess)
  File "/Users/apple/OSS/llama.cpp/convert.py", line 907, in write_all
    check_vocab_size(params, vocab)
  File "/Users/apple/OSS/llama.cpp/convert.py", line 794, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has -1, but ../llama/tokenizer.model has 32000).
@glemiron

I have the exact same problem.
Apple M1 Pro
Ventura 13.6.2

@glemiron

I checked out a month-old version (1e0e873) and everything works fine, so it's definitely a recent bug.

@TortoiseHam
Contributor

@PeterWrighten, @glemiron, if you go into your Llama 2 model directory and edit the "vocab_size" in params.json to be 32000 rather than -1, does it work for you?
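
For example, a minimal sketch of that edit in Python (untested; MODEL_DIR is a placeholder, point it at wherever your params.json lives):

import json
from pathlib import Path

# Placeholder path; point this at your local Llama 2 checkout.
MODEL_DIR = Path("../llama/llama-2-7b-chat")
params_path = MODEL_DIR / "params.json"

params = json.loads(params_path.read_text())
if params.get("vocab_size", -1) <= 0:
    # 32000 matches the size reported for ../llama/tokenizer.model in the log above.
    params["vocab_size"] = 32000
    params_path.write_text(json.dumps(params, indent=2))
    print(f"patched vocab_size in {params_path}")
else:
    print(f"vocab_size is already {params['vocab_size']}, nothing to do")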

@PeterWrighten
Author

@PeterWrighten, @glemiron, if you go into your Llama 2 model directory and edit the "vocab_size" in params.json to be 32000 rather than -1, does it work for you?

Thanks! That works well!

@jbohnslav

I get the same bug, but in Docker on WSL.

@nonoesp

nonoesp commented Nov 21, 2023

Changing vocab_size from -1 to 32000 in params.json fixed it for me.

Thanks @TortoiseHam! 👌🏻

@hyperbolic-c

@PeterWrighten, @glemiron, if you go into your Llama 2 model directory and edit the "vocab_size" in params.json to be 32000 rather than -1, does it work for you?

@TortoiseHam hello, I got a similar error when I tried to quantize the DeepSeek-Coder model with llama.cpp:
Exception: Vocab size mismatch (model has 32256, but ../DeepSeek-Coder/models/deepseek-coder-1.3b-instruct has 32022). Add the --pad-vocab option and try again.

Then I edited the size in the config.json file, which fixed the convert and quantize steps. But when loading the model I got another error: llama_model_load: error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 2048, 32022, got 2048, 32256, 1, 1
I'd appreciate it if you had any advice. Thanks a lot.
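
For reference, a quick way to see where the two numbers come from is to compare what config.json declares with how many tokens the tokenizer actually defines — a rough diagnostic sketch, assuming the usual Hugging Face directory layout and that transformers is installed (the path is just the one from the error above):

import json
from pathlib import Path

from transformers import AutoTokenizer

# Placeholder path; the directory from the error message above.
MODEL_DIR = Path("../DeepSeek-Coder/models/deepseek-coder-1.3b-instruct")

config = json.loads((MODEL_DIR / "config.json").read_text())
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

print("config.json vocab_size:", config.get("vocab_size"))  # e.g. 32256
print("tokenizer tokens      :", len(tokenizer))            # base vocab + added tokens, e.g. 32022

If config.json declares more tokens than the tokenizer defines (32256 vs 32022 here), the checkpoint's embedding matrix is padded past the tokenizer, which is the gap the --pad-vocab hint in the error is about; shrinking config.json instead leaves token_embd.weight at its original 32256 rows, so loading later fails with the wrong-shape error shown above.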

@wchli

wchli commented Apr 27, 2024

@hyperbolic-c I have also run into the same problem. Have you solved it yet?

@hyperbolic-c

@hyperbolic-c I have also run into the same problem. Have you solved it yet?

No, I am waiting for DeepSeek model support (#5981).
