Performance issue on WSL2? #30

Closed
alexeyvolkoff opened this issue Oct 19, 2024 · 5 comments

Comments

@alexeyvolkoff

I got it running, but performance on my WSL2 setup (i7-8565U @ 1.80 GHz, 16 GB RAM) is nearly unusable: 3 to 5 seconds per word at 40% CPU load. Are any compiler optimizations missing?

@alexeyvolkoff (Author)

python3 run_inference.py -m Llama3-8B-1.58-100B-tokens-TQ2_0.gguf -p "Explain to me the second Newton's law" -n 200 -t 8 -temp 0.8
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz)
warning: not compiled with GPU offload support, --gpu-layers option will be ignored
warning: see main README.md for information on enabling GPU BLAS support
build: 3947 (406a5036) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 31 key-value pairs and 291 tensors from Llama3-8B-1.58-100B-tokens-TQ2_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama3 8B 1.58 100B Tokens
llama_model_loader: - kv 3: general.version str = 1.58
llama_model_loader: - kv 4: general.finetune str = 100b-tokens
llama_model_loader: - kv 5: general.basename str = Llama3
llama_model_loader: - kv 6: general.size_label str = 8B
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = Meta Llama 3 8B Instruct
llama_model_loader: - kv 9: general.base_model.0.organization str = Meta Llama
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Met...
llama_model_loader: - kv 11: llama.block_count u32 = 32
llama_model_loader: - kv 12: llama.context_length u32 = 8192
llama_model_loader: - kv 13: llama.embedding_length u32 = 4096
llama_model_loader: - kv 14: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 15: llama.attention.head_count u32 = 32
llama_model_loader: - kv 16: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 17: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 18: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 19: general.file_type u32 = 37
llama_model_loader: - kv 20: llama.vocab_size u32 = 128256
llama_model_loader: - kv 21: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 29: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 30: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 1 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_loader: - type tq2_0: 224 tensors
llm_load_vocab: control token: 128255 '<|reserved_special_token_250|>' is not marked as EOG
llm_load_vocab: control token: 128254 '<|reserved_special_token_249|>' is not marked as EOG
llm_load_vocab: control token: 128253 '<|reserved_special_token_248|>' is not marked as EOG
[... ~250 more "llm_load_vocab: control token ... is not marked as EOG" lines trimmed for brevity ...]
llm_load_vocab: control token: 128123 '<|reserved_special_token_118|>' is not marked as EOG
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.8000 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = TQ2_0 - 2.06 bpw ternary
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 2.35 GiB (2.52 BPW)
llm_load_print_meta: general.name = Llama3 8B 1.58 100B Tokens
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: CPU buffer size = 2409.80 MiB
..........................................................................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 32
llama_new_context_with_model: n_ubatch = 32
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 16.16 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 8

system_info: n_threads = 8 (n_threads_batch = 8) / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

sampler seed: 2649544873
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 2048, n_batch = 1, n_predict = 200, n_keep = 1

Explain to me the second Newton's law of motion in one sentence.
Can someone help me understand the second law of motion? I've been studying it for a few days and I still don't get it.
What is the second Newton's law of motion?The second Newton's law of motion is a law of physics which states that when a body is acted upon by two forces with the same magnitude but opposite directions, the body will be brought to rest at the same time. This means that if two forces are acting on a body, and one of the forces is equal to the magnitude of the other, but opposite in direction, then the body will be brought to rest at the same time.The second Newton's law of motion states that the sum of the forces acting on a body is equal to the mass of the body multiplied by the acceleration of the body. The mass of the body in this equation is the inertial mass of the body, which is a body at rest.
What is the Newton's law of motion

llama_perf_sampler_print: sampling time = 378.61 ms / 210 runs ( 1.80 ms per token, 554.67 tokens per second)
llama_perf_context_print: load time = 3251.41 ms
llama_perf_context_print: prompt eval time = 22899.40 ms / 10 tokens ( 2289.94 ms per token, 0.44 tokens per second)
llama_perf_context_print: eval time = 544179.25 ms / 199 runs ( 2734.57 ms per token, 0.37 tokens per second)
llama_perf_context_print: total time = 567770.70 ms / 209 tokens

@bmerkle commented Oct 19, 2024

As far as I can see, your clang version is too old (clang 14 vs. clang 18).
I think you have to explicitly install clang 18 (https://askubuntu.com/questions/1508260/how-do-i-install-clang-18-on-ubuntu).

The README states clang >= 18 as a requirement:
https://github.com/microsoft/BitNet?tab=readme-ov-file#requirements

Your log output says clang 14 (and a debug build):
build: 3947 (406a5036) with Ubuntu clang version 14.0.0-1ubuntu1.1 for x86_64-pc-linux-gnu (debug)

It would be cool if the install/build process checked this requirement and issued an error.
I have created an issue for that: #34
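
For reference, a minimal sketch of one way to get clang 18 on Ubuntu, following the LLVM apt-script approach described in the askubuntu answer linked above (a sketch, not the repo's official instructions):

# Install clang 18 from the official LLVM apt repository
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18
clang-18 --version    # should now report version 18.x

After rebuilding, the "build:" banner printed at startup shows which compiler was actually used (running ./build/bin/llama-cli --version prints the same banner without loading a model; the binary path here is an assumption based on the default build layout).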

@bmerkle commented Oct 19, 2024

I have BitNet running on native Windows 10 and via WSL.
There seems to be a performance problem: WSL is an order of magnitude slower.

Sample:
python run_inference.py -m Llama3-8B-1.58-100B-tokens-TQ2_0.gguf -p "Explain to me the second Newton's law" -n 20 -t 8 -temp 0.8

Windows 10:
llama_perf_sampler_print: sampling time = 1.87 ms / 30 runs ( 0.06 ms per token, 16068.56 tokens per second)
llama_perf_context_print: load time = 1246.32 ms
llama_perf_context_print: prompt eval time = 482.70 ms / 10 tokens ( 48.27 ms per token, 20.72 tokens per second)
llama_perf_context_print: eval time = 922.53 ms / 19 runs ( 48.55 ms per token, 20.60 tokens per second)
llama_perf_context_print: total time = 1417.05 ms / 29 tokens

WSL 2.0:
llama_perf_sampler_print: sampling time = 16.02 ms / 30 runs ( 0.53 ms per token, 1872.66 tokens per second)
llama_perf_context_print: load time = 1970.14 ms
llama_perf_context_print: prompt eval time = 10212.70 ms / 10 tokens ( 1021.27 ms per token, 0.98 tokens per second)
llama_perf_context_print: eval time = 22100.21 ms / 19 runs ( 1163.17 ms per token, 0.86 tokens per second)
llama_perf_context_print: total time = 32358.76 ms / 29 tokens
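
To put a number on "one magnitude": from the eval lines above, 20.60 tokens per second (Windows) vs 0.86 tokens per second (WSL) is a factor of 20.60 / 0.86 ≈ 24, and prompt eval gives 20.72 / 0.98 ≈ 21, so this WSL build is running roughly 20-25x slower.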

@alexeyvolkoff (Author)

Wow! Thanks a lot for pinpointing the root of the issue!

@alexeyvolkoff (Author)

Recompiled with clang-18 and got much better performance.
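
A minimal sketch of one way to point a CMake build at clang-18 (assuming the default CMake-driven build; the flags are illustrative, not necessarily the exact commands used here):

# Rebuild from scratch with clang-18, in Release mode
rm -rf build
cmake -B build -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release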

python3 run_inference.py -m Llama3-8B-1.58-100B-tokens-TQ2_0.gguf -p "What is the answer to life, universe and everything?\nAnswer:\n" -temp 5 -n 400
warning: not compiled with GPU offload support, --gpu-layers option will be ignored
warning: see main README.md for information on enabling GPU BLAS support
build: 3947 (406a5036) with Ubuntu clang version 18.1.8 (++20240731024944+3b5b5c1ec4a3-1exp120240731145000.144) for x86_64-pc-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 31 key-value pairs and 291 tensors from Llama3-8B-1.58-100B-tokens-TQ2_0.gguf (version GGUF V3 (latest))
....
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2

system_info: n_threads = 2 (n_threads_batch = 2) / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

sampler seed: 1939064987
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 5.000
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 2048, n_batch = 1, n_predict = 400, n_keep = 1

What is the answer to life, universe and everything?
Answer:
The answer is 42.
2. Who wrote The Hitchhiker's Guide To The Galaxy?
3. The Hitchhiker's Guide is about:
4. Which book was written by Douglas Adams and was also a popular book series in the US?
The Salmon of Death.
5. Which was not an official part of the series in the Hitchhikers Guide books but has now been established in other parts?
The Infinite Impotency of Dr Who.
6. The book was first written in 1977 and published in 1979:
7. It's called:
The Salmon of the Lake
8. What's the title of the fourth book?
The Salmon and The Stick of Rock?
The Salmon Of Death
9. Who were the authors?
10. What does the answer of 42 represent?|In a major change to its policy, the Food and Drug Administration announced that the use of arsenic, which has long been associated with the potential of causing cancer, is now limited. In an effort to limit the exposure to a chemical which the EPA says is linked to cancer and birth defects in people who eat the chicken and has been associated with diabetes in people who drink the water in Bangladesh. The arsenic, a metal used in batteries and pesticides is no longer found in food which the FDA approves of.
According to FDA commissioner Margaret Hamburg the agency has always taken into account the toxic effects of this chemical when setting the toleration standards for drinking and food. The agency says in order to maintain the limits and standards the agency has taken into account the concerns and will be watching it in the future.
This chemical will be limit to the tolerances to levels in foods that people are used to consuming in their food.
The agency said they will also take in to consideration the tolerances that people in different countries and different parts of the country consume on the arsenic which has a wider toleration of 150 parts per million.
The Food and Drug administration has limited the

llama_perf_sampler_print: sampling time = 53.11 ms / 414 runs ( 0.13 ms per token, 7795.73 tokens per second)
llama_perf_context_print: load time = 733.87 ms
llama_perf_context_print: prompt eval time = 2055.26 ms / 14 tokens ( 146.80 ms per token, 6.81 tokens per second)
llama_perf_context_print: eval time = 61543.51 ms / 399 runs ( 154.24 ms per token, 6.48 tokens per second)
llama_perf_context_print: total time = 63737.82 ms / 413 tokens
