docker: a770: ollama crashes #12845

Open
tkatila opened this issue Feb 18, 2025 · 4 comments
tkatila commented Feb 18, 2025

After updating to the latest intelanalytics/ipex-llm-inference-cpp-xpu (intelanalytics/ipex-llm-inference-cpp-xpu@sha256:21f970942b9621790807a869e5661f3b0df50865d9c07db870acca7fb3cf7539), every model I try to run crashes with a similar error.

An image from before the oneAPI upgrade worked with llama3.2, llava, and qwen2.5-coder.

OS: Ubuntu 24.04.1
Kernel: 6.8.0-52-generic
HW: Arc A770 16GB

The log below is from qwen2.5-coder, but the same crash happens at least with llama3.2 and llava as well.
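
Until this is fixed, a possible workaround is to pin the container to the digest of the last known-good image instead of the moving tag. A sketch, assuming the usual /dev/dri GPU passthrough; the digest here is a placeholder that can be looked up locally with docker images --digests:

docker pull intelanalytics/ipex-llm-inference-cpp-xpu@sha256:<known-good-digest>
docker run -it --device=/dev/dri intelanalytics/ipex-llm-inference-cpp-xpu@sha256:<known-good-digest>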

Error:

root@dgpu-test:/llm/ollama# ./ollama run qwen2.5-coder
ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
pulling manifest
pulling 60e05f210007... 100% ▕███████████████████████████████████████▏ 4.7 GB
pulling 66b9ea09bd5b... 100% ▕███████████████████████████████████████▏   68 B
pulling e94a8ecb9327... 100% ▕███████████████████████████████████████▏ 1.6 KB
pulling 832dd9e00a68... 100% ▕███████████████████████████████████████▏  11 KB
pulling d9bb33f27869... 100% ▕███████████████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
success
time=2025-02-18T16:46:13.155+08:00 level=INFO source=server.go:104 msg="system memory" total="31.3 GiB" free="30.2 GiB" free_swap="4.0 GiB"
time=2025-02-18T16:46:13.155+08:00 level=INFO source=memory.go:356 msg="offload to device" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[30.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.0 GiB" memory.required.partial="0 B" memory.required.kv="896.0 MiB" memory.required.allocations="[6.0 GiB]" memory.weights.total="4.5 GiB" memory.weights.repeating="4.1 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="942.0 MiB" memory.graph.partial="1.1 GiB"
time=2025-02-18T16:46:13.155+08:00 level=INFO source=server.go:392 msg="starting llama server" cmd="/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama runner --model /root/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --ctx-size 16384 --batch-size 512 --n-gpu-layers 999 --threads 4 --no-mmap --parallel 1 --port 44557"
time=2025-02-18T16:46:13.156+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=2
time=2025-02-18T16:46:13.156+08:00 level=INFO source=server.go:571 msg="waiting for llama runner to start responding"
time=2025-02-18T16:46:13.156+08:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server error"
⠋ ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
time=2025-02-18T16:46:13.288+08:00 level=INFO source=runner.go:967 msg="starting go runner"
time=2025-02-18T16:46:13.288+08:00 level=INFO source=runner.go:968 msg=system info="CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=4
time=2025-02-18T16:46:13.289+08:00 level=INFO source=runner.go:1026 msg="Server listening on 127.0.0.1:44557"
llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) A770 Graphics) - 15473 MiB free
⠙ llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /root/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
time=2025-02-18T16:46:13.407+08:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server loading model"
⠼ llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_head           = 28
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 18944
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 7.62 B
llm_load_print_meta: model size       = 4.36 GiB (4.91 BPW)
llm_load_print_meta: general.name     = Qwen2.5 Coder 7B Instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
llm_load_print_meta: max token length = 256
⠼ llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors:        SYCL0 model buffer size =  4168.09 MiB
llm_load_tensors:          CPU model buffer size =   292.36 MiB
⠧ llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 16384
llama_new_context_with_model: n_ctx_per_seq = 16384
llama_new_context_with_model: n_batch       = 512
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 1000000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (16384) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|  12.55|    512|    1024|   32| 16225M|     1.6.32224.500000|
llama_kv_cache_init:      SYCL0 KV buffer size =   896.00 MiB
llama_new_context_with_model: KV self size  =  896.00 MiB, K (f16):  448.00 MiB, V (f16):  448.00 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =     0.59 MiB
llama_new_context_with_model:      SYCL0 compute buffer size =   304.00 MiB
llama_new_context_with_model:  SYCL_Host compute buffer size =    39.01 MiB
llama_new_context_with_model: graph nodes  = 874
llama_new_context_with_model: graph splits = 2
time=2025-02-18T16:46:32.867+08:00 level=WARN source=runner.go:892 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
⠇ time=2025-02-18T16:46:32.980+08:00 level=INFO source=server.go:610 msg="llama runner started in 19.82 seconds"
>>> Hello
llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) A770 Graphics) - 15473 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /root/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
⠹ llm_load_vocab: special tokens cache size = 22
⠸ llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 7.62 B
llm_load_print_meta: model size       = 4.36 GiB (4.91 BPW)
llm_load_print_meta: general.name     = Qwen2.5 Coder 7B Instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
llm_load_print_meta: max token length = 256
llama_model_load: vocab only - skipping tensors
SIGILL: illegal instruction
PC=0x70ef0960bc2f m=7 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xf3 0xf 0xc7 0xf8 0x25 0xff 0x3 0x0 0x0 0x48 0x8b 0xd 0xe1 0xc2 0x2a 0x0

goroutine 36 gp=0xc000504700 m=7 mp=0xc000520008 [syscall]:
runtime.cgocall(0x555b6f2584e0, 0xc000569b90)
	runtime/cgocall.go:167 +0x4b fp=0xc000569b68 sp=0xc000569b30 pc=0x555b6e6b754b
ollama/llama/llamafile._Cfunc_llama_decode(0x70ee8686ef00, {0x1e, 0x70ee84029af0, 0x0, 0x0, 0x70ee8402a300, 0x70ee868797d0, 0x70ee86879fe0, 0x70ee840237c0})
	_cgo_gotypes.go:558 +0x4f fp=0xc000569b90 sp=0xc000569b68 pc=0x555b6ea7996f
ollama/llama/llamafile.(*Context).Decode.func1(0x555b6ea886eb?, 0x70ee8686ef00?)
	ollama/llama/llamafile/llama.go:143 +0xf5 fp=0xc000569c80 sp=0xc000569b90 pc=0x555b6ea7c595
ollama/llama/llamafile.(*Context).Decode(0xc000075570?, 0x0?)
	ollama/llama/llamafile/llama.go:143 +0x13 fp=0xc000569cc8 sp=0xc000569c80 pc=0x555b6ea7c413
ollama/llama/runner.(*Server).processBatch(0xc000532120, 0xc000112600, 0xc000075720)
	ollama/llama/runner/runner.go:434 +0x23f fp=0xc000569ee0 sp=0xc000569cc8 pc=0x555b6ea873bf
ollama/llama/runner.(*Server).run(0xc000532120, {0x555b6f8089c0, 0xc00059b1d0})
	ollama/llama/runner/runner.go:342 +0x1d5 fp=0xc000569fb8 sp=0xc000569ee0 pc=0x555b6ea86df5
ollama/llama/runner.Execute.gowrap2()
	ollama/llama/runner/runner.go:1006 +0x28 fp=0xc000569fe0 sp=0xc000569fb8 pc=0x555b6ea8c068
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000569fe8 sp=0xc000569fe0 pc=0x555b6e6c6021
created by ollama/llama/runner.Execute in goroutine 1
	ollama/llama/runner/runner.go:1006 +0xde5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000121560 sp=0xc000121540 pc=0x555b6e6bdc4e
runtime.netpollblock(0xc0001215b0?, 0x6e654a66?, 0x5b?)
	runtime/netpoll.go:575 +0xf7 fp=0xc000121598 sp=0xc000121560 pc=0x555b6e6818b7
internal/poll.runtime_pollWait(0x70ef0a435680, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0001215b8 sp=0xc000121598 pc=0x555b6e6bcf45
internal/poll.(*pollDesc).wait(0xc000490280?, 0x900000036?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001215e0 sp=0xc0001215b8 pc=0x555b6e744567
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000490280)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc000121688 sp=0xc0001215e0 pc=0x555b6e749935
net.(*netFD).accept(0xc000490280)
	net/fd_unix.go:172 +0x29 fp=0xc000121740 sp=0xc000121688 pc=0x555b6e7b2009
net.(*TCPListener).accept(0xc0005a5300)
	net/tcpsock_posix.go:159 +0x1e fp=0xc000121790 sp=0xc000121740 pc=0x555b6e7c7c7e
net.(*TCPListener).Accept(0xc0005a5300)
	net/tcpsock.go:372 +0x30 fp=0xc0001217c0 sp=0xc000121790 pc=0x555b6e7c6b30
net/http.(*onceCloseListener).Accept(0xc000016cf0?)
	<autogenerated>:1 +0x24 fp=0xc0001217d8 sp=0xc0001217c0 pc=0x555b6ea40284
net/http.(*Server).Serve(0xc0005b2d20, {0x555b6f806700, 0xc0005a5300})
	net/http/server.go:3330 +0x30c fp=0xc000121908 sp=0xc0001217d8 pc=0x555b6ea1820c
ollama/llama/runner.Execute({0xc000036130?, 0x0?, 0x0?})
	ollama/llama/runner/runner.go:1027 +0x11a9 fp=0xc000121ca8 sp=0xc000121908 pc=0x555b6ea8bd49
ollama/cmd.NewCLI.func2(0xc000534400?, {0x555b6f25cf9d?, 0x4?, 0x555b6f25cfa1?})
	ollama/cmd/cmd.go:1430 +0x45 fp=0xc000121cd0 sp=0xc000121ca8 pc=0x555b6f257765
github.com/spf13/cobra.(*Command).execute(0xc000530008, {0xc0005b2690, 0xf, 0xf})
	github.com/spf13/[email protected]/command.go:985 +0xaaa fp=0xc000121e58 sp=0xc000121cd0 pc=0x555b6e84b3ea
github.com/spf13/cobra.(*Command).ExecuteC(0xc0004aa308)
	github.com/spf13/[email protected]/command.go:1117 +0x3ff fp=0xc000121f30 sp=0xc000121e58 pc=0x555b6e84bcbf
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/[email protected]/command.go:1041
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/[email protected]/command.go:1034
main.main()
	ollama/main.go:12 +0x4d fp=0xc000121f50 sp=0xc000121f30 pc=0x555b6f257dcd
runtime.main()
	runtime/proc.go:272 +0x29d fp=0xc000121fe0 sp=0xc000121f50 pc=0x555b6e688f5d
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000121fe8 sp=0xc000121fe0 pc=0x555b6e6c6021

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000078fa8 sp=0xc000078f88 pc=0x555b6e6bdc4e
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.forcegchelper()
	runtime/proc.go:337 +0xb8 fp=0xc000078fe0 sp=0xc000078fa8 pc=0x555b6e689298
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000078fe8 sp=0xc000078fe0 pc=0x555b6e6c6021
created by runtime.init.7 in goroutine 1
	runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000079780 sp=0xc000079760 pc=0x555b6e6bdc4e
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.bgsweep(0xc0000a4000)
	runtime/mgcsweep.go:317 +0xdf fp=0xc0000797c8 sp=0xc000079780 pc=0x555b6e67393f
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000797e0 sp=0xc0000797c8 pc=0x555b6e667f85
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000797e8 sp=0xc0000797e0 pc=0x555b6e6c6021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x555b6f403070?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000079f78 sp=0xc000079f58 pc=0x555b6e6bdc4e
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.(*scavengerState).park(0x555b6ff9ed80)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000079fa8 sp=0xc000079f78 pc=0x555b6e671309
runtime.bgscavenge(0xc0000a4000)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000079fc8 sp=0xc000079fa8 pc=0x555b6e671899
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000079fe0 sp=0xc000079fc8 pc=0x555b6e667f25
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000079fe8 sp=0xc000079fe0 pc=0x555b6e6c6021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000078648?, 0x555b6e65e485?, 0xb0?, 0x1?, 0xc0000061c0?)
	runtime/proc.go:424 +0xce fp=0xc000078620 sp=0xc000078600 pc=0x555b6e6bdc4e
runtime.runfinq()
	runtime/mfinal.go:193 +0x107 fp=0xc0000787e0 sp=0xc000078620 pc=0x555b6e667007
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000787e8 sp=0xc0000787e0 pc=0x555b6e6c6021
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001ece00 m=nil [chan receive]:
runtime.gopark(0xc00007a760?, 0x555b6e799685?, 0x40?, 0x28?, 0x555b6f819c00?)
	runtime/proc.go:424 +0xce fp=0xc00007a718 sp=0xc00007a6f8 pc=0x555b6e6bdc4e
runtime.chanrecv(0xc000044310, 0x0, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc00007a790 sp=0xc00007a718 pc=0x555b6e65767c
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:489 +0x12 fp=0xc00007a7b8 sp=0xc00007a790 pc=0x555b6e657232
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
	runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1784 +0x2f fp=0xc00007a7e0 sp=0xc00007a7b8 pc=0x555b6e66afef
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00007a7e8 sp=0xc00007a7e0 pc=0x555b6e6c6021
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001eda40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc00007af38 sp=0xc00007af18 pc=0x555b6e6bdc4e
runtime.gcBgMarkWorker(0xc0000458f0)
	runtime/mgc.go:1412 +0xe9 fp=0xc00007afc8 sp=0xc00007af38 pc=0x555b6e66a2e9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc00007afe0 sp=0xc00007afc8 pc=0x555b6e66a1c5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00007afe8 sp=0xc00007afe0 pc=0x555b6e6c6021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x12bb42237f9?, 0x3?, 0xcf?, 0x6d?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000074738 sp=0xc000074718 pc=0x555b6e6bdc4e
runtime.gcBgMarkWorker(0xc0000458f0)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000747c8 sp=0xc000074738 pc=0x555b6e66a2e9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000747e0 sp=0xc0000747c8 pc=0x555b6e66a1c5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x555b6e6c6021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001edc00 m=nil [GC worker (idle)]:
runtime.gopark(0x12bb421ee0e?, 0x3?, 0x26?, 0x3e?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc00007b738 sp=0xc00007b718 pc=0x555b6e6bdc4e
runtime.gcBgMarkWorker(0xc0000458f0)
	runtime/mgc.go:1412 +0xe9 fp=0xc00007b7c8 sp=0xc00007b738 pc=0x555b6e66a2e9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc00007b7e0 sp=0xc00007b7c8 pc=0x555b6e66a1c5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00007b7e8 sp=0xc00007b7e0 pc=0x555b6e6c6021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 34 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x12bb421f752?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x555b6e6bdc4e
runtime.gcBgMarkWorker(0xc0000458f0)
	runtime/mgc.go:1412 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x555b6e66a2e9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x555b6e66a1c5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x555b6e6c6021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 107 gp=0xc000504540 m=nil [IO wait]:
runtime.gopark(0x555b6e662965?, 0x0?, 0x0?, 0x0?, 0xb?)
	runtime/proc.go:424 +0xce fp=0xc0004c9da8 sp=0xc0004c9d88 pc=0x555b6e6bdc4e
runtime.netpollblock(0x555b6e6e0e78?, 0x6e654a66?, 0x5b?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0004c9de0 sp=0xc0004c9da8 pc=0x555b6e6818b7
internal/poll.runtime_pollWait(0x70ef0a435568, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0004c9e00 sp=0xc0004c9de0 pc=0x555b6e6bcf45
internal/poll.(*pollDesc).wait(0xc000591d00?, 0xc0001c3f61?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0004c9e28 sp=0xc0004c9e00 pc=0x555b6e744567
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000591d00, {0xc0001c3f61, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc0004c9ec0 sp=0xc0004c9e28 pc=0x555b6e74585a
net.(*netFD).Read(0xc000591d00, {0xc0001c3f61?, 0xc0004c9f48?, 0x555b6e6bf8d0?})
	net/fd_posix.go:55 +0x25 fp=0xc0004c9f08 sp=0xc0004c9ec0 pc=0x555b6e7b0045
net.(*conn).Read(0xc0000e4000, {0xc0001c3f61?, 0x0?, 0x555b6ffc6680?})
	net/net.go:189 +0x45 fp=0xc0004c9f50 sp=0xc0004c9f08 pc=0x555b6e7be645
net.(*TCPConn).Read(0x555b6ff030a0?, {0xc0001c3f61?, 0x0?, 0x0?})
	<autogenerated>:1 +0x25 fp=0xc0004c9f80 sp=0xc0004c9f50 pc=0x555b6e7d1845
net/http.(*connReader).backgroundRead(0xc0001c3f50)
	net/http/server.go:690 +0x37 fp=0xc0004c9fc8 sp=0xc0004c9f80 pc=0x555b6ea0db37
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc0004c9fe0 sp=0xc0004c9fc8 pc=0x555b6ea0da65
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c9fe8 sp=0xc0004c9fe0 pc=0x555b6e6c6021
created by net/http.(*connReader).startBackgroundRead in goroutine 121
	net/http/server.go:686 +0xb6

goroutine 121 gp=0xc000504a80 m=nil [select]:
runtime.gopark(0xc00005da68?, 0x2?, 0xce?, 0x36?, 0xc00005d834?)
	runtime/proc.go:424 +0xce fp=0xc00005d650 sp=0xc00005d630 pc=0x555b6e6bdc4e
runtime.selectgo(0xc00005da68, 0xc00005d830, 0x1e?, 0x0, 0x1?, 0x1)
	runtime/select.go:335 +0x7a5 fp=0xc00005d778 sp=0xc00005d650 pc=0x555b6e69af45
ollama/llama/runner.(*Server).completion(0xc000532120, {0x555b6f806910, 0xc0004b7500}, 0xc00041aa00)
	ollama/llama/runner/runner.go:696 +0xab6 fp=0xc00005dac0 sp=0xc00005d778 pc=0x555b6ea89236
ollama/llama/runner.(*Server).completion-fm({0x555b6f806910?, 0xc0004b7500?}, 0x555b6ea21fe7?)
	<autogenerated>:1 +0x36 fp=0xc00005daf0 sp=0xc00005dac0 pc=0x555b6ea8c916
net/http.HandlerFunc.ServeHTTP(0xc00019e1c0?, {0x555b6f806910?, 0xc0004b7500?}, 0x0?)
	net/http/server.go:2220 +0x29 fp=0xc00005db18 sp=0xc00005daf0 pc=0x555b6ea14809
net/http.(*ServeMux).ServeHTTP(0x555b6e65e485?, {0x555b6f806910, 0xc0004b7500}, 0xc00041aa00)
	net/http/server.go:2747 +0x1ca fp=0xc00005db68 sp=0xc00005db18 pc=0x555b6ea1670a
net/http.serverHandler.ServeHTTP({0x555b6f803510?}, {0x555b6f806910?, 0xc0004b7500?}, 0x6?)
	net/http/server.go:3210 +0x8e fp=0xc00005db98 sp=0xc00005db68 pc=0x555b6ea33c6e
net/http.(*conn).serve(0xc000016cf0, {0x555b6f808988, 0xc0005f0510})
	net/http/server.go:2092 +0x5d0 fp=0xc00005dfb8 sp=0xc00005db98 pc=0x555b6ea131b0
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3360 +0x28 fp=0xc00005dfe0 sp=0xc00005dfb8 pc=0x555b6ea18608
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00005dfe8 sp=0xc00005dfe0 pc=0x555b6e6c6021
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3360 +0x485

rax    0x0
rbx    0x0
rcx    0x70ee840257c0
rdx    0x70eea89febf0
rdi    0x6
rsi    0x555b8508ace0
rbp    0x70eea89fe960
rsp    0x70eea89fe3f0
r8     0x70ee84025940
r9     0x0
r10    0x70ef0b69bf28
r11    0x70ee84025940
r12    0x70ee84025940
r13    0x70ee84058de8
r14    0x70ee840257c0
r15    0x70ee84025940
rip    0x70ef0960bc2f
rflags 0x10202
cs     0x33
fs     0x0
gs     0x0
Error: POST predict: Post "http://127.0.0.1:44557/completion": EOF
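
For what it's worth, the faulting instruction bytes begin with f3 0f c7 f8, which decodes to rdpid rax. RDPID is not implemented on this i7-8700K (Coffee Lake), so the crash presumably comes from the updated oneAPI runtime executing an instruction the CPU doesn't support. Whether a given CPU has it can be checked from the kernel's CPU flags:

root@dgpu-test:/# grep -o -m1 rdpid /proc/cpuinfo

Empty output means the CPU lacks RDPID.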
sgwhat (Contributor) commented Feb 19, 2025

Could you show me the output of clinfo | grep "Device Name" and your oneAPI version? You may also follow https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md#install-gpu-driver to reinstall the Intel GPU driver and oneAPI.
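
For example, something like this should work inside the container (the package name may differ if oneAPI was installed from the offline installer rather than apt):

clinfo | grep "Device Name"
apt show intel-oneapi-base-toolkit 2>/dev/null | grep Version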

tkatila (Author) commented Feb 19, 2025

clinfo:

root@dgpu-test:/# clinfo | grep "Device Name"
  Device Name                                     Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  Device Name                                     Intel(R) Arc(TM) A770 Graphics
    Device Name                                   Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
    Device Name                                   Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
    Device Name                                   Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
root@dgpu-test:/# 

oneAPI is installed in the container (I haven't installed it myself):

root@dgpu-test:/# apt show intel-oneapi-base-toolkit
Package: intel-oneapi-base-toolkit
Version: 2025.0.1-45
Priority: optional
Section: libs
Maintainer: Intel Corporation <http://www.intel.com/software/products/support>
Installed-Size: unknown
Depends: intel-oneapi-base-toolkit-env-2025.0 (>= 2025.0.1-45), intel-oneapi-base-toolkit-getting-started-2025.0 (>= 2025.0.1-45), intel-oneapi-common-vars (>= 2025.0.0-0), intel-oneapi-common-licensing (>= 2025.0.0-0), intel-oneapi-common-oneapi-vars (>= 2025.0.0-0), intel-oneapi-tlt (>= 2025.0.0-0), intel-oneapi-dpcpp-ct (>= 2025.0.0-0), intel-oneapi-libdpstd-devel-2022.7, intel-oneapi-tbb-devel (>= 2022.0.0-0), intel-oneapi-ccl-devel (>= 2021.14.0-0), intel-oneapi-compiler-dpcpp-cpp (>= 2025.0.0-0), intel-oneapi-dal-devel (>= 2025.0.0-0), intel-oneapi-ipp-devel (>= 2022.0.0-0), intel-oneapi-ippcp-devel (>= 2025.0.0-0), intel-oneapi-mkl-devel (>= 2025.0.0-0), intel-oneapi-advisor (>= 2025.0.0-0), intel-oneapi-vtune (>= 2025.0.0-0), intel-oneapi-dnnl-devel (>= 2025.0.0-0), intel-oneapi-dev-utilities (>= 2025.0.0-0), intel-pti-dev (>= 0.10.0-0)
Download-Size: 2548 B
APT-Manual-Installed: no
APT-Sources: https://apt.repos.intel.com/oneapi all/main amd64 Packages
Description: Intel® oneAPI Base Toolkit

N: There is 1 additional record. Please use the '-a' switch to see it

The OS is Ubuntu 24.04, and the driver guide you linked doesn't have a section for 24.04. Should I downgrade to 22.04?

JKlesmith commented

I'm seeing the exact same issue with the ollama-0.5.4-ipex-llm-2.2.0b20250218-ubuntu.tgz release.

SIGILL: illegal instruction
PC=0x749a53e0bc2f m=10 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xf3 0xf 0xc7 0xf8 0x25 0xff 0x3 0x0 0x0 0x48 0x8b 0xd 0xe1 0xc2 0x2a 0x0

Device Name Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Device Name Intel(R) Arc(TM) A770 Graphics
Device Name Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Device Name Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Device Name Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz

Package: intel-oneapi-base-toolkit
Version: 2025.0.1-45

Kernel: liquorix 6.12-18.1~bookworm (2025-02-08)

qiuxin2012 (Contributor) commented

Similar issue: #12844
