RuntimeError: Unable to run llama3.2 on ipex-llm[cpp] #12598
Comments
I tried Details
intel-gpu-top: 8086:4626 @ /dev/dri/card0 - 1298/1298 MHz; 0% RC6; 13.50/35.05 W
71 irqs/s
ENGINES BUSY MI_SEMA MI_WAIT
Render/3D 99.08% |███████████████████████████████████████████████▋ 0% 0%
Blitter 0.00% | | 0% 0%
Video 0.00% | | 0% 0%
VideoEnhance 0.00% | | 0% 0%
Update: It happened with
I'm on a Linux machine; the difference in device name may be due to the amount of RAM installed. Intel's specs say:
I only have a single 32 GB RAM module installed, which could explain the difference in device name.
The error is intermittent! I have been able to run both models every now and then, but most of the time the run fails with an assertion failure.
I tried
But whether it is an ( |
We found that it's a bug in our checking; we are fixing it.
You can try updating ipex-llm[cpp] to 2.2.0b20241226 tomorrow. I have fixed this bug and tested the fix on a similar device, an i7-1270P.
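Since the fix is tied to a specific nightly build, it can help to check whether an installed build is at least as new as the one named above. The following is a minimal sketch, assuming the nightly tags follow the `X.Y.ZbYYYYMMDD` pattern seen in this thread; `parse_nightly` and `is_at_least` are hypothetical helper names, not part of ipex-llm.

```python
import re
from datetime import date

# Assumption: nightly tags look like "2.2.0b20241226", i.e. a base version,
# the letter "b", then a YYYYMMDD build date. This is inferred only from the
# version strings quoted in this thread.
_NIGHTLY = re.compile(r"^(\d+)\.(\d+)\.(\d+)b(\d{4})(\d{2})(\d{2})$")


def parse_nightly(version):
    """Split a nightly tag into ((major, minor, patch), build_date)."""
    m = _NIGHTLY.match(version.strip())
    if m is None:
        raise ValueError(f"not a nightly tag: {version!r}")
    major, minor, patch, year, month, day = (int(g) for g in m.groups())
    return (major, minor, patch), date(year, month, day)


def is_at_least(installed, required):
    """True if `installed` is the same or a newer build than `required`."""
    return parse_nightly(installed) >= parse_nightly(required)
```

The installed version string could be read with `importlib.metadata.version("ipex-llm")` from the standard library and passed as `installed`, for example `is_at_least(installed, "2.2.0b20241226")`.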
I tried it again after upgrading. Running pip freeze | grep ipex-llm shows:
ipex-llm==2.2.0b20241226
However, I still get the same error.
Details
[GIN] 2024/12/27 - 09:48:25 | 200 | 444.146µs | 192.168.100.21 | GET "/api/tags"
time=2024-12-27T09:48:25.713Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-27T09:48:25.714Z level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-27T09:48:25.714Z level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-27T09:48:25.715Z level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-27T09:48:25.722Z level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-27T09:48:25.723Z level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-27T09:48:25.792Z level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="15.6 GiB" free_swap="8.0 GiB"
time=2024-12-27T09:48:25.792Z level=INFO source=memory.go:356 msg="offload to device" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[0 B]" memory.gpu_overhead="0 B" memory.required.full="2.2 GiB" memory.required.partial="0 B" memory.required.kv="224.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.5 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="256.5 MiB" memory.graph.partial="570.7 MiB"
time=2024-12-27T09:48:25.793Z level=INFO source=server.go:401 msg="starting llama server" cmd="/tmp/ollama1273445517/runners/ipex_llm/ollama_llama_server --model /root/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff --ctx-size 2048 --batch-size 512 --n-gpu-layers 999 --threads 4 --no-mmap --parallel 1 --port 39835"
time=2024-12-27T09:48:25.793Z level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-12-27T09:48:25.793Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2024-12-27T09:48:25.793Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2024-12-27T09:48:25.831Z level=INFO source=runner.go:956 msg="starting go runner"
time=2024-12-27T09:48:25.831Z level=INFO source=runner.go:957 msg=system info="AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=4
time=2024-12-27T09:48:25.831Z level=INFO source=.:0 msg="Server listening on 127.0.0.1:39835"
llama_model_loader: loaded meta data with 30 key-value pairs and 255 tensors from /root/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 3B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Llama-3.2
llama_model_loader: - kv 5: general.size_label str = 3B
llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 8: llama.block_count u32 = 28
llama_model_loader: - kv 9: llama.context_length u32 = 131072
llama_model_loader: - kv 10: llama.embedding_length u32 = 3072
llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 12: llama.attention.head_count u32 = 24
llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 16: llama.attention.key_length u32 = 128
llama_model_loader: - kv 17: llama.attention.value_length u32 = 128
llama_model_loader: - kv 18: general.file_type u32 = 15
llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - type f32: 58 tensors
llama_model_loader: - type q4_K: 168 tensors
llama_model_loader: - type q6_K: 29 tensors
time=2024-12-27T09:48:26.045Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_layer = 28
llm_load_print_meta: n_head = 24
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 3
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 3.21 B
llm_load_print_meta: model size = 1.87 GiB (5.01 BPW)
llm_load_print_meta: general.name = Llama 3.2 3B Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
llm_load_tensors: ggml ctx size = 0.24 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors: SYCL0 buffer size = 1918.36 MiB
llm_load_tensors: SYCL_Host buffer size = 308.23 MiB
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel UHD Graphics| 1.3| 80| 512| 32| 30747M| 1.3.29735|
llama_kv_cache_init: SYCL0 KV buffer size = 224.00 MiB
llama_new_context_with_model: KV self size = 224.00 MiB, K (f16): 112.00 MiB, V (f16): 112.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.50 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 256.50 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 10.01 MiB
llama_new_context_with_model: graph nodes = 790
llama_new_context_with_model: graph splits = 2
time=2024-12-27T09:48:27.992Z level=WARN source=runner.go:894 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
time=2024-12-27T09:48:29.060Z level=INFO source=server.go:619 msg="llama runner started in 3.27 seconds"
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:439: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
SIGABRT: abort
PC=0x7081bbe969fc m=3 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 20 gp=0xc000102a80 m=3 mp=0xc00005b008 [syscall]:
runtime.cgocall(0x5d853c50f820, 0xc000069b48)
runtime/cgocall.go:157 +0x4b fp=0xc000069b20 sp=0xc000069ae8 pc=0x5d853c29044b
ollama/llama/llamafile._Cfunc_llama_decode(0x708144006280, {0x20, 0x7080ed512cf0, 0x0, 0x0, 0x7080ed503040, 0x7080ed503850, 0x7080ed504060, 0x7080ed50f000, 0x0, ...})
_cgo_gotypes.go:548 +0x52 fp=0xc000069b48 sp=0xc000069b20 pc=0x5d853c38d9d2
ollama/llama/llamafile.(*Context).Decode.func1(0x5d853c50b06b?, 0x708144006280?)
ollama/llama/llamafile/llama.go:121 +0xd8 fp=0xc000069c68 sp=0xc000069b48 pc=0x5d853c390098
ollama/llama/llamafile.(*Context).Decode(0xc000069d58?, 0x0?)
ollama/llama/llamafile/llama.go:121 +0x13 fp=0xc000069cb0 sp=0xc000069c68 pc=0x5d853c38ff33
main.(*Server).processBatch(0xc000146120, 0xc0000a6000, 0xc000069f10)
ollama/llama/runner/runner.go:434 +0x24d fp=0xc000069ed0 sp=0xc000069cb0 pc=0x5d853c509d2d
main.(*Server).run(0xc000146120, {0x5d853c816ba0, 0xc000182050})
ollama/llama/runner/runner.go:342 +0x1e5 fp=0xc000069fb8 sp=0xc000069ed0 pc=0x5d853c5097a5
main.main.gowrap2()
ollama/llama/runner/runner.go:995 +0x28 fp=0xc000069fe0 sp=0xc000069fb8 pc=0x5d853c50e828
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000069fe8 sp=0xc000069fe0 pc=0x5d853c2f8e61
created by main.main in goroutine 1
ollama/llama/runner/runner.go:995 +0xd3e
goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x1?, 0xc00003b8e0?, 0x74?, 0x6e?, 0xc00003b8c0?)
runtime/proc.go:402 +0xce fp=0xc00003b860 sp=0xc00003b840 pc=0x5d853c2c708e
runtime.netpollblock(0x10?, 0x3c28fba6?, 0x85?)
runtime/netpoll.go:573 +0xf7 fp=0xc00003b898 sp=0xc00003b860 pc=0x5d853c2bf2d7
internal/poll.runtime_pollWait(0x7081bfc46f50, 0x72)
runtime/netpoll.go:345 +0x85 fp=0xc00003b8b8 sp=0xc00003b898 pc=0x5d853c2f3b25
internal/poll.(*pollDesc).wait(0x3?, 0x7081bf7c0288?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00003b8e0 sp=0xc00003b8b8 pc=0x5d853c343a47
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00017c080)
internal/poll/fd_unix.go:611 +0x2ac fp=0xc00003b988 sp=0xc00003b8e0 pc=0x5d853c344f0c
net.(*netFD).accept(0xc00017c080)
net/fd_unix.go:172 +0x29 fp=0xc00003ba40 sp=0xc00003b988 pc=0x5d853c3b3b29
net.(*TCPListener).accept(0xc0001481c0)
net/tcpsock_posix.go:159 +0x1e fp=0xc00003ba68 sp=0xc00003ba40 pc=0x5d853c3c485e
net.(*TCPListener).Accept(0xc0001481c0)
net/tcpsock.go:327 +0x30 fp=0xc00003ba98 sp=0xc00003ba68 pc=0x5d853c3c3bb0
net/http.(*onceCloseListener).Accept(0xc000218000?)
<autogenerated>:1 +0x24 fp=0xc00003bab0 sp=0xc00003ba98 pc=0x5d853c4eadc4
net/http.(*Server).Serve(0xc00019a000, {0x5d853c816560, 0xc0001481c0})
net/http/server.go:3260 +0x33e fp=0xc00003bbe0 sp=0xc00003bab0 pc=0x5d853c4e1bde
main.main()
ollama/llama/runner/runner.go:1015 +0x10cd fp=0xc00003bf50 sp=0xc00003bbe0 pc=0x5d853c50e5ad
runtime.main()
runtime/proc.go:271 +0x29d fp=0xc00003bfe0 sp=0xc00003bf50 pc=0x5d853c2c6c5d
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc00003bfe8 sp=0xc00003bfe0 pc=0x5d853c2f8e61
goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc000054fa8 sp=0xc000054f88 pc=0x5d853c2c708e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.forcegchelper()
runtime/proc.go:326 +0xb8 fp=0xc000054fe0 sp=0xc000054fa8 pc=0x5d853c2c6f18
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000054fe8 sp=0xc000054fe0 pc=0x5d853c2f8e61
created by runtime.init.6 in goroutine 1
runtime/proc.go:314 +0x1a
goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc000055780 sp=0xc000055760 pc=0x5d853c2c708e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.bgsweep(0xc000022150)
runtime/mgcsweep.go:278 +0x94 fp=0xc0000557c8 sp=0xc000055780 pc=0x5d853c2b1bd4
runtime.gcenable.gowrap1()
runtime/mgc.go:203 +0x25 fp=0xc0000557e0 sp=0xc0000557c8 pc=0x5d853c2a6705
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000557e8 sp=0xc0000557e0 pc=0x5d853c2f8e61
created by runtime.gcenable in goroutine 1
runtime/mgc.go:203 +0x66
goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc000022150?, 0x5d853c58e1e8?, 0x1?, 0x0?, 0xc000007340?)
runtime/proc.go:402 +0xce fp=0xc000055f78 sp=0xc000055f58 pc=0x5d853c2c708e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.(*scavengerState).park(0x5d853c9e0680)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000055fa8 sp=0xc000055f78 pc=0x5d853c2af5c9
runtime.bgscavenge(0xc000022150)
runtime/mgcscavenge.go:653 +0x3c fp=0xc000055fc8 sp=0xc000055fa8 pc=0x5d853c2afb5c
runtime.gcenable.gowrap2()
runtime/mgc.go:204 +0x25 fp=0xc000055fe0 sp=0xc000055fc8 pc=0x5d853c2a66a5
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000055fe8 sp=0xc000055fe0 pc=0x5d853c2f8e61
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0xa5
goroutine 18 gp=0xc000102700 m=nil [finalizer wait]:
runtime.gopark(0xc000054648?, 0x5d853c29a005?, 0xa8?, 0x1?, 0xc0000061c0?)
runtime/proc.go:402 +0xce fp=0xc000054620 sp=0xc000054600 pc=0x5d853c2c708e
runtime.runfinq()
runtime/mfinal.go:194 +0x107 fp=0xc0000547e0 sp=0xc000054620 pc=0x5d853c2a5747
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000547e8 sp=0xc0000547e0 pc=0x5d853c2f8e61
created by runtime.createfing in goroutine 1
runtime/mfinal.go:164 +0x3d
goroutine 34 gp=0xc00021e000 m=nil [select]:
runtime.gopark(0xc000265a28?, 0x2?, 0x10?, 0x81?, 0xc0002657ec?)
runtime/proc.go:402 +0xce fp=0xc000265660 sp=0xc000265640 pc=0x5d853c2c708e
runtime.selectgo(0xc000265a28, 0xc0002657e8, 0x20?, 0x0, 0x1?, 0x1)
runtime/select.go:327 +0x725 fp=0xc000265780 sp=0xc000265660 pc=0x5d853c2d8465
main.(*Server).completion(0xc000146120, {0x5d853c816710, 0xc000228540}, 0xc0002205a0)
ollama/llama/runner/runner.go:698 +0xa86 fp=0xc000265ab8 sp=0xc000265780 pc=0x5d853c50bb86
main.(*Server).completion-fm({0x5d853c816710?, 0xc000228540?}, 0x5d853c4e5f0d?)
<autogenerated>:1 +0x36 fp=0xc000265ae8 sp=0xc000265ab8 pc=0x5d853c50f056
net/http.HandlerFunc.ServeHTTP(0xc00011edd0?, {0x5d853c816710?, 0xc000228540?}, 0x10?)
net/http/server.go:2171 +0x29 fp=0xc000265b10 sp=0xc000265ae8 pc=0x5d853c4de9a9
net/http.(*ServeMux).ServeHTTP(0x5d853c29a005?, {0x5d853c816710, 0xc000228540}, 0xc0002205a0)
net/http/server.go:2688 +0x1ad fp=0xc000265b60 sp=0xc000265b10 pc=0x5d853c4e082d
net/http.serverHandler.ServeHTTP({0x5d853c815a60?}, {0x5d853c816710?, 0xc000228540?}, 0x6?)
net/http/server.go:3142 +0x8e fp=0xc000265b90 sp=0xc000265b60 pc=0x5d853c4e184e
net/http.(*conn).serve(0xc000218000, {0x5d853c816b68, 0xc00011cdb0})
net/http/server.go:2044 +0x5e8 fp=0xc000265fb8 sp=0xc000265b90 pc=0x5d853c4dd5e8
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3290 +0x28 fp=0xc000265fe0 sp=0xc000265fb8 pc=0x5d853c4e1fc8
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000265fe8 sp=0xc000265fe0 pc=0x5d853c2f8e61
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3290 +0x4b4
goroutine 40 gp=0xc00021e1c0 m=nil [IO wait]:
runtime.gopark(0x10?, 0x10?, 0xf0?, 0x5?, 0xb?)
runtime/proc.go:402 +0xce fp=0xc0002305a8 sp=0xc000230588 pc=0x5d853c2c708e
runtime.netpollblock(0x5d853c32d5d8?, 0x3c28fba6?, 0x85?)
runtime/netpoll.go:573 +0xf7 fp=0xc0002305e0 sp=0xc0002305a8 pc=0x5d853c2bf2d7
internal/poll.runtime_pollWait(0x7081bfc46e58, 0x72)
runtime/netpoll.go:345 +0x85 fp=0xc000230600 sp=0xc0002305e0 pc=0x5d853c2f3b25
internal/poll.(*pollDesc).wait(0xc000216000?, 0xc00008a041?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000230628 sp=0xc000230600 pc=0x5d853c343a47
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000216000, {0xc00008a041, 0x1, 0x1})
internal/poll/fd_unix.go:164 +0x27a fp=0xc0002306c0 sp=0xc000230628 pc=0x5d853c34459a
net.(*netFD).Read(0xc000216000, {0xc00008a041?, 0xc000230748?, 0x5d853c2f5750?})
net/fd_posix.go:55 +0x25 fp=0xc000230708 sp=0xc0002306c0 pc=0x5d853c3b2a25
net.(*conn).Read(0xc00020e008, {0xc00008a041?, 0x0?, 0x5d853ca409c0?})
net/net.go:185 +0x45 fp=0xc000230750 sp=0xc000230708 pc=0x5d853c3bcce5
net.(*TCPConn).Read(0x5d853c9a3050?, {0xc00008a041?, 0x0?, 0x0?})
<autogenerated>:1 +0x25 fp=0xc000230780 sp=0xc000230750 pc=0x5d853c3c86c5
net/http.(*connReader).backgroundRead(0xc00008a030)
net/http/server.go:681 +0x37 fp=0xc0002307c8 sp=0xc000230780 pc=0x5d853c4d7557
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:677 +0x25 fp=0xc0002307e0 sp=0xc0002307c8 pc=0x5d853c4d7485
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0002307e8 sp=0xc0002307e0 pc=0x5d853c2f8e61
created by net/http.(*connReader).startBackgroundRead in goroutine 34
net/http/server.go:677 +0xba
rax 0x0
rbx 0x70815f400640
rcx 0x7081bbe969fc
rdx 0x6
rdi 0x5c0b
rsi 0x5c0d
rbp 0x5c0d
rsp 0x70815f3fefa0
r8 0x70815f3ff070
r9 0x0
r10 0x8
r11 0x246
r12 0x6
r13 0x16
r14 0x7081be53ed6c
r15 0xffffd556aa790000
rip 0x7081bbe969fc
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
[GIN] 2024/12/27 - 09:48:29 | 200 | 3.3945751s | 192.168.100.21 | POST "/api/chat"
[GIN] 2024/12/27 - 09:48:30 | 200 | 356.158µs | 192.168.100.21 | GET "/api/tags"
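Because the crash is intermittent and the log is long, it may be useful to pull the assertion line out of a captured log automatically. This is a rough sketch, assuming the failure is always reported in the glibc assert format shown in the log above (`program: file:line: function: Assertion ... failed.`); `find_assertion` is a hypothetical helper name.

```python
import re

# Matches glibc-style assert output such as:
#   prog: /path/to/file.cpp:439: <function signature>: Assertion `false' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:\s]+): (?P<file>\S+?):(?P<line>\d+): .*"
    r"Assertion `(?P<expr>.*)' failed\.$"
)


def find_assertion(log_lines):
    """Return details of the first assertion failure in the log, or None."""
    for raw in log_lines:
        m = ASSERT_RE.match(raw.strip())
        if m:
            return {
                "program": m["prog"],
                "file": m["file"],
                "line": int(m["line"]),
                "expr": m["expr"],
            }
    return None
```

Running this over the output of several attempts would show whether every failure points at the same source file and line, which is the kind of detail a maintainer can act on.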
@qiuxin2012 Is there something else that can be looked at?
Happy New Year, and sorry for the late reply. I can reproduce your error, and I'm fixing it.
HNY @qiuxin2012! NP, this isn't urgent on my end; I just wanted to follow up on whether this is still an issue in the library or a problem with my system. Thanks for the update!
@ajatprabha I have tested llama3.2:3b with ipex-llm 2.2.0b20250102, and it works fine now. Please try again.
I'm still seeing this same crash with llama3.2:3b on ipex-llm 2.2.0b20250116; it doesn't seem to be resolved, or it has regressed. Edit: if I set OLLAMA_INTEL_GPU=1, it forces ollama to use the integrated GPU and I don't see the crash. However, if it is set to OLLAMA_INTEL_GPU=0 (which is the default for intelanalytics/ipex-llm-inference-cpp-xpu):
I would obviously prefer to be able to use the Arc card instead of the integrated one.
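To compare the two settings side by side, the environment for each run can be built explicitly instead of relying on the container default. A minimal sketch follows; the meaning of `OLLAMA_INTEL_GPU` here is taken only from the comment above and has not been checked against the ollama or ipex-llm sources, and `ollama_env` is a hypothetical helper name.

```python
import os


def ollama_env(use_integrated_gpu, base=None):
    """Return a copy of the environment with OLLAMA_INTEL_GPU set.

    Assumption (from the comment above, not verified): "1" makes ollama use
    the integrated GPU, and "0" is the default in the container.
    """
    env = dict(os.environ if base is None else base)
    env["OLLAMA_INTEL_GPU"] = "1" if use_integrated_gpu else "0"
    return env
```

The resulting mapping could be passed to `subprocess.run(["ollama", "serve"], env=ollama_env(True))` for one run and `ollama_env(False)` for the other, so that the only difference between the two runs is this one variable.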
@tklengyel Could you share your OS and CPU info?
I wasn't able to verify 2.2.0b20250102 either, because after upgrading I started getting linker errors. I've thought of trying a clean install, but I haven't had the chance to do it yet.
I'm trying to run ollama on the integrated GPU of an Intel i5-1240P processor. I followed this doc. Everything installed okay; however, when I try to run the model, it crashes at runtime.
Attaching the error details:
Details
The CGO call fails with
Installed versions:
Details
Is this a known issue? I didn't see this GPU listed as supported anywhere, but I went ahead and gave it a try anyway.