Frequent Crashes with Ollama on ARC770 and i7-14700K #12555

Open

sirlegendary opened this issue Dec 16, 2024 · 6 comments
@sirlegendary

Description

Ollama keeps crashing on my system when I try to run models, usually after interacting with them for 1-2 prompts. Even with q4_K_M quantization, which I understand is designed to be resource-efficient, the crashes persist.

Examples of Crashing Models:

  • llama3.2:3b-instruct-q4_K_M: Crashes after 1-2 prompts.
  • qwen2.5-coder:3b-instruct-q4_K_M: Also crashes after a few prompts.

Models That Do Not Crash:

  • gemma:2b-instruct-q5_K_M: Runs without issues but is not suitable for my needs.

Additionally, I am unable to run sycl-ls to troubleshoot further. When I attempt it, I receive:

bash: sycl-ls: command not found
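
Note: sycl-ls ships with the oneAPI toolkit and typically appears on PATH only after the oneAPI environment has been sourced. A minimal sketch, assuming the standard /opt/intel/oneapi install location inside the ipex-llm image (the actual path may differ):

source /opt/intel/oneapi/setvars.sh    # puts sycl-ls (and the other dpcpp tools) on PATH
sycl-ls                                # the Arc A770 should show up as a Level Zero device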

Environment

  • GPU: Intel Arc A770 (16GB VRAM)
  • CPU: Intel i7-14700K
  • RAM: 64GB
  • OS: Ubuntu 22.04.3 LTS
  • Ollama Version: 0.4.6-ipexllm-20241214

Steps to Reproduce

  • Dockerfile:
FROM docker.io/intelanalytics/ipex-llm-inference-cpp-xpu:latest

ENV ZES_ENABLE_SYSMAN=1
ENV OLLAMA_HOST=0.0.0.0:11434
ENV OLLAMA_KEEP_ALIVE=3600
ENV DEVICE=Arc

# Link the ipex-llm Ollama binaries into /llm/ollama
# (&& so a failed init-ollama fails the build instead of being silently ignored)
RUN mkdir -p /llm/ollama && \
    cd /llm/ollama && \
    init-ollama

ENV LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"

WORKDIR /llm/ollama

ENTRYPOINT ["./ollama", "serve"]
  • docker-compose file:
version: "3.9"
services:
  ollama-intel-gpu:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ollama-intel-gpu
    image: ollama-intel-gpu:latest
    restart: always
    devices:
      - /dev/dri:/dev/dri
      - /dev/dxg:/dev/dxg
    volumes:
      - /usr/lib/wsl:/usr/lib/wsl
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ollama-intel-gpu:/root/.ollama
    environment:
      - DISPLAY=${DISPLAY}
      - PATH=/llm/ollama:$PATH
volumes:
  ollama-intel-gpu: {}
  • Exec into the container: podman exec -it ollama-intel-gpu /bin/bash
  • Run Ollama with llama3.2:3b-instruct-q4_K_M or qwen2.5-coder:3b-instruct-q4_K_M.
  • Interact with the model (1-2 prompts).
  • Observe a crash. (See the command sketch after this list.)
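
For reference, a minimal sketch of the full reproduction sequence (assuming podman-compose drives the compose file above; with Docker, substitute docker compose):

podman-compose up -d --build
podman exec -it ollama-intel-gpu /bin/bash
./ollama pull llama3.2:3b-instruct-q4_K_M
./ollama run llama3.2:3b-instruct-q4_K_M    # crashes after 1-2 prompts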

Logs

ollama_intel_gpu_logs.txt
ollama_intel_gpu_2_logs.txt

@sgwhat
Contributor

sgwhat commented Dec 17, 2024

This looks similar to #12550; you may try the latest version of Ollama via pip install --pre --upgrade ipex-llm[cpp].
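
A minimal sketch of that upgrade inside the running container (assuming, per the Dockerfile above, that init-ollama must be re-run to re-link the ollama binaries against the upgraded ipex-llm; the brackets are quoted to avoid shell globbing):

pip install --pre --upgrade "ipex-llm[cpp]"
cd /llm/ollama
init-ollama      # re-link ollama against the upgraded ipex-llm libraries
./ollama serve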

@sirlegendary
Author

Thanks, I have upgraded and I am still having the same issues:
(screenshot attached)

I don't think it is using my Arc GPU. Also, I still can't run sycl-ls, which used to work fine before.

Here are the logs after upgrading to 0.4.6-ipexllm-20241216:

[ollama-intel-gpu] | [GIN] 2024/12/17 - 17:46:52 | 200 |  3.770372909s |       127.0.0.1 | POST     "/api/generate"
[ollama-intel-gpu] | [GIN] 2024/12/17 - 17:46:55 | 200 |  747.855058ms |       127.0.0.1 | POST     "/api/chat"
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Mistral-7B-Instruct-v0.3
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 32768
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32768]   = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32768]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32768]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 771
llm_load_vocab: token to piece cache size = 0.1731 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32768
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 7.25 B
llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW)
llm_load_print_meta: general.name     = Mistral-7B-Instruct-v0.3
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 781 '<0x0A>'
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: max token length = 48
llama_model_load: vocab only - skipping tensors
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:439: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
SIGABRT: abort
PC=0x7f8afaa429fc m=3 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 20 gp=0xc000102a80 m=3 mp=0xc000091008 [syscall]:
runtime.cgocall(0x564eb310f520, 0xc000206b48)
        runtime/cgocall.go:157 +0x4b fp=0xc000206b20 sp=0xc000206ae8 pc=0x564eb2e9040b
ollama/llama/llamafile._Cfunc_llama_decode(0x7f8a880062d0, {0x41, 0x7f8a880e1c80, 0x0, 0x0, 0x7f8a880e3c90, 0x7f8a880e5ca0, 0x7f8a880e7cb0, 0x7f8a880fbb20, 0x0, ...})
        _cgo_gotypes.go:561 +0x52 fp=0xc000206b48 sp=0xc000206b20 pc=0x564eb2f8d992
ollama/llama/llamafile.(*Context).Decode.func1(0x564eb310af8b?, 0x7f8a880062d0?)
        ollama/llama/llamafile/llama.go:121 +0xd8 fp=0xc000206c68 sp=0xc000206b48 pc=0x564eb2f8ffb8
ollama/llama/llamafile.(*Context).Decode(0xc000206d58?, 0x0?)
        ollama/llama/llamafile/llama.go:121 +0x13 fp=0xc000206cb0 sp=0xc000206c68 pc=0x564eb2f8fe53
main.(*Server).processBatch(0xc000146120, 0xc00022e000, 0xc000206f10)
        ollama/llama/runner/runner.go:434 +0x24d fp=0xc000206ed0 sp=0xc000206cb0 pc=0x564eb3109c4d
main.(*Server).run(0xc000146120, {0x564eb3415e20, 0xc000184050})
        ollama/llama/runner/runner.go:342 +0x1e5 fp=0xc000206fb8 sp=0xc000206ed0 pc=0x564eb31096c5
main.main.gowrap2()
        ollama/llama/runner/runner.go:980 +0x28 fp=0xc000206fe0 sp=0xc000206fb8 pc=0x564eb310e528
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000206fe8 sp=0xc000206fe0 pc=0x564eb2ef8e21
created by main.main in goroutine 1
        ollama/llama/runner/runner.go:980 +0xd3e

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0xc000044a08?, 0x0?, 0xc0?, 0x61?, 0xc000039898?)
        runtime/proc.go:402 +0xce fp=0xc000039860 sp=0xc000039840 pc=0x564eb2ec704e
runtime.netpollblock(0xc0000398f8?, 0xb2e8fb66?, 0x4e?)
        runtime/netpoll.go:573 +0xf7 fp=0xc000039898 sp=0xc000039860 pc=0x564eb2ebf297
internal/poll.runtime_pollWait(0x7f8afd18a780, 0x72)
        runtime/netpoll.go:345 +0x85 fp=0xc0000398b8 sp=0xc000039898 pc=0x564eb2ef3ae5
internal/poll.(*pollDesc).wait(0x3?, 0x1?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000398e0 sp=0xc0000398b8 pc=0x564eb2f43a07
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00017e080)
        internal/poll/fd_unix.go:611 +0x2ac fp=0xc000039988 sp=0xc0000398e0 pc=0x564eb2f44ecc
net.(*netFD).accept(0xc00017e080)
        net/fd_unix.go:172 +0x29 fp=0xc000039a40 sp=0xc000039988 pc=0x564eb2fb3a49
net.(*TCPListener).accept(0xc0001481c0)
        net/tcpsock_posix.go:159 +0x1e fp=0xc000039a68 sp=0xc000039a40 pc=0x564eb2fc477e
net.(*TCPListener).Accept(0xc0001481c0)
        net/tcpsock.go:327 +0x30 fp=0xc000039a98 sp=0xc000039a68 pc=0x564eb2fc3ad0
net/http.(*onceCloseListener).Accept(0xc00021a480?)
        <autogenerated>:1 +0x24 fp=0xc000039ab0 sp=0xc000039a98 pc=0x564eb30eace4
net/http.(*Server).Serve(0xc00018c000, {0x564eb34157e0, 0xc0001481c0})
        net/http/server.go:3260 +0x33e fp=0xc000039be0 sp=0xc000039ab0 pc=0x564eb30e1afe
main.main()
        ollama/llama/runner/runner.go:1000 +0x10cd fp=0xc000039f50 sp=0xc000039be0 pc=0x564eb310e2ad
runtime.main()
        runtime/proc.go:271 +0x29d fp=0xc000039fe0 sp=0xc000039f50 pc=0x564eb2ec6c1d
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000039fe8 sp=0xc000039fe0 pc=0x564eb2ef8e21

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:402 +0xce fp=0xc00008afa8 sp=0xc00008af88 pc=0x564eb2ec704e
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.forcegchelper()
        runtime/proc.go:326 +0xb8 fp=0xc00008afe0 sp=0xc00008afa8 pc=0x564eb2ec6ed8
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x564eb2ef8e21
created by runtime.init.6 in goroutine 1
        runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:402 +0xce fp=0xc00008b780 sp=0xc00008b760 pc=0x564eb2ec704e
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.bgsweep(0xc00001e150)
        runtime/mgcsweep.go:278 +0x94 fp=0xc00008b7c8 sp=0xc00008b780 pc=0x564eb2eb1b94
runtime.gcenable.gowrap1()
        runtime/mgc.go:203 +0x25 fp=0xc00008b7e0 sp=0xc00008b7c8 pc=0x564eb2ea66c5
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc00008b7e8 sp=0xc00008b7e0 pc=0x564eb2ef8e21
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc00001e150?, 0x564eb318de40?, 0x1?, 0x0?, 0xc000007340?)
        runtime/proc.go:402 +0xce fp=0xc00008bf78 sp=0xc00008bf58 pc=0x564eb2ec704e
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.(*scavengerState).park(0x564eb35df660)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc00008bfa8 sp=0xc00008bf78 pc=0x564eb2eaf589
runtime.bgscavenge(0xc00001e150)
        runtime/mgcscavenge.go:653 +0x3c fp=0xc00008bfc8 sp=0xc00008bfa8 pc=0x564eb2eafb1c
runtime.gcenable.gowrap2()
        runtime/mgc.go:204 +0x25 fp=0xc00008bfe0 sp=0xc00008bfc8 pc=0x564eb2ea6665
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc00008bfe8 sp=0xc00008bfe0 pc=0x564eb2ef8e21
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0xa5

goroutine 18 gp=0xc000102700 m=nil [finalizer wait]:
runtime.gopark(0xc00008a648?, 0x564eb2e99fc5?, 0xa8?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:402 +0xce fp=0xc00008a620 sp=0xc00008a600 pc=0x564eb2ec704e
runtime.runfinq()
        runtime/mfinal.go:194 +0x107 fp=0xc00008a7e0 sp=0xc00008a620 pc=0x564eb2ea5707
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc00008a7e8 sp=0xc00008a7e0 pc=0x564eb2ef8e21
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:164 +0x3d

goroutine 41 gp=0xc000102fc0 m=nil [select]:
runtime.gopark(0xc00016fa28?, 0x2?, 0x10?, 0x81?, 0xc00016f7ec?)
        runtime/proc.go:402 +0xce fp=0xc00016f660 sp=0xc00016f640 pc=0x564eb2ec704e
runtime.selectgo(0xc00016fa28, 0xc00016f7e8, 0x46?, 0x0, 0x1?, 0x1)
        runtime/select.go:327 +0x725 fp=0xc00016f780 sp=0xc00016f660 pc=0x564eb2ed8425
main.(*Server).completion(0xc000146120, {0x564eb3415990, 0xc00017c9a0}, 0xc00014cea0)
        ollama/llama/runner/runner.go:698 +0xa86 fp=0xc00016fab8 sp=0xc00016f780 pc=0x564eb310baa6
main.(*Server).completion-fm({0x564eb3415990?, 0xc00017c9a0?}, 0x564eb30e5e2d?)
        <autogenerated>:1 +0x36 fp=0xc00016fae8 sp=0xc00016fab8 pc=0x564eb310ed56
net/http.HandlerFunc.ServeHTTP(0xc00011eea0?, {0x564eb3415990?, 0xc00017c9a0?}, 0x10?)
        net/http/server.go:2171 +0x29 fp=0xc00016fb10 sp=0xc00016fae8 pc=0x564eb30de8c9
net/http.(*ServeMux).ServeHTTP(0x564eb2e99fc5?, {0x564eb3415990, 0xc00017c9a0}, 0xc00014cea0)
        net/http/server.go:2688 +0x1ad fp=0xc00016fb60 sp=0xc00016fb10 pc=0x564eb30e074d
net/http.serverHandler.ServeHTTP({0x564eb3414ce0?}, {0x564eb3415990?, 0xc00017c9a0?}, 0x6?)
        net/http/server.go:3142 +0x8e fp=0xc00016fb90 sp=0xc00016fb60 pc=0x564eb30e176e
net/http.(*conn).serve(0xc00021a480, {0x564eb3415de8, 0xc00011cdb0})
        net/http/server.go:2044 +0x5e8 fp=0xc00016ffb8 sp=0xc00016fb90 pc=0x564eb30dd508
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3290 +0x28 fp=0xc00016ffe0 sp=0xc00016ffb8 pc=0x564eb30e1ee8
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc00016ffe8 sp=0xc00016ffe0 pc=0x564eb2ef8e21
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3290 +0x4b4

goroutine 27 gp=0xc000007880 m=nil [IO wait]:
runtime.gopark(0x10?, 0x10?, 0xf0?, 0xc5?, 0xb?)
        runtime/proc.go:402 +0xce fp=0xc00008c5a8 sp=0xc00008c588 pc=0x564eb2ec704e
runtime.netpollblock(0x564eb2f2d598?, 0xb2e8fb66?, 0x4e?)
        runtime/netpoll.go:573 +0xf7 fp=0xc00008c5e0 sp=0xc00008c5a8 pc=0x564eb2ebf297
internal/poll.runtime_pollWait(0x7f8afd18a688, 0x72)
        runtime/netpoll.go:345 +0x85 fp=0xc00008c600 sp=0xc00008c5e0 pc=0x564eb2ef3ae5
internal/poll.(*pollDesc).wait(0xc000232500?, 0xc00020e5e1?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00008c628 sp=0xc00008c600 pc=0x564eb2f43a07
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000232500, {0xc00020e5e1, 0x1, 0x1})
        internal/poll/fd_unix.go:164 +0x27a fp=0xc00008c6c0 sp=0xc00008c628 pc=0x564eb2f4455a
net.(*netFD).Read(0xc000232500, {0xc00020e5e1?, 0xc00008c748?, 0x564eb2ef5710?})
        net/fd_posix.go:55 +0x25 fp=0xc00008c708 sp=0xc00008c6c0 pc=0x564eb2fb2945
net.(*conn).Read(0xc000210010, {0xc00020e5e1?, 0x0?, 0x564eb363f9a0?})
        net/net.go:185 +0x45 fp=0xc00008c750 sp=0xc00008c708 pc=0x564eb2fbcc05
net.(*TCPConn).Read(0x564eb35a2040?, {0xc00020e5e1?, 0x0?, 0x0?})
        <autogenerated>:1 +0x25 fp=0xc00008c780 sp=0xc00008c750 pc=0x564eb2fc85e5
net/http.(*connReader).backgroundRead(0xc00020e5d0)
        net/http/server.go:681 +0x37 fp=0xc00008c7c8 sp=0xc00008c780 pc=0x564eb30d7477
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:677 +0x25 fp=0xc00008c7e0 sp=0xc00008c7c8 pc=0x564eb30d73a5
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x564eb2ef8e21
created by net/http.(*connReader).startBackgroundRead in goroutine 41
        net/http/server.go:677 +0xba

rax    0x0
rbx    0x7f8a977fe640
rcx    0x7f8afaa429fc
rdx    0x6
rdi    0x5d
rsi    0x5f
rbp    0x5f
rsp    0x7f8a977fcfa0
r8     0x7f8a977fd070
r9     0x0
r10    0x8
r11    0x246
r12    0x6
r13    0x16
r14    0x7f8afcb3eb0c
r15    0xffffb80339800000
rip    0x7f8afaa429fc
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

@sgwhat
Contributor

sgwhat commented Dec 18, 2024

Hi @sirlegendary, could you please provide us with more information by running the following script: #12550 (comment)

@leonardozcm
Contributor

We have re-checked and refactored the device-selection logic based on your error log. It would be nice if you could try it tomorrow so we can see whether it solves your problem.
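
In the meantime, one way to check whether the runner actually picks the Arc GPU is to pin the SYCL device explicitly before starting the server. A sketch using the standard oneAPI runtime variable (assuming the A770 enumerates as Level Zero device 0; verify with sycl-ls):

export ONEAPI_DEVICE_SELECTOR=level_zero:0    # restrict SYCL to the first Level Zero GPU
./ollama serve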

@sirlegendary
Author

> Hi @sirlegendary, could you please provide us with more information by running the following script: #12550 (comment)

The container throws an error when I run the dpcpp command:
(screenshot attached)

@sgwhat
Contributor

sgwhat commented Dec 20, 2024

Hi @sirlegendary, you may try our latest Ollama version via pip install --pre --upgrade ipex-llm[cpp] to check whether your issue is resolved.
