PI_ERROR_BUILD_PROGRAM_FAILURE error when running Ollama using ipex-llm on 12450H CPU #12597

Open
qadzhang opened this issue Dec 23, 2024 · 6 comments

@qadzhang

Hello,

The CPU is 12450H with driver version 32.0.101.6325.
The installed software is ipex-llm[cpp], and the Ollama version is 0.4.6.

The installation was successful, but an error occurred before inference while loading the model.
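
For reproduction, the environment was set up roughly along the lines of the ipex-llm Ollama quickstart (the commands below are a sketch rather than an exact transcript; the llm-cpp env name and llama-cpp folder match the console prompt later in this thread):

conda create -n llm-cpp python=3.11
conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]
mkdir llama-cpp
cd llama-cpp
init-ollama.bat
ollama serve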

time=2024-12-23T23:18:56.511+08:00 level=INFO source=routes.go:1248 msg="Listening on 127.0.0.1:11434 (version 0.4.6-ipexllm-20241223)"
time=2024-12-23T23:18:56.511+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[ipex_llm]

time=2024-12-23T23:09:28.726+08:00 level=INFO source=server.go:619 msg="llama runner started in 3.77 seconds"
The program was built for 1 devices
Build program log for 'Intel(R) UHD Graphics':
-11 (PI_ERROR_BUILD_PROGRAM_FAILURE)Exception caught at file:D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/llama-cpp-bigdl/ggml/src/ggml-sycl.cpp, line:3775

(Screenshot attached: 捕获 / "Capture".)

@qiuxin2012
Contributor

Which model are you using?

@qadzhang
Author

I'm using qwen2.5:7b.

@qiuxin2012

@qiuxin2012
Contributor

Similar issue: #12598. We are fixing it.

@qiuxin2012
Contributor

@qadzhang You can update ipex-llm[cpp] to 2.2.0b20241226 tomorrow and try again.
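
That is, something along these lines (assuming the nightly wheel is published on PyPI under that exact version):

pip install --pre --upgrade ipex-llm[cpp]==2.2.0b20241226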

@qadzhang
Author

Thank you for your efforts.

I upgraded the version and then tested qwen2.5:7b, qwen2.5:0.5b, qwen2:0.5b, bge-m3, and gemma2:9b.

Among them, qwen2:0.5b and gemma2:9b run normally, while the other three report errors.
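
Each chat model was exercised with a plain ollama run against the local server, e.g. (the bge-m3 embeddings request is shown further below):

ollama run qwen2.5:7b
ollama run qwen2.5:0.5b
ollama run qwen2:0.5b
ollama run gemma2:9b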

When running qwen2.5:0.5b and qwen2.5:7b, the error sometimes appears right at the start and sometimes after a few responses, but the following error is always reported eventually:

The program was built for 1 devices
Build program log for 'Intel(R) UHD Graphics':
-11 (PI_ERROR_BUILD_PROGRAM_FAILURE)Exception caught at file:D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/llama-cpp-bigdl/ggml/src/ggml-sycl.cpp, line:3781


When running bge-m3, an error is reported as soon as it starts loading, with the following error message:

llama_new_context_with_model: graph splits = 2
time=2024-12-27T01:18:42.734+08:00 level=WARN source=runner.go:894 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\llama-cpp-bigdl\src\llama.cpp:17622: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == CLS or RANK") failed
time=2024-12-27T01:18:42.906+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2024-12-27T01:18:43.157+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == CLS or RANK") failed"
[GIN] 2024/12/27 - 01:18:43 | 500 | 1.9490714s | 127.0.0.1 | POST "/api/embeddings"
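
The request that triggers this is an ordinary call to the legacy embeddings endpoint, roughly as follows (the prompt text is arbitrary; quoting shown for the Windows command prompt):

curl http://127.0.0.1:11434/api/embeddings -d "{\"model\": \"bge-m3\", \"prompt\": \"hello world\"}"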


(llm-cpp) C:\Users\zc\llama-cpp>ollama serve
2024/12/27 01:28:07 routes.go:1197: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY:localhost,127.0.0.1 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\ollama_db OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-12-27T01:28:07.908+08:00 level=INFO source=images.go:753 msg="total blobs: 20"
time=2024-12-27T01:28:07.909+08:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET / --> ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/version --> ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD /api/version --> ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-12-27T01:28:07.910+08:00 level=INFO source=routes.go:1248 msg="Listening on 127.0.0.1:11434 (version 0.4.6-ipexllm-20241226)"
time=2024-12-27T01:28:07.910+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[ipex_llm]


@qiuxin2012

@qiuxin2012
Contributor

@qadzhang I have reproduced your error; we will look into it.

qiuxin2012 assigned sgwhat and unassigned leonardozcm on Dec 27, 2024