PI_ERROR_BUILD_PROGRAM_FAILURE error when running Ollama using ipex-llm on 12450H CPU #12597

Open
qadzhang opened this issue Dec 23, 2024 · 6 comments

@qadzhang

Hello,

The CPU is 12450H with driver version 32.0.101.6325.
The installed software is ipex-llm[cpp], and the Ollama version is 0.4.6.

The installation was successful, but an error occurred before inference while loading the model.
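
For reproduction, the environment was set up roughly along the lines of the ipex-llm Ollama quickstart (the commands below are a sketch rather than an exact transcript; the llm-cpp env name and llama-cpp folder match the console prompt later in this thread):

conda create -n llm-cpp python=3.11
conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]
mkdir llama-cpp
cd llama-cpp
init-ollama.bat
ollama serve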

time=2024-12-23T23:18:56.511+08:00 level=INFO source=routes.go:1248 msg="Listening on 127.0.0.1:11434 (version 0.4.6-ipexllm-20241223)"
time=2024-12-23T23:18:56.511+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[ipex_llm]

time=2024-12-23T23:09:28.726+08:00 level=INFO source=server.go:619 msg="llama runner started in 3.77 seconds"
The program was built for 1 devices
Build program log for 'Intel(R) UHD Graphics':
-11 (PI_ERROR_BUILD_PROGRAM_FAILURE)Exception caught at file:D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/llama-cpp-bigdl/ggml/src/ggml-sycl.cpp, line:3775

(Screenshot attached: 捕获 / "Capture".)

@qiuxin2012
Contributor

Which model are you using?

@qadzhang
Author

I'm using qwen2.5:7b.

@qiuxin2012

@qiuxin2012
Contributor

Similar issue: #12598. We are fixing it.

@qiuxin2012
Contributor

@qadzhang You can update ipex-llm[cpp] to 2.2.0b20241226 tomorrow and try again.
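
That is, something along these lines (assuming the nightly wheel is published on PyPI under that exact version):

pip install --pre --upgrade ipex-llm[cpp]==2.2.0b20241226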

@qadzhang
Author

Thank you for your efforts.

I upgraded the version and then tested qwen2.5:7b, qwen2.5:0.5b, qwen2:0.5b, bge-m3, and gemma2:9b.

Among them, qwen2:0.5b and gemma2:9b run normally, while the other three report errors.
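
Each chat model was exercised with a plain ollama run against the local server, e.g. (the bge-m3 embeddings request is shown further below):

ollama run qwen2.5:7b
ollama run qwen2.5:0.5b
ollama run qwen2:0.5b
ollama run gemma2:9b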

When running qwen2.5:0.5b and qwen2.5:7b, the error sometimes appears right at the start and sometimes after a few responses, but the following error is always reported eventually:

The program was built for 1 devices
Build program log for 'Intel(R) UHD Graphics':
-11 (PI_ERROR_BUILD_PROGRAM_FAILURE)Exception caught at file:D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/llama-cpp-bigdl/ggml/src/ggml-sycl.cpp, line:3781


When running bge-m3, an error is reported as soon as it starts loading, with the following error message:

llama_new_context_with_model: graph splits = 2
time=2024-12-27T01:18:42.734+08:00 level=WARN source=runner.go:894 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\llama-cpp-bigdl\src\llama.cpp:17622: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == CLS or RANK") failed
time=2024-12-27T01:18:42.906+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2024-12-27T01:18:43.157+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == CLS or RANK") failed"
[GIN] 2024/12/27 - 01:18:43 | 500 | 1.9490714s | 127.0.0.1 | POST "/api/embeddings"
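
The request that triggers this is an ordinary call to the legacy embeddings endpoint, roughly as follows (the prompt text is arbitrary; quoting shown for the Windows command prompt):

curl http://127.0.0.1:11434/api/embeddings -d "{\"model\": \"bge-m3\", \"prompt\": \"hello world\"}"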


(llm-cpp) C:\Users\zc\llama-cpp>ollama serve
2024/12/27 01:28:07 routes.go:1197: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY:localhost,127.0.0.1 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\ollama_db OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-12-27T01:28:07.908+08:00 level=INFO source=images.go:753 msg="total blobs: 20"
time=2024-12-27T01:28:07.909+08:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET / --> ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/version --> ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD /api/version --> ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-12-27T01:28:07.910+08:00 level=INFO source=routes.go:1248 msg="Listening on 127.0.0.1:11434 (version 0.4.6-ipexllm-20241226)"
time=2024-12-27T01:28:07.910+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[ipex_llm]


@qiuxin2012

@qiuxin2012
Contributor

@qadzhang I have reproduced your error; we will look into it.

qiuxin2012 assigned sgwhat and unassigned leonardozcm on Dec 27, 2024