
WSL / Docker ipex-llm-inference-cpp-xpu:latest SIGSEGV on model load #12592

Open
vladislavdonchev opened this issue Dec 21, 2024 · 2 comments

Comments

@vladislavdonchev

vladislavdonchev commented Dec 21, 2024

Hello,

Below is my WSL / Docker setup configuration (Alder Lake platform, two Arc A770 GPUs):

Windows 11 24H2 (also tested with 23H2)
Intel graphics driver: latest WHQL, 32.0.101.6325
Both A770 cards confirmed working correctly on the Windows host and in WSL Ubuntu 22.04 (outside the container)

$ wsl --version
WSL version: 2.3.26.0
Kernel version: 5.15.167.4-1
WSLg version: 1.0.65
MSRDC version: 1.2.5620
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.26100.2605

From the IPEX-LLM container:

# sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:*.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.6 [1.3.31294]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.6 [1.3.31294]

After unsetting ONEAPI_DEVICE_SELECTOR:

# sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i9-14900F OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x56a0] OpenCL 3.0 NEO  [24.39.31294.12]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x56a0] OpenCL 3.0 NEO  [24.39.31294.12]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.6 [1.3.31294]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.6 [1.3.31294]

# clinfo
...
Platform Name                                   Intel(R) OpenCL Graphics
Number of devices                                 2
  Device Name                                     Intel(R) Graphics [0x56a0]
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  ...

This is the Docker command used for IPEX-LLM:

docker run -d --restart=always --net=bridge --device=/dev/dxg --device=/dev/dri \
  --name=ipex-llm \
  -p 11434:11434 \
  -v /usr/lib/wsl:/usr/lib/wsl \
  -v ~/.ollama/models:/root/.ollama/models \
  -e PATH=/llm/ollama:<OMITTED FOR BREVITY> \
  -e OLLAMA_HOST=0.0.0.0 \
  -e no_proxy=localhost,127.0.0.1 \
  -e ZES_ENABLE_SYSMAN=1 \
  -e ENABLE_GPU=1 \
  -e OLLAMA_INTEL_GPU=true \
  -e ONEAPI_DEVICE_SELECTOR=level_zero:* \
  -e DEVICE=Arc \
  --shm-size="16g" \
  --memory="32G" \
  intelanalytics/ipex-llm-inference-cpp-xpu:latest \
  bash -c "cd /llm/scripts/ && source ipex-llm-init --gpu --device Arc && bash start-ollama.sh && tail -f /llm/ollama/ollama.log"
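The command above passes /dev/dxg into the container and bind-mounts /usr/lib/wsl, which is how WSL exposes the Windows GPU driver to Linux processes. A minimal sketch for confirming both are visible from inside the running container is below; the helper names check_path and check_wsl_gpu_passthrough are hypothetical, and treating /usr/lib/wsl/lib/libdxcore.so as the library worth checking is an assumption to adjust for your setup.

```sh
#!/bin/sh
# Sketch (hypothetical helpers): confirm, from inside the container, that the
# WSL GPU pieces passed through by the docker command are actually visible.
# Assumption: /dev/dxg and /usr/lib/wsl/lib/libdxcore.so are the paths to check.

# Print "ok: PATH" if PATH exists, otherwise print "missing: PATH" and return 1.
check_path() {
  if [ -e "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check every path given as an argument, print how many were missing,
# and return non-zero if any path is missing.
check_wsl_gpu_passthrough() {
  missing=0
  for p in "$@"; do
    check_path "$p" || missing=$((missing + 1))
  done
  echo "missing count: $missing"
  [ "$missing" -eq 0 ]
}
```

For example, paste the functions into a shell inside the container (e.g. via docker exec -it ipex-llm bash) and run check_wsl_gpu_passthrough /dev/dxg /usr/lib/wsl/lib/libdxcore.so; a non-zero exit status means at least one path is missing.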

Setting ONEAPI_DEVICE_SELECTOR to level_zero:0, level_zero:1 or level_zero:* does not change the behaviour observed below.

Pulling a model with ollama works just fine, but trying to run it results in the following:

[GIN] 2024/12/21 - 14:52:52 | 200 |      18.746µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/12/21 - 14:52:52 | 200 |  715.446465ms |       127.0.0.1 | POST     "/api/pull"
time=2024-12-21T14:52:59.385+08:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-21T14:52:59.385+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-21T14:52:59.385+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2024-12-21T14:52:59.385+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
[GIN] 2024/12/21 - 14:52:59 | 200 |      10.148µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/12/21 - 14:52:59 | 200 |    2.748599ms |       127.0.0.1 | POST     "/api/show"
SIGSEGV: segmentation violation
PC=0x7f866495324d m=9 sigcode=1 addr=0x31f
signal arrived during cgo execution

docker_logs.txt
Full log attached. Any hints or ideas on what I might be doing wrong are welcome, as it's my 3rd day battling this (rookie numbers, I know, but still).

Update:

The crash reproduces with both WSL kernels 6.6.36.6-microsoft-standard-WSL2+ and 5.15.167.4.

@vladislavdonchev
Author

vladislavdonchev commented Dec 21, 2024

Just noticed the following in dmesg on the WSL Ubuntu host:

# dmesg | grep "dxg"
[    0.448782] hv_vmbus: registering driver dxgkrnl
[    1.515553] misc dxg: dxgk: dxgkio_is_feature_enabled: Ioctl failed: -22
[    1.516804] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[    1.517071] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[    1.517285] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[    1.517535] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    1.522434] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[    1.522822] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[    1.523059] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[    1.523336] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[   68.871477] misc dxg: dxgk: dxgkio_is_feature_enabled: Ioctl failed: -22
[   68.872849] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   68.873768] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   68.874362] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   68.877523] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[   68.880621] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   68.881511] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   68.882114] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   68.885051] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[   68.893477] misc dxg: dxgk: dxgkio_reserve_gpu_va: Ioctl failed: -75
[   68.894093] misc dxg: dxgk: dxgkio_reserve_gpu_va: Ioctl failed: -75
[   68.896077] misc dxg: dxgk: dxgkio_reserve_gpu_va: Ioctl failed: -75
[   68.896496] misc dxg: dxgk: dxgkio_reserve_gpu_va: Ioctl failed: -75
[   69.507268] misc dxg: dxgk: dxgvmb_send_evict: send_evict failed ffffffb5
[   69.508106] misc dxg: dxgk: dxgkio_evict: Ioctl failed: -75
[   69.508640] misc dxg: dxgk: dxgvmb_send_evict: send_evict failed ffffffb5
[   69.509130] misc dxg: dxgk: dxgkio_evict: Ioctl failed: -75

I'm not sure what to make of this. Could it be a Windows <> WSL GPU driver incompatibility, or possibly a kernel issue?
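Assuming the negative numbers in these dxg lines are standard Linux errno values, they read as -2 = ENOENT, -22 = EINVAL and -75 = EOVERFLOW, and the send_evict status ffffffb5, interpreted as a signed 32-bit integer, is also -75, matching the dxgkio_evict failure printed right after it. A minimal sketch for decoding and tallying these codes is below; the function names are hypothetical and the errno table is hard-coded for only the three values seen in this log. It names the codes but says nothing about which component produced them.

```sh
#!/bin/sh
# Sketch (hypothetical helpers): decode the numeric codes the dxg driver
# printed to dmesg. Assumption: they are Linux errno values; the table below
# covers only the codes that appear in this log.

# Convert a 32-bit hex status such as "ffffffb5" to a signed decimal integer.
hex32_to_signed() {
  v=$((0x$1))
  if [ "$v" -ge 2147483648 ]; then
    v=$((v - 4294967296))
  fi
  echo "$v"
}

# Map a (possibly negative) errno value to its symbolic name.
errno_name() {
  case "${1#-}" in
    2)  echo ENOENT ;;     # No such file or directory
    22) echo EINVAL ;;     # Invalid argument
    75) echo EOVERFLOW ;;  # Value too large for defined data type
    *)  echo "unknown(${1#-})" ;;
  esac
}

# Read dmesg text on stdin, count each distinct "Ioctl failed: -N" code,
# and print one "CODE NAME xCOUNT" line per code.
summarize_ioctl_failures() {
  grep -o 'Ioctl failed: -[0-9]*' | awk '{print $3}' | sort | uniq -c |
    while read -r count code; do
      printf '%s %s x%s\n' "$code" "$(errno_name "$code")" "$count"
    done
}
```

For example, dmesg | summarize_ioctl_failures gives a per-code tally, and errno_name "$(hex32_to_signed ffffffb5)" prints EOVERFLOW.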

@hzjane
Contributor

hzjane commented Dec 23, 2024

@vladislavdonchev You can follow this guide to start the Docker container on Windows WSL and run it again. Some environment setting in your script, such as OLLAMA_INTEL_GPU=true, may be crashing the ollama program.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants