Attempting to run vLLM on CPU results in an error almost immediately. #12873

HumerousGorgon opened this issue Feb 23, 2025 · 4 comments

Hello!

Basically what the title says! The moment I run `bash start-vllm-service.sh`, it errors out almost immediately with this:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/cpu/entrypoints/openai/api_server.py", line 30, in <module>
    from ipex_llm.vllm.cpu.engine import IPEXLLMAsyncLLMEngine as AsyncLLMEngine
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/cpu/engine/__init__.py", line 16, in <module>
    from .engine import IPEXLLMAsyncLLMEngine, IPEXLLMLLMEngine, IPEXLLMClass, run_mp_engine
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/cpu/engine/engine.py", line 24, in <module>
    from ipex_llm.vllm.cpu.model_convert import _ipex_llm_convert
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/cpu/model_convert.py", line 20, in <module>
    from vllm.model_executor.models.llama import LlamaMLP, LlamaAttention, LlamaForCausalLM
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.6.6.post1+cpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/llama.py", line 39, in <module>
    from vllm.model_executor.layers.logits_processor import LogitsProcessor
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.6.6.post1+cpu-py3.11-linux-x86_64.egg/vllm/model_executor/layers/logits_processor.py", line 11, in <module>
    from vllm.model_executor.layers.vocab_parallel_embedding import (
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.6.6.post1+cpu-py3.11-linux-x86_64.egg/vllm/model_executor/layers/vocab_parallel_embedding.py", line 136, in <module>
    @torch.compile(dynamic=True)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/__init__.py", line 2424, in fn
    return compile(
           ^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/__init__.py", line 2447, in compile
    return torch._dynamo.optimize(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/_dynamo/eval_frame.py", line 716, in optimize
    return _optimize(rebuild_ctx, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/_dynamo/eval_frame.py", line 790, in _optimize
    compiler_config=backend.get_compiler_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/__init__.py", line 2237, in get_compiler_config
    from torch._inductor.compile_fx import get_patched_config_dict
  File "/usr/local/lib/python3.11/dist-packages/torch/_inductor/compile_fx.py", line 49, in <module>
    from torch._inductor.debug import save_args_for_compile_fx_inner
  File "/usr/local/lib/python3.11/dist-packages/torch/_inductor/debug.py", line 26, in <module>
    from . import config, ir  # noqa: F811, this is needed
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/_inductor/ir.py", line 77, in <module>
    from .runtime.hints import ReductionHint
  File "/usr/local/lib/python3.11/dist-packages/torch/_inductor/runtime/hints.py", line 36, in <module>
    attr_desc_fields = {f.name for f in fields(AttrsDescriptor)}
                                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1246, in fields
    raise TypeError('must be called with a dataclass type or instance') from None
TypeError: must be called with a dataclass type or instance
root@neutronserver:/llm# nano start-vllm-service.sh
root@neutronserver:/llm# bash start-vllm-service.sh
[same traceback as above, repeated on each re-run]

Any help would be greatly appreciated!
Thanks.

Airren commented Feb 25, 2025

I encountered the same problem.

The configuration:

[screenshots: configuration]

The crash log:

[screenshot: crash log]


gc-fu commented Feb 25, 2025

Hi, can you try running pip install triton==3.1.0 and see if this error persists?


gc-fu commented Feb 25, 2025

Tomorrow's image will include the fix; until then, you can fix it yourself by running this command: pip install triton==3.1.0.
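For anyone wondering why pinning triton helps: the traceback ends in torch inductor's hints.py calling dataclasses.fields(AttrsDescriptor), which raises exactly this TypeError when the installed triton's AttrsDescriptor is not a dataclass. A minimal sketch of the mechanism (the two classes below are hypothetical stand-ins for different triton versions, not the real triton API):

```python
from dataclasses import dataclass, fields

@dataclass
class DataclassDescriptor:
    # stand-in for a triton version whose AttrsDescriptor is a dataclass
    divisible_by_16: tuple = ()

class PlainDescriptor:
    # stand-in for a triton version where the class is a plain class
    pass

# torch/_inductor/runtime/hints.py does essentially this at import time;
# it works only when the class really is a dataclass:
print({f.name for f in fields(DataclassDescriptor)})

try:
    fields(PlainDescriptor)
except TypeError as e:
    # same error as in the traceback above
    print(e)
```

So the failure happens at import time, before any model is loaded, which is why the script dies "almost immediately"; installing a triton version whose AttrsDescriptor matches what this torch build expects makes the import succeed.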


Airren commented Feb 25, 2025

> Hi, can you try running pip install triton==3.1.0 and see if this error persists?

It worked for me 👍
