
[Misc]: How to build on Ascend NPU #1

Open
wangshuai09 opened this issue Sep 10, 2024 · 6 comments

Comments

wangshuai09 (Owner) commented Sep 10, 2024

  1. Pull the dev docker image
    docker pull ascendai/pytorch:2.1.0-ubuntu22.04
  2. Start and enter the container
    docker run -p 2022:22 --name test-vllm --device /dev/davinci0 --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info -itd ascendai/pytorch:2.1.0-ubuntu22.04 bash
  3. Download the vllm project
    yum install git
    pip uninstall torch_npu
    git clone https://github.com/wangshuai09/vllm
    cd vllm
    git checkout npu_support
  4. Install vllm
    VLLM_TARGET_DEVICE=npu pip install -e .
  5. Test a model (a minimal sketch of such a script is shown below)
    python examples/offline_inference_npu.py
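
For reference, a minimal offline-inference script along the lines of examples/offline_inference_npu.py could look like the following sketch, using vLLM's standard LLM API (the model name and sampling settings are illustrative placeholders, not taken from the repository's example):

# Minimal offline-inference sketch; assumes the npu_support branch was
# installed with VLLM_TARGET_DEVICE=npu as described above.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")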

WeChat Group

Scan the QR code via WeChat to join the vLLM-NPU discussion group.
(QR code image)

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
beardog6 commented:

Does this currently only support offline inference? Starting the OpenAI server endpoint fails with the following error:
INFO 09-12 10:08:00 selector.py:237] Cannot use _Backend.FLASH_ATTN backend on NPU.
INFO 09-12 10:08:00 selector.py:161] Using ASCEND_TORCH backend.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/root/miniconda3/envs/Python310/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/root/miniconda3/envs/Python310/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/vllm/vllm/entrypoints/openai/rpc/server.py", line 236, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, rpc_path)
File "/vllm/vllm/entrypoints/openai/rpc/server.py", line 34, in init
self.engine = AsyncLLMEngine.from_engine_args(
File "/vllm/vllm/engine/async_llm_engine.py", line 735, in from_engine_args
engine = cls(
File "/vllm/vllm/engine/async_llm_engine.py", line 615, in init
self.engine = self._init_engine(*args, **kwargs)
File "/vllm/vllm/engine/async_llm_engine.py", line 835, in _init_engine
return engine_class(*args, **kwargs)
File "/vllm/vllm/engine/async_llm_engine.py", line 262, in init
super().init(*args, **kwargs)
File "/vllm/vllm/engine/llm_engine.py", line 324, in init
self.model_executor = executor_class(
File "/vllm/vllm/executor/executor_base.py", line 47, in init
self._init_executor()
File "/vllm/vllm/executor/gpu_executor.py", line 38, in _init_executor
self.driver_worker = self._create_worker()
File "/vllm/vllm/executor/gpu_executor.py", line 105, in _create_worker
return create_worker(**self._get_create_worker_kwargs(
File "/vllm/vllm/executor/gpu_executor.py", line 24, in create_worker
wrapper.init_worker(**kwargs)
File "/vllm/vllm/worker/worker_base.py", line 449, in init_worker
self.worker = worker_class(*args, **kwargs)
File "/vllm/vllm/worker/worker.py", line 99, in init
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
File "/vllm/vllm/worker/model_runner.py", line 888, in init
self.attn_state = self.attn_backend.get_state_cls()(
File "/vllm/vllm/attention/backends/abstract.py", line 43, in get_state_cls
raise NotImplementedError
NotImplementedError
ERROR 09-12 10:08:02 api_server.py:188] RPCServer process died before responding to readiness probe
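
For context, the NotImplementedError is raised by vLLM's abstract attention-backend base class: get_state_cls() has no default implementation, and the in-progress ASCEND_TORCH backend does not override it yet. A simplified, self-contained illustration of the pattern (the class bodies below are illustrative, not vLLM's actual code):

# Simplified illustration of the missing override; not vLLM's actual code.
class AttentionBackend:
    @staticmethod
    def get_state_cls():
        # Abstract default: every concrete backend must override this.
        raise NotImplementedError

class AscendTorchBackend(AttentionBackend):
    # No get_state_cls() override yet, so building the model runner hits the
    # abstract default above and the RPC server process dies.
    pass

try:
    AscendTorchBackend.get_state_cls()
except NotImplementedError:
    print("get_state_cls() is not implemented for the Ascend backend yet")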

wangshuai09 (Owner, Author) commented:

@beardog6 This is still in an early development stage and these features have not been debugged yet. You are welcome to collaborate on the development; the development branch is npu_support.

wangshuai09 (Owner, Author) commented:

@beardog6 Is this the scenario you tested?

# start server
vllm serve facebook/opt-125m

# request
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "facebook/opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 20,
    "temperature": 0
}'

# output
{"id":"cmpl-862bb9206aa84004a55c625b75e6dfea","object":"text_completion","created":1726649591,"model":"facebook/opt-125m","choices":[{"index":0,"text":" great place to live.  I've lived in San Francisco for a few years now and I've","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":25,"completion_tokens":20}}

beardog6 commented:

Yes, though the startup arguments are a bit different @wangshuai09

wangshuai09 (Owner, Author) commented:

Yes, though the startup arguments are a bit different @wangshuai09

My test above passes. Could you pull the latest code and see whether it runs with your arguments?

beardog6 commented Sep 20, 2024

Yes, though the startup arguments are a bit different @wangshuai09

My test above passes. Could you pull the latest code and see whether it runs with your arguments?

Qwen2 and Qwen2.5 pass single-card tests.
Deepseek-v2-lite-chat fails (probably because forward_npu is not implemented for the MoE model):
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/vllm/vllm/model_executor/models/deepseek_v2.py", line 504, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/vllm/vllm/model_executor/models/deepseek_v2.py", line 461, in forward
hidden_states, residual = layer(positions, hidden_states,
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/vllm/vllm/model_executor/models/deepseek_v2.py", line 401, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/vllm/vllm/model_executor/models/deepseek_v2.py", line 148, in forward
final_hidden_states = self.experts(
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 442, in forward
final_hidden_states = self.quant_method.apply(
File "/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 78, in apply
return self.forward(x=x,
File "/vllm/vllm/model_executor/custom_op.py", line 14, in forward
return self._forward_method(*args, **kwargs)
File "/vllm/vllm/model_executor/custom_op.py", line 58, in forward_npu
return self.forward_native(*args, **kwargs)
File "/vllm/vllm/model_executor/custom_op.py", line 23, in forward_native
raise NotImplementedError
NotImplementedError
ERROR 09-20 07:51:16 api_server.py:188] RPCServer process died before responding to readiness probe
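
For context, vLLM's CustomOp dispatches to a device-specific forward_* method; on NPU the fused-MoE op falls back to forward_native, which it does not implement, hence the NotImplementedError. A simplified, self-contained illustration of that dispatch (not vLLM's actual code):

# Simplified illustration of the CustomOp dispatch seen in the traceback above.
class CustomOp:
    def forward(self, *args, **kwargs):
        # vLLM resolves a device-specific method (forward_cuda, forward_npu, ...)
        # and calls it; only the NPU path is shown here.
        return self.forward_npu(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        # Default: no device-independent implementation provided.
        raise NotImplementedError

    def forward_npu(self, *args, **kwargs):
        # The NPU path falls back to the native path unless an op overrides it,
        # which is why the fused-MoE op fails above.
        return self.forward_native(*args, **kwargs)

try:
    CustomOp().forward()
except NotImplementedError:
    print("forward_npu falls back to an unimplemented forward_native")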
