Checklist
1. I have searched related issues but could not get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if your bug report lacks the corresponding environment info and a minimal reproducible demo, it will be hard for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Models >lmdeploy chat .\deepseek-r1-distill-qwen-7b-gptq-int4-turbomind\ --model-format gptq
Add dll path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin, please note cuda version should >= 11.3 when compiled with cuda 11
2025-01-31 17:11:14,567 - lmdeploy - WARNING - supported_models.py:106 - .\deepseek-r1-distill-qwen-7b-gptq-int4-turbomind\ seems to be a turbomind workspace, which can only be ran with turbomind engine.
chat_template_config:
ChatTemplateConfig(model_name='deepseek-r1', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, tool=None, eotool=None, separator=None, capability='chat', stop_words=None)
engine_cfg:
TurbomindEngineConfig(dtype='auto', model_format='gptq', tp=1, session_len=131072, max_batch_size=1, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
double enter to end input >>> 你好
<|begin▁of▁sentence|><|User|>你好<|Assistant|><think></think>
你好!很高兴见到你,有什么我可以帮忙的吗?
Environment
sys.platform: win32
Python: 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
NVCC: Cuda compilation tools, release 12.8, V12.8.61
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.42.34436 for x64
GCC: n/a
PyTorch: 2.6.0+cu126
PyTorch compiling details: PyTorch built with:
- C++ Version: 201703
- MSVC 192930157
- Intel(R) oneAPI Math Kernel Library Version 2025.0.1-Product Build 20241031 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
- OpenMP 2019
- LAPACK is enabled (usually provided by MKL)
- CPU capability usage: AVX2
- CUDA Runtime 12.6
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.5.1
- Magma 2.5.4
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=2236df1770800ffea5697b11b0bb0d910b2e59e1, CUDA_VERSION=12.6, CUDNN_VERSION=9.5.1, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/pytorch/.ci/pytorch/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.6.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.21.0+cu126
LMDeploy: 0.7.0.post2+
transformers: 4.48.2
gradio: Not Found
fastapi: 0.115.8
pydantic: 2.10.6
triton: Not Found
Describe the bug
Loading the deepseek-r1-distill-qwen-7b-gptq-int4 model with pipeline hangs, while deploying it from the command line works fine.
I wrote "hang" in the title because it really does hang:
Reproduction
Here is the problematic code:
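The original snippet did not survive in this report. Below is a hypothetical minimal reproduction sketch of loading the same turbomind workspace through the pipeline API, mirroring the engine settings printed in the log above (model_format='gptq', tp=1, max_batch_size=1); the path and prompt are taken from the CLI session and may need adjusting.

```python
# Hypothetical minimal reproduction sketch -- not the reporter's original code.
# Assumes the local turbomind workspace path used in the CLI session above.
MODEL_PATH = r".\deepseek-r1-distill-qwen-7b-gptq-int4-turbomind"


def build_prompts():
    # Same prompt that worked in the interactive CLI session.
    return ["你好"]


def main():
    # Imported lazily so the file can be read without lmdeploy installed.
    from lmdeploy import pipeline, TurbomindEngineConfig

    # Mirror the engine config reported in the log: GPTQ format, single GPU.
    engine = TurbomindEngineConfig(model_format="gptq", tp=1, max_batch_size=1)
    pipe = pipeline(MODEL_PATH, backend_config=engine)
    print(pipe(build_prompts()))  # reportedly hangs here; the CLI path works


if __name__ == "__main__":
    main()
```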
Command-line deployment, by contrast, works fine:
Error traceback
(none; the process hangs without raising an error)