Whether running inference on 4× A6000 or 4× A800, and whether the model is 2B, 7B, or 72B, device_map="auto" always fails with various strange errors; only single-GPU device_map="cuda:x" works #546

Open
luosting opened this issue Nov 19, 2024 · 0 comments

luosting commented Nov 19, 2024
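For context, the working and failing configurations described in the title differ only in the device_map argument passed to from_pretrained. A minimal sketch of the two variants (class and model path taken from the test script below; everything else is assumed):

from transformers import Qwen2VLForConditionalGeneration
import torch

model_path = "/home/ps/Qwen2_VL/Qwen/Qwen2-VL-7B-Instruct"

# Works: the whole model lives on one GPU.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Reported to fail with assorted errors on 4x A6000 / 4x A800: accelerate
# shards the model across all visible GPUs.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)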

My environment packages are as follows:

av 9.2.0 pypi_0 pypi
babel 2.16.0 pypi_0 pypi
backoff 1.11.1 pypi_0 pypi
beautifulsoup4 4.12.3 pypi_0 pypi
bibtexparser 2.0.0b7 pypi_0 pypi
binaryornot 0.4.4 pypi_0 pypi
blinker 1.9.0 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6 defaults
ca-certificates 2024.9.24 h06a4308_0 defaults
cachetools 5.5.0 pypi_0 pypi
certifi 2024.8.30 pypi_0 pypi
cffi 1.17.1 pypi_0 pypi
chardet 5.2.0 pypi_0 pypi
charset-normalizer 3.4.0 pypi_0 pypi
chex 0.1.82 pypi_0 pypi
click 8.1.7 pypi_0 pypi
clldutils 3.24.0 pypi_0 pypi
cloudpickle 3.1.0 pypi_0 pypi
codecarbon 1.2.0 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colorlog 6.9.0 pypi_0 pypi
conda-pack 0.8.0 pypi_0 pypi
contourpy 1.3.1 pypi_0 pypi
cookiecutter 1.7.3 pypi_0 pypi
csvw 3.5.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
dash 2.18.2 pypi_0 pypi
dash-bootstrap-components 1.6.0 pypi_0 pypi
dash-core-components 2.0.0 pypi_0 pypi
dash-html-components 2.0.0 pypi_0 pypi
dash-table 5.0.0 pypi_0 pypi
datasets 3.1.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
decord 0.6.0 pypi_0 pypi
dill 0.3.4 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
dlinfo 1.2.1 pypi_0 pypi
dm-tree 0.1.8 pypi_0 pypi
einops 0.8.0 pypi_0 pypi
etils 1.10.0 pypi_0 pypi
evaluate 0.4.3 pypi_0
google-auth-oauthlib 1.2.1 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
gradio 4.43.0 pypi_0 pypi
gradio-client 1.3.0 pypi_0 pypi
greenlet 3.1.1 pypi_0 pypi
grpcio 1.67.1 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
h5py 3.12.1 pypi_0 pypi
httpcore 1.0.6 pypi_0 pypi
httptools 0.6.4 pypi_0 pypi
httpx 0.27.2 pypi_0 pypi
huggingface-hub 0.26.2 pypi_0 pypi
hypothesis 6.118.8 pypi_0 pypi
idna 3.10 pypi_0 pypi
importlib-metadata 8.5.0 pypi_0 pypi
importlib-resources 6.4.5 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
ipadic 1.0.0 pypi_0 pypi
isodate 0.7.2 pypi_0 pypi
isort 5.13.2 pypi_0 pypi
itsdangerous 2.2.0 pypi_0 pypi
jax 0.4.13 pypi_0 pypi
jaxlib 0.4.13 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
jinja2-time 0.2.0 pypi_0 pypi
jiter 0.7.1 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2024.10.1 pypi_0 pypi
kagglehub 0.3.4 pypi_0 pypi
kenlm 0.2.0 pypi_0 pypi
keras 2.15.0 pypi_0 pypi
keras-core 0.1.7 pypi_0 pypi
keras-nlp 0.12.1 pypi_0 pypi
kiwisolver 1.4.7 pypi_0 pypi
language-tags 1.2.0 pypi_0 pypi
lark 1.2.2 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0 defaults
libclang 18.1.1 pypi_0 pypi
libcst 1.5.0 pypi_0 pypi
libffi 3.4.4 h6a678d5_1
multidict 6.1.0 pypi_0 pypi
multiprocess 0.70.12.2 pypi_0 pypi
namex 0.0.8 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.4.2 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numba 0.60.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-ml-py 12.560.30 pypi_0 pypi
nvidia-nccl-cu12 2.20.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
onnx 1.17.0 pypi_0 pypi
onnxconverter-common 1.13.0 pypi_0 pypi
openai 1.54.4 pypi_0 pypi
openssl 3.0.15 h5eee18b_0 defaults
opt-einsum 3.4.0 pypi_0 pypi
optax 0.1.4 pypi_0 pypi
optuna 4.1.0 pypi_0 pypi
orbax-checkpoint 0.5.16 pypi_0 pypi
orjson 3.10.11 pypi_0 pypi
outlines 0.0.46 pypi_0 pypi
packaging 24.2 pypi_0 pypi
pandas 2.2.3 pypi_0 pypi
parameterized 0.9.0 pypi_0 pypi
partial-json-parser 0.2.1.1.post4 pypi_0 pypi
phonemizer 3.3.0 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.2 py310h06a4308_0 defaults
plac 1.4.3
pydantic-core 2.23.4 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pygtrie 2.5.0 pypi_0 pypi
pylatexenc 2.10 pypi_0 pypi
pynvml 11.5.3 pypi_0 pypi
pyparsing 3.2.0 pypi_0 pypi
pypng 0.20220715.0 pypi_0 pypi
pytest 7.4.4 pypi_0 pypi
pytest-rich 0.1.1 pypi_0 pypi
pytest-timeout 2.3.1 pypi_0 pypi
pytest-xdist 3.6.1 pypi_0 pypi
python 3.10.11 h955ad1f_3 defaults
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-multipart 0.0.17 pypi_0 pypi
python-slugify 8.0.4 pypi_0 pypi
pytz 2024.2 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
pyzmq 26.2.0 pypi_0 pypi
qwen-vl-utils 0.0.8 pypi_0 pypi
ray 2.39.0 pypi_0 pypi
rdflib 7.1.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
referencing 0.35.1 pypi_0 pypi
regex 2024.11.6 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
retrying 1.3.4 pypi_0 pypi
rfc3986 1.5.0 pypi_0 pypi
rhoknp 1.3.0 pypi_0 pypi
rich 13.9.4 pypi_0 pypi
rjieba 0.1.11 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
rpds-py 0.21.0 pypi_0 pypi
rsa 4.9 pypi_0 pypi
ruff 0.5.1 pypi_0 pypi
sacrebleu 1.5.1 pypi_0 pypi
sacremoses 0.1.1 pypi_0 pypi
safetensors 0.4.5 pypi_0 pypi
scikit-learn 1.5.2 pypi_0 pypi
scipy 1.1
sympy 1.13.1 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tenacity 9.0.0 pypi_0 pypi
tensorboard 2.15.2 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tensorboardx 2.6.2.2 pypi_0 pypi
tensorflow 2.15.1 pypi_0 pypi
tensorflow-estimator 2.15.0 pypi_0 pypi
tensorflow-hub 0.16.1 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.37.1 pypi_0 pypi
tensorflow-text 2.15.0 pypi_0 pypi
tensorrt-cu12 10.6.0 pypi_0 pypi
tensorrt-cu12-bindings 10.6.0 pypi_0 pypi
tensorrt-cu12-libs 10.6.0 pypi_0 pypi
tensorstore 0.1.67 pypi_0 pypi
termcolor 2.5.0 pypi_0 pypi
text-unidecode 1.3 pypi_0 pypi
tf-keras 2.15.1 pypi_0 pypi
tf2onnx 1.16.1 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tiktoken 0.7.0 pypi_0 pypi
timeout-decorator 0.5.0 pypi_0 pypi
timm 0.9.16 pypi_0 pypi
tk 8.6.14 h39e8969_0 defaults
tokenizers 0.20.3 pypi_0 pypi
tomli 2.1.0 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
toolz 1.0.0 pypi_0 pypi
torch 2.4.0 pypi_0 pypi
torchvision 0.19.0 pypi_0 pypi
tqdm 4.67.0 pypi_0 pypi
transformers 4.46.0 pypi_0 pypi
transformers-stream-generator 0.0.4 pypi_0 pypi
triton 3.0.0 pypi_0 pypi
typer 0.13.0 pypi_0 pypi
types-python-dateutil 2.9.0.20241003 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.2 pypi_0 pypi
tzlocal 5.2 pypi_0 pypi
unidic 1.1.0 pypi_0 pypi
unidic-lite 1.0.8 pypi_0 pypi
uritemplate 4.1.1 pypi_0 pypi
urllib3 2.2.3 pypi_0 pypi
uvicorn 0.32.0 pypi_0 pypi
uvloop 0.21.0 pypi_0 pypi
vllm 0.6.1 pypi_0 pypi
vllm-flash-attn 2.6.1 pypi_0 pypi
wasabi 0.10.1 pypi_0 pypi
watchfiles 0.24.0 pypi_0 pypi
websockets 12.0 pypi_0 pypi
werkzeug 3.0.6 pypi_0 pypi
wheel 0.44.0 py310h06a4308_0 defaults
wrapt 1.14.1 pypi_0 pypi
xformers 0.0.27.post2 pypi_0 pypi
xxhash 3.5.0 pypi_0 pypi
xz 5.4.6 h5eee18b_1 defaults
yarl 1.17.1 pypi_0 pypi
zipp 3.21.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_1 defaults

The test code is as follows:

from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch
import time
import os

start_time = time.time()
model_path = "/home/ps/Qwen2_VL/Qwen/Qwen2-VL-7B-Instruct"

DEVICE_BASE = 0

# Manually specified device map (defined here but not used below; the model is
# loaded with device_map="cuda:0" instead).
_device_map = {
    'visual': DEVICE_BASE + 1,
    'model.embed_tokens': DEVICE_BASE + 1,
    **{f'model.layers.{i}': DEVICE_BASE for i in range(28)},
    'model.norm': DEVICE_BASE,
    'model.rotary_emb': DEVICE_BASE,
    'lm_head': DEVICE_BASE,
}

os.environ["CUDA_VISIBLE_DEVICES"] = '0,1,2,3,4'



model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="cuda:0",
)


processor = AutoProcessor.from_pretrained(model_path)



messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "/home/ps/Qwen2-VL-main/8点20-8点24.mp4",
            },
            {"type": "text", "text": "描述一下视频内容"},
        ],

    }
]


text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)


image_inputs, video_inputs = process_vision_info(messages)


inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)


inputs = inputs.to("cuda")




generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]


output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)


print(output_text)


end_time = time.time()


execution_time = end_time - start_time


print(f"脚本执行时间: {execution_time:.2f} 秒")

The error output is:
2024-11-19 18:00:30.849819: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-19 18:00:30.849866: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-19 18:00:30.851539: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-19 18:00:30.859877: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-19 18:00:31.665709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████| 5/5 [00:08<00:00, 1.64s/it]
qwen-vl-utils using decord to read video.
Traceback (most recent call last):
File "/home/ps/Qwen2-VL-main/test.py", line 85, in
generated_ids = model.generate(**inputs, max_new_tokens=128)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
result = self._sample(
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1722, in forward
outputs = self.model(
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1159, in forward
layer_outputs = decoder_layer(
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 906, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 721, in forward
attn_output = _flash_attention_forward(
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 246, in _flash_attention_forward
query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = _upad_input(
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 99, in _upad_input
indices_k, cu_seqlens_k, max_seqlen_in_batch_k = _get_unpad_data(attention_mask)
File "/home/ps/anaconda3/envs/Qwen2_VL2/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 50, in _get_unpad_data
indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
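The last lines of the error message suggest rerunning with CUDA_LAUNCH_BLOCKING=1 so the reported stack frame points at the kernel that actually fails. A minimal debugging sketch along those lines, which also sends the inputs to the GPU that holds the embedding layer instead of a bare "cuda" when the model is sharded (the names first_device and target are introduced here purely for illustration; this is a debugging aid, not a confirmed fix):

import os
# Read when the CUDA context is created, so set it before any CUDA work
# (or in the shell: CUDA_LAUNCH_BLOCKING=1 python test.py).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
from transformers import Qwen2VLForConditionalGeneration

model_path = "/home/ps/Qwen2_VL/Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# With device_map="auto", transformers records the final module-to-device
# placement here; check where embed_tokens, the decoder layers and lm_head landed.
print(model.hf_device_map)

# Inputs should sit on the GPU that holds the embeddings rather than a bare "cuda".
first_device = model.hf_device_map.get("model.embed_tokens", 0)
target = f"cuda:{first_device}" if isinstance(first_device, int) else first_device
print("send inputs to:", target)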
