Hi, recently I was running LLaMA-2 with tensor-parallel inference through the generate method and I encountered the problem below.
Here is the error message:
[2023-08-11 23:43:09,855][FK.general_util.evaluator][INFO] - ***** Running evaluation test.test *****
[2023-08-11 23:43:09,855][FK.general_util.evaluator][INFO] - Num examples = 1569
[2023-08-11 23:43:09,856][FK.general_util.evaluator][INFO] - Batch size = 1
Evaluating:   0%|          | 0/1569 [00:00<?, ?it/s]
Error executing job with overrides: ['ddp_eval=False']
Traceback (most recent call last):
  File "/export/home2/fangkai/merit-v2/trainer_base_fsdp_v4.py", line 464, in <module>
    main()
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/export/home2/fangkai/merit-v2/trainer_base_fsdp_v4.py", line 436, in main
    result = evaluate(cfg, model, tokenizer, prefix=prefix, _split=split)
  File "/export/home2/fangkai/merit-v2/general_util/evaluator.py", line 227, in evaluate_fn
    outputs, pred_res = eval_forward_fn(batch)
  File "/export/home2/fangkai/merit-v2/general_util/evaluator.py", line 470, in __call__
    decoding_outputs = self.model.generate(**batch, generation_config=self.generation_config)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/merit-v2/models/llama.py", line 75, in _forward
    query_states = self.q_proj(hidden_states)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1346893) of binary: /export/home2/fangkai/anaconda3/envs/torch2.0/bin/python
(The right half of the terminal capture was nvidia-smi output: Driver Version 510.39.01, CUDA Version 11.6, 8× NVIDIA RTX A6000.)
It seems that the error happens in the query projection (q_proj) of the attention layer.
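To narrow this down, a small diagnostic I can run right before generate (a hypothetical helper, not part of my code) lists every parameter or buffer of the wrapped model that is still resident on the CPU:

def report_cpu_params(model):
    # Print any parameters/buffers that did not end up on a GPU shard,
    # e.g. an un-moved q_proj weight.
    for name, param in model.named_parameters():
        if param.device.type == "cpu":
            print(f"parameter on CPU: {name} {tuple(param.shape)}")
    for name, buf in model.named_buffers():
        if buf.device.type == "cpu":
            print(f"buffer on CPU: {name} {tuple(buf.shape)}")

report_cpu_params(model)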
The following is my initialization wrapper:
import torch
import torch.distributed as dist
import tensor_parallel as tp
from transformers import LlamaForCausalLM

# `logger` and `llama_fast_attention_wrap` are project-local helpers.

def load_model_from_pretrained_tp(pretrained_model_name_or_path: str, *args, **kwargs):
    tp_sharded = kwargs.pop("tp_sharded", None)
    enable_flash_attention = kwargs.pop("enable_flash_attention", False)
    flash_attention_vanilla_torch = kwargs.pop("flash_attention_vanilla_torch", False)
    flash_attention_var_len = kwargs.pop("flash_attention_var_len", False)

    model = LlamaForCausalLM.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)

    if enable_flash_attention:
        logger.info("⚡⚡⚡ enable llama flash attention.")
        for layer in model.model.layers:
            llama_fast_attention_wrap(layer.self_attn,
                                      vanilla_torch=flash_attention_vanilla_torch,
                                      var_len=flash_attention_var_len)

    n_gpus = torch.cuda.device_count()
    if not dist.is_initialized():
        # Single process: shard the model across all visible GPUs.
        model = tp.tensor_parallel(model, [torch.device(f"cuda:{i}") for i in range(n_gpus)], sharded=tp_sharded)
    else:
        # Inside a torch.distributed run: build the per-rank shard.
        model = tp.tensor_parallel(model, sharded=False)[0]

    return model
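For reference, I call this loader roughly as follows (a sketch; the checkpoint path and keyword values here are placeholders, not my exact config):

import torch

model = load_model_from_pretrained_tp(
    "meta-llama/Llama-2-7b-hf",    # placeholder checkpoint path
    torch_dtype=torch.float16,     # placeholder dtype
    tp_sharded=None,
    enable_flash_attention=False,
)
model.eval()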
I noticed that your examples do not call batch["input_ids"].to(device). However, when I remove that call from my own code, it raises another error saying the inputs are on the CPU.
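For context, the way I currently move the batch before calling generate looks roughly like this (a sketch; it assumes the inputs should live on the device of the first shard / first parameter, usually cuda:0):

import torch

device = next(model.parameters()).device  # assumption: device of the first shard
batch = {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}
decoding_outputs = model.generate(**batch, generation_config=generation_config)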
Version information: conda env torch2.0 (Python 3.9), Driver 510.39.01, CUDA 11.6, 8× NVIDIA RTX A6000 (see the nvidia-smi summary above).
Thank you very much for your help!