Hi, recently I was running LLaMA-2 with tensor-parallel inference through the generate method and I encountered the problem below.
Here is the error message:
[2023-08-11 23:43:09,855][FK.general_util.evaluator][INFO] - ***** Running evaluation test.test *****
[2023-08-11 23:43:09,855][FK.general_util.evaluator][INFO] - Num examples = 1569
[2023-08-11 23:43:09,856][FK.general_util.evaluator][INFO] - Batch size = 1
Evaluating:   0%|          | 0/1569 [00:00<?, ?it/s]
Error executing job with overrides: ['ddp_eval=False']
Traceback (most recent call last):
  File "/export/home2/fangkai/merit-v2/trainer_base_fsdp_v4.py", line 464, in <module>
    main()
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/export/home2/fangkai/merit-v2/trainer_base_fsdp_v4.py", line 436, in main
    result = evaluate(cfg, model, tokenizer, prefix=prefix, _split=split)
  File "/export/home2/fangkai/merit-v2/general_util/evaluator.py", line 227, in evaluate_fn
    outputs, pred_res = eval_forward_fn(batch)
  File "/export/home2/fangkai/merit-v2/general_util/evaluator.py", line 470, in __call__
    decoding_outputs = self.model.generate(**batch, generation_config=self.generation_config)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/merit-v2/models/llama.py", line 75, in _forward
    query_states = self.q_proj(hidden_states)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1346893) of binary: /export/home2/fangkai/anaconda3/envs/torch2.0/bin/python
(The right half of the terminal capture was nvidia-smi output: Driver Version 510.39.01, CUDA Version 11.6, 8× NVIDIA RTX A6000.)
It seems that the error happens in the query projection (q_proj) of the attention layer.
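To narrow this down, a small diagnostic I can run right before generate (a hypothetical helper, not part of my code) lists every parameter or buffer of the wrapped model that is still resident on the CPU:

def report_cpu_params(model):
    # Print any parameters/buffers that did not end up on a GPU shard,
    # e.g. an un-moved q_proj weight.
    for name, param in model.named_parameters():
        if param.device.type == "cpu":
            print(f"parameter on CPU: {name} {tuple(param.shape)}")
    for name, buf in model.named_buffers():
        if buf.device.type == "cpu":
            print(f"buffer on CPU: {name} {tuple(buf.shape)}")

report_cpu_params(model)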
The following is my initialization wrapper:
import torch
import torch.distributed as dist
import tensor_parallel as tp
from transformers import LlamaForCausalLM

# `logger` and `llama_fast_attention_wrap` are project-local helpers.

def load_model_from_pretrained_tp(pretrained_model_name_or_path: str, *args, **kwargs):
    tp_sharded = kwargs.pop("tp_sharded", None)
    enable_flash_attention = kwargs.pop("enable_flash_attention", False)
    flash_attention_vanilla_torch = kwargs.pop("flash_attention_vanilla_torch", False)
    flash_attention_var_len = kwargs.pop("flash_attention_var_len", False)

    model = LlamaForCausalLM.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)

    if enable_flash_attention:
        logger.info("⚡⚡⚡ enable llama flash attention.")
        for layer in model.model.layers:
            llama_fast_attention_wrap(layer.self_attn,
                                      vanilla_torch=flash_attention_vanilla_torch,
                                      var_len=flash_attention_var_len)

    n_gpus = torch.cuda.device_count()
    if not dist.is_initialized():
        # Single process: shard the model across all visible GPUs.
        model = tp.tensor_parallel(model, [torch.device(f"cuda:{i}") for i in range(n_gpus)], sharded=tp_sharded)
    else:
        # Inside a torch.distributed run: build the per-rank shard.
        model = tp.tensor_parallel(model, sharded=False)[0]

    return model
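For reference, I call this loader roughly as follows (a sketch; the checkpoint path and keyword values here are placeholders, not my exact config):

import torch

model = load_model_from_pretrained_tp(
    "meta-llama/Llama-2-7b-hf",    # placeholder checkpoint path
    torch_dtype=torch.float16,     # placeholder dtype
    tp_sharded=None,
    enable_flash_attention=False,
)
model.eval()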
I noticed that your examples do not call batch["input_ids"].to(device). However, when I remove that call from my own code, it raises another error saying the inputs are on the CPU.
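For context, the way I currently move the batch before calling generate looks roughly like this (a sketch; it assumes the inputs should live on the device of the first shard / first parameter, usually cuda:0):

import torch

device = next(model.parameters()).device  # assumption: device of the first shard
batch = {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}
decoding_outputs = model.generate(**batch, generation_config=generation_config)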
Version information: conda env torch2.0 (Python 3.9), Driver 510.39.01, CUDA 11.6, 8× NVIDIA RTX A6000 (see the nvidia-smi summary above).
Thank you very much for your help!