RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. #1

Open · opened on Nov 2, 2024 by James-Lu-none (Contributor) · 0 comments
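
Running the text-generation example with DeepSpeed across 8 Gaudi cards fails on every rank during the first warm-up iteration ("Graph compilation") with the RuntimeError in the title. Command and log: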

cd /root/workspace/github/optimum-habana/examples/text-generation/
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py --model_name_or_path /root/workspace/model/meta-llama/Llama-3.1-8B/ --batch_size 1 --use_hpu_graphs --use_kv_cache --max_new_tokens 100
DistributedRunner run(): command = deepspeed --num_nodes 1 --num_gpus 8 --no_local_rank --master_port 29500 run_generation.py --model_name_or_path /root/workspace/model/meta-llama/Llama-3.1-8B/ --batch_size 1 --use_hpu_graphs --use_kv_cache --max_new_tokens 100
/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py:366: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:160: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
[2024-11-02 09:56:10,744] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to hpu (auto detect)
[2024-11-02 09:56:12,446] [WARNING] [runner.py:204:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-11-02 09:56:12,514] [INFO] [runner.py:580:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --no_local_rank --enable_each_rank_log=None run_generation.py --model_name_or_path /root/workspace/model/meta-llama/Llama-3.1-8B/ --batch_size 1 --use_hpu_graphs --use_kv_cache --max_new_tokens 100
[2024-11-02 09:56:14,198] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to hpu (auto detect)
[2024-11-02 09:56:15,831] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-11-02 09:56:15,831] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-11-02 09:56:15,831] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-11-02 09:56:15,831] [INFO] [launch.py:163:main] dist_world_size=8
[2024-11-02 09:56:15,831] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-11-02 09:56:15,832] [INFO] [launch.py:253:main] process 14322 spawned with command: ['/usr/bin/python3', '-u', 'run_generation.py', '--model_name_or_path', '/root/workspace/model/meta-llama/Llama-3.1-8B/', '--batch_size', '1', '--use_hpu_graphs', '--use_kv_cache', '--max_new_tokens', '100']
(identical spawn lines for processes 14323–14329 omitted)
/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py:366: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:160: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
[2024-11-02 09:56:31,203] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to hpu (auto detect)
[2024-11-02 09:56:33,053] [INFO] [comm.py:637:init_distributed] cdb=None
(each of the warning and INFO lines above is emitted once per rank, 8 times in total; the other repetitions are omitted)
11/02/2024 09:56:40 - INFO - __main__ - DeepSpeed is enabled.
[2024-11-02 09:56:40,444] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend hccl
Loading 7 checkpoint shards: 100%|████████| 7/7 [00:02<00:00,  2.42it/s]
(the same checkpoint-loading progress bar appears once per rank; repetitions omitted)
[2024-11-02 09:56:51,251] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0+hpu.synapse.v1.17.0, git-hash=a658791, git-branch=1.17.0
[2024-11-02 09:56:51,253] [INFO] [logging.py:96:log_dist] [Rank 0] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 0
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 96
CPU RAM       : 527938484 KB
------------------------------------------------------------------------------
11/02/2024 09:56:55 - INFO - __main__ - Args: Namespace(device='hpu', model_name_or_path='/root/workspace/model/meta-llama/Llama-3.1-8B/', bf16=False, max_new_tokens=100, max_input_tokens=0, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=False, num_beams=1, top_k=None, penalty_alpha=None, trim_logits=False, seed=27, profiling_warmup_steps=0, profiling_steps=0, profiling_record_shapes=False, prompt=None, bad_words=None, force_words=None, assistant_model=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=False, output_dir=None, bucket_size=-1, bucket_internal=False, dataset_max_samples=-1, limit_hpu_graphs=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, use_flash_attention=False, flash_attention_recompute=False, flash_attention_causal_mask=False, flash_attention_fast_softmax=False, book_source=False, torch_compile=False, ignore_eos=True, temperature=1.0, top_p=1.0, const_serialization_path=None, disk_offload=False, trust_remote_code=False, load_quantized_model=False, parallel_strategy='none', quant_config='', world_size=8, global_rank=0)
11/02/2024 09:56:55 - INFO - __main__ - device: hpu, n_hpu: 8, bf16: True
11/02/2024 09:56:55 - INFO - __main__ - Model initialization took 17.483s
11/02/2024 09:56:55 - INFO - __main__ - Graph compilation...
Warming up iteration 1/3
[rank4]: Traceback (most recent call last):
[rank4]:   File "/root/workspace/github/optimum-habana/examples/text-generation/run_generation.py", line 692, in <module>
[rank4]:     main()
[rank4]:   File "/root/workspace/github/optimum-habana/examples/text-generation/run_generation.py", line 461, in main
[rank4]:     generate(None, args.reduce_recompile)
[rank4]:   File "/root/workspace/github/optimum-habana/examples/text-generation/run_generation.py", line 432, in generate
[rank4]:     outputs = model.generate(
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank4]:     return func(*args, **kwargs)
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/generation/utils.py", line 1287, in generate
[rank4]:     result = self._sample(
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/generation/utils.py", line 2247, in _sample
[rank4]:     outputs = self(
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank4]:     result = forward_call(*args, **kwargs)
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 724, in forward
[rank4]:     return wrapped_hpugraph_forward(
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 597, in wrapped_hpugraph_forward
[rank4]:     outputs = orig_fwd(*args, **kwargs)
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 1270, in forward
[rank4]:     outputs = self.model(
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank4]:     result = forward_call(*args, **kwargs)
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 1163, in forward
[rank4]:     layer_outputs = decoder_layer(
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank4]:     result = forward_call(*args, **kwargs)
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 857, in forward
[rank4]:     hidden_states, self_attn_weights, present_key_value = self.pre_attn(
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 912, in pre_attn
[rank4]:     hidden_states, attn_weights, present_key_value = self.self_attn.pre_attn_forward(
[rank4]:   File "/root/workspace/github/optimum-habana/optimum/habana/transformers/models/llama/modeling_llama.py", line 538, in pre_attn_forward
[rank4]:     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
[rank4]: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
(ranks 0, 1, 2, 3, 5, 6 and 7 fail with identical tracebacks at the same call site; omitted)
[2024-11-02 09:57:01,886] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 14322
(the launcher likewise kills subprocesses 14323–14329; repeated lines omitted)
[2024-11-02 09:57:02,104] [ERROR] [launch.py:322:sigkill_handler] ['/usr/bin/python3', '-u', 'run_generation.py', '--model_name_or_path', '/root/workspace/model/meta-llama/Llama-3.1-8B/', '--batch_size', '1', '--use_hpu_graphs', '--use_kv_cache', '--max_new_tokens', '100'] exits with return code = 1
[ERROR|distributed_runner.py:222] 2024-11-02 09:57:02,893 >> deepspeed --num_nodes 1 --num_gpus 8 --no_local_rank --master_port 29500 run_generation.py --model_name_or_path /root/workspace/model/meta-llama/Llama-3.1-8B/ --batch_size 1 --use_hpu_graphs --use_kv_cache --max_new_tokens 100  exited with status = 1
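
For reference, the stride rule behind this error is easy to reproduce in isolation. The snippet below is a minimal sketch with arbitrary shapes, not the optimum-habana code: `.view()` only succeeds when the requested shape can be expressed over the tensor's existing strides, so merging dimensions of a non-contiguous (e.g. transposed) tensor fails, while `.reshape()` falls back to making a copy:

```python
import torch

# Minimal sketch of the failure mode, independent of the model code:
# merging dimensions of a non-contiguous tensor is exactly the case .view()
# rejects with "at least one dimension spans across two contiguous subspaces".
x = torch.randn(2, 3, 4).transpose(0, 1)  # shape (3, 2, 4), non-contiguous

try:
    x.view(6, 4)  # fails: the existing strides cannot express the merged dimension
except RuntimeError as err:
    print(err)  # "view size is not compatible with input tensor's size and stride ..."

y = x.reshape(6, 4)  # succeeds: .reshape() copies when a zero-copy view is impossible
print(y.shape)  # torch.Size([6, 4])
```

So `query_states` presumably arrives non-contiguous at `modeling_llama.py` line 538 under this DeepSpeed configuration. An unverified workaround, following the error message's own suggestion, would be to call `.reshape(bsz, q_len, self.num_heads, self.head_dim)` there instead of `.view(...)`.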