You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
cd /root/workspace/github/optimum-habana/examples/text-generation/
python run_generation.py \
--model_name_or_path /root/workspace/model/meta-llama/Llama-3.1-8B/ \
--use_hpu_graphs \
--use_kv_cache \
--max_new_tokens 100 \
--do_sample \
--prompt "Here is my prompt"
/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py:366: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
11/02/2024 10:02:30 - INFO - __main__ - Single-device run.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.44it/s]
Some weights of GaudiLlamaForCausalLM were not initialized from the model checkpoint at /root/workspace/model/meta-llama/Llama-3.1-8B/ and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 96
CPU RAM : 527938484 KB
------------------------------------------------------------------------------
Traceback (most recent call last):
File "/root/workspace/github/optimum-habana/examples/text-generation/run_generation.py", line 692, in <module>
main()
File "/root/workspace/github/optimum-habana/examples/text-generation/run_generation.py", line 337, in main
model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
File "/root/workspace/github/optimum-habana/examples/text-generation/utils.py", line 633, in initialize_model
setup_model(args, model_dtype, model_kwargs, logger)
File "/root/workspace/github/optimum-habana/examples/text-generation/utils.py", line 267, in setup_model
model = model.eval().to(args.device)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2871, in to
return super().to(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1176, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 804, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1162, in convert
return t.to(
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_DEVMEM Allocation failed for size::234881024 (224)MB
The text was updated successfully, but these errors were encountered:
The text was updated successfully, but these errors were encountered: