
[BUG] LLaMA-Factory SFT training error #862

Open
2 tasks done
hm1229 opened this issue Feb 27, 2025 · 0 comments
hm1229 commented Feb 27, 2025

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

I followed the LoRA sft.yaml given at https://github.com/OpenBMB/MiniCPM-o/blob/main/docs/llamafactory_train_and_infer.md and installed the environment described in https://github.com/OpenBMB/MiniCPM-o/issues/807. During LoRA SFT the error below occurred; I suspect a version problem with the accelerate library?
/data3/utils/LLaMA-Factory/src/llamafactory/data/mm_plugin.py:669: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
audio_feature_lens = [torch.tensor(audio_feature_len) for audio_feature_len in audio_feature_lens]
Traceback (most recent call last):
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 156, in send_to_device
return tensor.to(device, non_blocking=non_blocking)
TypeError: BatchEncoding.to() got an unexpected keyword argument 'non_blocking'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data1/anaconda3/envs/minitrain/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/data3/utils/LLaMA-Factory/src/llamafactory/cli.py", line 112, in main
run_exp()
File "/data3/utils/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
_training_function(config={"args": args, "callbacks": callbacks})
File "/data3/utils/LLaMA-Factory/src/llamafactory/train/tuner.py", line 67, in _training_function
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/data3/utils/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 102, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/trainer.py", line 2052, in train
return inner_training_loop(
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/trainer.py", line 2345, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/data_loader.py", line 561, in iter
current_batch = send_to_device(current_batch, self.device, non_blocking=self._non_blocking)
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 184, in send_to_device
{
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 185, in
k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 158, in send_to_device
return tensor.to(device)
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 816, in to
self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 816, in
self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
AttributeError: 'list' object has no attribute 'to'
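The interaction in the traceback can be sketched in miniature without torch, transformers, or accelerate. All names below are hypothetical stand-ins (FakeTensor, FakeBatchEncoding, and a simplified send_to_device mimic the failure mode, not the real APIs): accelerate first calls .to(device, non_blocking=...), the older BatchEncoding.to() rejects the keyword, and the fallback retry lets BatchEncoding.to() iterate over every value, including the plain list audio_feature_lens.

```python
# Hypothetical stand-ins mimicking the traceback; not the real
# transformers / accelerate classes.

class FakeTensor:
    """Stands in for torch.Tensor: .to() accepts non_blocking."""
    def to(self, device=None, non_blocking=False):
        return self

class FakeBatchEncoding(dict):
    """Mimics an older transformers BatchEncoding: .to() takes only a
    device and blindly calls .to() on every value, including lists."""
    def to(self, device):
        return {k: v.to(device=device) for k, v in self.items()}

def send_to_device(obj, device, non_blocking=False):
    """Simplified sketch of accelerate's send_to_device dispatch."""
    if isinstance(obj, list):
        return [send_to_device(v, device, non_blocking) for v in obj]
    if isinstance(obj, dict) and not hasattr(obj, "to"):
        return {k: send_to_device(v, device, non_blocking)
                for k, v in obj.items()}
    if hasattr(obj, "to"):
        try:
            return obj.to(device, non_blocking=non_blocking)
        except TypeError:
            # Fallback taken in the traceback: retry without
            # non_blocking, handing control to BatchEncoding.to().
            return obj.to(device)
    return obj

batch = FakeBatchEncoding(
    input_ids=FakeTensor(),
    audio_feature_lens=[FakeTensor(), FakeTensor()],  # list survives collation
)

try:
    send_to_device(batch, "cpu")
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'to'

# One possible workaround: hand over a plain dict so the mover recurses
# per key itself instead of deferring to BatchEncoding.to().
moved = send_to_device(dict(batch), "cpu")
print(sorted(moved))  # ['audio_feature_lens', 'input_ids']
```

Under this reading, either an accelerate/transformers version pair where .to() signatures agree, or a collator that returns a plain dict rather than a BatchEncoding, would avoid the crash.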

Expected Behavior

No response

Steps To Reproduce

The full environment is listed in piplist.txt

Environment

- OS: Ubuntu 20.04
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response
