[BUG] LoRA fine-tuning issue #342
Comments
What does your dataset currently look like?
The JSON file is one big list; each element of the list is a dict containing "id", "image" (the path to the image), and "conversations" (a list holding the dialogue).
Can you show the actual contents of conversations?
{ |
Strangely, when I add .to("cuda") after AutoModel.from_pretrained(), that bug disappears, but then I get a CUDA out-of-memory error.
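One plausible cause of the OOM after `.to("cuda")`: `from_pretrained` loads weights in fp32 by default, so moving a model of this size onto a single GPU costs twice the memory that bf16 would. A back-of-envelope check in plain Python (the parameter count below is an assumption for an ~8B-parameter model, not a figure from this thread):

```python
# Rough GPU memory needed for model weights alone (ignoring
# activations, gradients, and optimizer state).
params = 8_000_000_000  # assumed parameter count; use your model's real value
gib = 2**30

fp32_gib = params * 4 / gib  # 4 bytes per parameter
bf16_gib = params * 2 / gib  # 2 bytes per parameter

print(f"fp32 weights: {fp32_gib:.1f} GiB")
print(f"bf16 weights: {bf16_gib:.1f} GiB")
```

If the fp32 figure exceeds your card's memory, passing a half-precision dtype to `from_pretrained` (e.g. `torch_dtype=torch.bfloat16`) before calling `.to("cuda")` is a common workaround, consistent with the bf16 flags that resolved the issue later in this thread.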
I'd also like to ask another question: my server has 8 NVIDIA GPUs but I can only use 7 of them (all except index 0). Do I need to modify the distributed-training code in finetune.py and finetune_lora.sh? 🤔
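For reference, restricting which devices a job can see usually does not require editing the distributed-training code: setting `CUDA_VISIBLE_DEVICES` before the launcher runs is enough. A hedged sketch of how the launch line in finetune_lora.sh could look (the exact arguments passed to finetune.py are placeholders for whatever the script already uses):

```shell
# Hide GPU 0 from the process. The remaining 7 GPUs are renumbered
# 0-6 inside the job, so the launcher should start 7 workers per node.
export CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7
torchrun --nproc_per_node=7 --nnodes=1 finetune.py "$@"
```

Inside the job, the framework sees exactly 7 devices, so no per-rank device logic in finetune.py needs to change.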
Or do I still need to fix the original problem first 😂
Your training file doesn't include this token; please organize your data following https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#data-preparation
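For context, a sketch of the entry shape the linked data-preparation guide describes, plus a pre-training sanity check for the token the maintainer mentions. All field values here are invented examples; treat the linked README as the authoritative schema:

```python
import json

# One training sample: a dict with "id", "image" (path), and
# "conversations". The key detail from the reply above: the user turn
# must contain the <image> placeholder token.
entry = {
    "id": "0",
    "image": "images/example_0.jpg",
    "conversations": [
        {"role": "user", "content": "<image>\nWhat is shown in this picture?"},
        {"role": "assistant", "content": "A short description of the picture."},
    ],
}

def has_image_token(sample: dict) -> bool:
    """Check that the first user turn contains exactly one <image> token."""
    first_user = next(c for c in sample["conversations"] if c["role"] == "user")
    return first_user["content"].count("<image>") == 1

dataset = [entry]  # the JSON file holds one big list of such dicts
assert all(has_image_token(s) for s in dataset)
print(json.dumps(dataset, ensure_ascii=False)[:60])
```

Running this check over the whole JSON file before launching finetune.py is a cheap way to catch the "image start token != image end tokens" class of data errors early.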
What GPU are you using?
I also get "RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation" when running the Lora script. I made sure I have the data in the desired format. Attached is the output I get when running the command: output_lora.txt FWIW, I also tried with this version of finetune.py and I get the same error I get on current main branch of the official repo. Regarding deps, I ran the ones in requirements.txt + the pinned versions mentioned in this PR. I use only one machine (no parallelism). Here is the
Hi, we updated the code, please try again.
Adding the token didn't help; I still get the same error.
With the latest code I get a new error.
I used a local cache of the model downloaded from ModelScope, and in finetune.py I changed the model import to use ModelScope's AutoModel and AutoTokenizer to load the local model cache. Could that have an effect?
I suggest you re-download the model directly from Hugging Face.
I can confirm it's working now with "--bf16 true --bf16_full_eval true --fp16 false --fp16_full_eval false". Initially I tried fp16 and I got an error saying "Attempting to unscale FP16 gradients" so I switched to BF16. Thank you for the fix! |
You are welcome! |
The embedding problem is solved, but are the Hugging Face and ModelScope models subtly different?
Wait, that's not it: I'm currently not using any GPU at all, yet it still reports CUDA out of memory.
OK, I've solved it.
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
当前行为 | Current Behavior
During LoRA fine-tuning I ran into image start token != image end tokens as well as
UserWarning: None of the inputs have requires_grad=True
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation
I suspect this is related to how I prepared the dataset. Could you advise how to fix it? Thanks in advance!
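The second error can be reproduced in isolation, independent of the dataset: it fires whenever code writes in-place into a slice (view) of a parameter tensor while autograd is tracking it. A minimal sketch:

```python
import torch

# Embedding weights are a "leaf" tensor with requires_grad=True.
weight = torch.nn.Embedding(10, 4).weight
row = weight[0]  # slicing creates a view of that leaf

try:
    row += 1.0  # in-place write into the view -> RuntimeError
except RuntimeError as err:
    print(type(err).__name__, "-", err)

# A common fix for initialization-style writes is to perform them
# without autograd tracking:
with torch.no_grad():
    weight[0] += 1.0  # allowed: no gradient history is recorded
```

This is why the error can surface inside embedding/token handling when the data is malformed: the code ends up patching token embeddings in-place on a live parameter.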
期望行为 | Expected Behavior
No response
复现方法 | Steps To Reproduce
No response
运行环境 | Environment
备注 | Anything else?
No response