feat: Added judgment logic to support training with plain text data. #281
base: main
Thank you for your work. However, when image-text pair data and text-only data are included in the same batch, the following error occurs when running the code.
Here's the situation: whenever someone updates the code on GitHub, this error inevitably occurs because text-only data lacks corresponding tgt_sizes and cannot take part in image feature extraction. That part is defined in the Hugging Face model, not in this repository. We need to add an additional precondition, as I mentioned here: we should add two lines there. Since the Hugging Face merge has not been accepted officially yet, we can only modify the code locally.
Thanks for your fast reply, but I got the same error.
Sorry, I'm actually not familiar with the previous version of the code; maybe you can try the current version. Oh, I just noticed that you load the model from cache? Maybe git clone the model first and then use your local model_path; it will be easier to modify the code and debug.
Oh, that's OK. Thank you so much for your help.
Hey guys, are there any updates for this error?
I modified the codes in |
Sorry for the late reply. Have you fixed it now?
In my code, I think the cause is this: if any one sample in the batch has an image, if all_pixel_values: is True, so the other, text-only sample in the batch hits that error. I can't solve it.
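The behaviour described above can be illustrated with a minimal sketch. It uses plain Python lists in place of tensors, and the helper names `has_any_image` and `non_empty_tgt_sizes` are hypothetical; they are not taken from the model code. It only shows why a batch-level check cannot protect an individual text-only sample.

```python
def has_any_image(pixel_values_list):
    """Batch-level check: True as soon as one sample carries image slices."""
    return any(len(pixel_values) > 0 for pixel_values in pixel_values_list)


def non_empty_tgt_sizes(tgt_sizes):
    """Per-sample filter: drop the empty entries that text-only samples contribute."""
    return [sizes for sizes in tgt_sizes if len(sizes) > 0]


# One image-text sample and one text-only sample (illustrative values).
pixel_values_list = [["slice_0"], []]
tgt_sizes = [[[32, 32]], []]

batch_has_image = has_any_image(pixel_values_list)  # vision branch runs for the whole batch
usable_sizes = non_empty_tgt_sizes(tgt_sizes)       # only the image sample's sizes remain
```

Under these assumptions, the vision branch is entered for the whole batch, so any per-sample value that is empty has to be filtered or skipped before it reaches the image encoder.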
My sample test has one text-image pair and one text-only sample. It seems that tgt_sizes = torch.tensor([]) for the text-only sample, but an error occurred because the data entered the
Hello, I noticed something. Your code in

    if 'vision_hidden_states' not in data:
        dtype = self.vpm.embeddings.position_embedding.weight.dtype
        device = self.vpm.embeddings.position_embedding.weight.device
        tgt_sizes = data['tgt_sizes']
        pixel_values_list = data['pixel_values']
        vision_hidden_states = []
        all_pixel_values = []
        img_cnt = []
        for pixel_values in pixel_values_list:
            # Skip text-only samples, which carry no image slices.
            if len(pixel_values) == 0:
                continue
            img_cnt.append(len(pixel_values))
            all_pixel_values.extend([i.flatten(end_dim=1).permute(1, 0) for i in pixel_values])

You might be concerned that the subsequent logic becomes incoherent, but in reality that won't be the case. The script modeling_minicpmv.py is compatible with this logic. I suggest loading the model from a local path (clone it to your disk instead of loading it from Hugging Face); then we can observe and debug it easily.
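The effect of the `len(pixel_values) == 0` check can be tried in isolation. This is a minimal sketch that uses plain Python lists in place of tensors and leaves out the flatten/permute step; the helper name `collect_image_slices` is hypothetical and is not part of modeling_minicpmv.py.

```python
def collect_image_slices(pixel_values_list):
    """Gather image slices across a batch, skipping text-only samples.

    pixel_values_list holds one entry per sample; a text-only sample is
    assumed to be represented by an empty list.
    """
    img_cnt = []           # slice count for each sample that has images
    all_pixel_values = []  # every slice in the batch, in batch order
    for pixel_values in pixel_values_list:
        if len(pixel_values) == 0:
            continue  # text-only sample: nothing to encode
        img_cnt.append(len(pixel_values))
        all_pixel_values.extend(pixel_values)
    return img_cnt, all_pixel_values


# Mixed batch: two slices, a text-only sample, then one slice.
batch = [["img_a_0", "img_a_1"], [], ["img_b_0"]]
img_cnt, all_pixel_values = collect_image_slices(batch)
```

With a batch made only of text-only samples, both returned lists stay empty, so a later `if all_pixel_values:` check would skip the vision branch entirely.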
Thanks a lot for your feedback! I modified the code but sadly got the same error.
This solves my problem, thanks~
The current logic assumes that every input sample includes an image, so data['pixel_values'] must be present for each training sample; for text-only input, however, 'pixel_values' does not exist.
Here, we only need to process the dataset so that it is compatible with text-only input; in addition, a corresponding change has to be merged into the Hugging Face model.
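As a rough illustration of the dataset-side idea, and not the code in this PR, the sketch below gives every sample the vision keys with empty defaults, so a text-only sample can sit in the same batch as an image-text sample. The function names `ensure_vision_fields` and `collate` and the `input_ids` key are assumptions made for the example; plain lists stand in for tensors.

```python
def ensure_vision_fields(sample):
    """Return a copy of the sample that always carries the vision keys."""
    out = dict(sample)
    out.setdefault("pixel_values", [])  # no image slices for text-only data
    out.setdefault("tgt_sizes", [])     # no target sizes for text-only data
    return out


def collate(samples):
    """Group per-sample fields into per-batch lists (no padding in this sketch)."""
    samples = [ensure_vision_fields(s) for s in samples]
    return {
        "input_ids": [s["input_ids"] for s in samples],
        "pixel_values": [s["pixel_values"] for s in samples],
        "tgt_sizes": [s["tgt_sizes"] for s in samples],
    }


text_only = {"input_ids": [1, 2, 3]}
with_image = {"input_ids": [4, 5], "pixel_values": ["slice_0"], "tgt_sizes": [[32, 32]]}
batch = collate([text_only, with_image])
```

The model side still has to tolerate the empty entries, which is why the change to the Hugging Face model is needed as well.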
This addresses the following two issues, which I understand are essentially the same problem mentioned here.
#221 #250
@Cuiunbo