Thanks. I also have another question. In the paper you mentioned:
To deal with images with varied aspect ratios, LLaVA-1.5 pads the input images into squares before feeding them into the visual encoder. This encoding method results in a waste of computation for non-square images. For example, a 1:4 image has only 25% effective computation after padding into squares. To quantify the influence, we train an unpadded version of LLaVA-1.5, by fitting the ViT position embedding into the aspect ratio of input images using 2D interpolation.
Based on your train.sh, you didn't set image_aspect_ratio during pretraining, but you used image_aspect_ratio pad during fine-tuning. Is that what you are referring to in the paper?
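For reference, the 25% figure in the quoted passage can be checked with a quick calculation. This is only a minimal sketch of the arithmetic, assuming the image is padded to a square whose side equals its longer edge; the function name `effective_fraction` is made up for illustration and is not taken from the repo or the paper.

```python
def effective_fraction(width: int, height: int) -> float:
    """Fraction of the padded square canvas covered by the original image.

    Assumes pad-to-square with side = max(width, height); the rest of the
    canvas is padding that the visual encoder still has to process.
    """
    side = max(width, height)
    return (width * height) / (side * side)


# A 1:4 image (224 x 896 is an arbitrary example size) fills a quarter of the square.
print(effective_fraction(224, 896))  # 0.25
# A square image wastes nothing.
print(effective_fraction(336, 336))  # 1.0
```

So for a 1:4 image, three quarters of the encoder input would be padding, which matches the number quoted from the paper.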
Hi, I am getting this warning. Is that OK?