If the PDF I upload contains pictures as well as text, can those pictures be recognized and extracted as separate image files? #109

Wyzanezan · 2024-11-25T14:11:53Z

No description provided.

tylermaran · 2024-11-25T15:52:08Z

Hey @Wyzanezan. Can you share an example of the type of page you're thinking of?

Right now we're not isolating images out of documents, but it's been something of interest to people. Today we'd just get [image description](image)

Wyzanezan · 2024-11-27T03:29:17Z

Hi @tylermaran, the image above is part of a PDF page. When I use zerox, I'd like it to recognize this as a picture, extract it, and save it to a separate directory, but zerox can't do that at present.
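To make the request concrete, here is a rough sketch of what "extract the pictures into a separate directory" could look like outside of zerox. It is not a zerox feature and not how zerox works. It is a crude, standard-library-only scan that assumes the pictures are stored as JPEG (DCTDecode) streams in an unencrypted PDF. The names `extract_jpegs` and `save_images` are made up for illustration. Vector charts and images stored with other filters will not be found this way, so a proper PDF library would be the more robust choice.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the next marker prefix
JPEG_END = b"\xff\xd9"        # JPEG end-of-image marker


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw bytes of every stream in the PDF that looks like a JPEG."""
    images = []
    pos = 0
    while True:
        start = pdf_bytes.find(b"stream", pos)
        if start == -1:
            break
        end = pdf_bytes.find(b"endstream", start)
        if end == -1:
            break
        body = pdf_bytes[start + len(b"stream"):end]
        soi = body.find(JPEG_START)
        eoi = body.rfind(JPEG_END)
        # The JPEG must begin right after the line break that follows "stream".
        if 0 <= soi <= 2 and eoi > soi:
            images.append(body[soi:eoi + len(JPEG_END)])
        pos = end + len(b"endstream")
    return images


def save_images(pdf_path: str, out_dir: str) -> list[Path]:
    """Write each JPEG found in the PDF to out_dir and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, data in enumerate(extract_jpegs(Path(pdf_path).read_bytes()), start=1):
        target = out / f"image_{i:03d}.jpg"
        target.write_bytes(data)
        paths.append(target)
    return paths
```

Calling `save_images("report.pdf", "images")` (both paths are placeholders) would write `image_001.jpg`, `image_002.jpg`, and so on into the `images` directory, which could sit next to the markdown that zerox produces.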

ezavesky · 2024-11-29T14:48:12Z

> Hey @Wyzanezan. Can you share an example of the type of page you're thinking of?
>
> Right now we're not isolating images out of documents, but it's been something of interest to people. Today we'd just get [image description](image)

Is there a specific modification of the core prompt that would be required to generate a caption? Ideally, the first goal would be a rich description of the image content, and the second would be a prompt detailed enough to re-generate the image if necessary.

As an example, a similar project takes this input of a bar chart and produces this brief markdown. At this point, there are no comments on the accuracy of the result, but if the image goes through one of the big API-based GPTs (or maybe even llava or qwen?), it should get decent recognition and description.
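The caption idea above could be sketched roughly as follows. This is only an illustration: `CAPTION_INSTRUCTIONS` is hypothetical wording, not zerox's actual prompt, and `find_image_captions` is a made-up helper that assumes the model keeps emitting placeholders of the `[image description](image)` form mentioned earlier in this thread.

```python
import re

# Hypothetical extra instructions to add to the system prompt.
# This wording is an assumption, not part of zerox's built-in prompt.
CAPTION_INSTRUCTIONS = (
    "For every figure, chart, or photo on the page, emit a markdown image "
    "placeholder of the form ![<rich description>](image). "
    "Make the description detailed enough that the figure could be "
    "re-generated from it: chart type, axis labels, series, and key values."
)

# Matches both ![alt](image) and [alt](image) placeholders.
PLACEHOLDER = re.compile(r"!?\[([^\]]*)\]\(image\)")


def find_image_captions(markdown: str) -> list[str]:
    """Collect the description text of every image placeholder in the output."""
    return [m.group(1).strip() for m in PLACEHOLDER.finditer(markdown)]
```

The collected captions could then be reviewed for accuracy or passed to an image model as re-generation prompts. How the extra instructions are passed to zerox depends on the package version, so check the README for the current custom prompt option.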

Oh, and big thanks for sharing your work and providing a great starter that others can enhance for this functionality!
