If the PDF file I upload contains pictures as well as text, can the pictures be recognized and extracted as separate images? #109

Open
Wyzanezan opened this issue Nov 25, 2024 · 3 comments

Comments

@Wyzanezan

No description provided.

@tylermaran
Contributor

Hey @Wyzanezan. Can you share an example of the type of page you're thinking of?

Right now we're not isolating images out of documents, but it's been something of interest to people. Today we'd just get [image description](image)
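As a rough illustration of working with that output, the sketch below scans generated markdown for `[description](target)` placeholders so they can be reviewed or post-processed. It assumes the placeholders look like the example above, with an optional leading `!`; the function name `find_image_placeholders` is hypothetical and not part of zerox. Note that the pattern also matches ordinary markdown links.

```python
import re

# Matches [alt text](target) with an optional leading "!".
# Assumption: image placeholders in the output follow this shape.
PLACEHOLDER_PATTERN = re.compile(r"!?\[([^\]]*)\]\(([^)]*)\)")


def find_image_placeholders(markdown: str) -> list[tuple[str, str]]:
    """Return (description, target) pairs for each placeholder found."""
    return PLACEHOLDER_PATTERN.findall(markdown)


if __name__ == "__main__":
    sample = "# Report\n\nSome text.\n\n[A bar chart of sales](image)\n"
    for description, target in find_image_placeholders(sample):
        print(f"{target}: {description}")
```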

@Wyzanezan
Author

image
Hi @tylermaran, the image above is part of a PDF page. When I use zerox, I would like it to recognize this as a picture, extract it, and save it to a separate directory, but zerox can't do that at present.
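Until something like this exists in zerox, one workaround outside the library is to pull embedded images straight from the PDF bytes. The sketch below is a deliberately naive approach using only the Python standard library: it scans the raw file for JPEG start and end markers and writes each match to a directory. It assumes the images are stored as plain JPEG streams; it will miss images in other encodings and can truncate a JPEG that contains an embedded thumbnail. The function names are hypothetical.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # EOI marker


def find_embedded_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Naively collect byte ranges that look like complete JPEG files."""
    images = []
    pos = 0
    while True:
        start = pdf_bytes.find(JPEG_START, pos)
        if start == -1:
            break
        end = pdf_bytes.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        end += len(JPEG_END)
        images.append(pdf_bytes[start:end])
        pos = end
    return images


def save_images(images: list[bytes], out_dir: str) -> list[Path]:
    """Write each image to out_dir as image_001.jpg, image_002.jpg, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, data in enumerate(images, start=1):
        path = out / f"image_{index:03d}.jpg"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

Typical use would be `save_images(find_embedded_jpegs(Path("input.pdf").read_bytes()), "images")`, where both paths are placeholders. A dedicated PDF library would be more robust for real documents.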

@ezavesky

> Hey @Wyzanezan. Can you share an example of the type of page you're thinking of?
>
> Right now we're not isolating images out of documents, but it's been something of interest to people. Today we'd just get [image description](image)

Is there a specific modification of the core prompt that would be required to generate a caption? For example, the first goal would be to generate a rich description of the image content, and the second would be a description detailed enough to regenerate the image if necessary.
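To make the two goals concrete, here is a minimal sketch of how such prompt additions might be composed. The wording of the instructions, the base prompt, and the function name `build_caption_prompt` are all assumptions for illustration; none of them come from zerox, and how the resulting string is passed to the model depends on the prompt override option the project exposes.

```python
# Hypothetical extra instructions; the wording is illustrative only.
DESCRIBE_INSTRUCTION = (
    "For every image, chart, or figure on the page, insert a markdown image "
    "placeholder whose alt text is a rich description of the visual content."
)
REGENERATE_INSTRUCTION = (
    "For charts, also list the chart type, axis labels, and data values so "
    "that the figure could be re-created from the description alone."
)


def build_caption_prompt(base_prompt: str, regenerate: bool = False) -> str:
    """Append captioning instructions to an existing system prompt."""
    parts = [base_prompt.strip(), DESCRIBE_INSTRUCTION]
    if regenerate:
        # Second goal: enough detail to re-generate the figure.
        parts.append(REGENERATE_INSTRUCTION)
    return "\n".join(parts)


if __name__ == "__main__":
    # The base prompt here is a stand-in, not the project's actual prompt.
    print(build_caption_prompt("Convert the page to markdown.", regenerate=True))
```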

As an example, a similar project takes this input of a bar chart and produces this brief markdown. At this point, there are no comments on the accuracy of the image, but if it's going through some of the big API-based GPTs (or maybe even llava or qwen?), they should do a decent job of recognition and description.

Oh, and big thanks for sharing your work and providing a great starter that others can enhance for this functionality!
