PPTX: Extract images #56

Open
pstoeckle opened this issue Dec 16, 2024 · 4 comments
Labels
enhancement (New feature or request), open for contribution (Invites open-source developers to contribute to the project.)

Comments

@pstoeckle

Currently, the script only extracts placeholders, i.e.,

![Content Placeholder 16](ContentPlaceholder16.jpg)

It would be nice if the tool exported the images (to the current folder or to a folder passed as an argument).
That way, the images would show up in the Markdown preview as well.
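
As a stopgap, here is a minimal sketch of what exporting the images to a folder could look like. It relies only on the fact that a .pptx file is a ZIP archive that keeps embedded pictures under ppt/media/. The function name `extract_pptx_images` and its `out_dir` parameter are made up for illustration and are not part of the tool.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_pptx_images(pptx_path, out_dir="."):
    """Copy every file under ppt/media/ in a .pptx archive into out_dir.

    Hypothetical helper for illustration only, not the tool's API.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            member = PurePosixPath(name)
            # Skip directory entries and anything outside the media folder.
            if name.endswith("/") or member.parent != PurePosixPath("ppt/media"):
                continue
            # Keep only the file name so nothing is written outside out_dir.
            target = out / member.name
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

Note that the extracted files keep their archive names (for example image1.png), which will generally not match a link target derived from the shape name, such as ContentPlaceholder16.jpg, so the generated links would still have to be rewritten to point at them.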

@AlbanOtt2

Note: This issue also exists for Word (.docx) documents.

A potential improvement would be to export images in a format like ![](media/image1.png).
This would allow users to simply unzip their Word document and retrieve the image from where Word stores it: media/image1.png.
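
To make the idea concrete, here is a rough sketch that lists the pictures in a .docx and prints links in that format. One caveat: inside the archive the files sit under word/media/, so a plain unzip yields word/media/image1.png and the link only resolves if the word/ prefix is stripped or the link is resolved relative to the word/ folder. The function name `docx_image_links` is hypothetical, and the links come out in sorted file-name order, not in document order.

```python
import zipfile
from pathlib import PurePosixPath


def docx_image_links(docx_path):
    """Return Markdown image links such as ![](media/image1.png) for a .docx.

    Hypothetical helper for illustration only, not the tool's API.
    """
    links = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in sorted(archive.namelist()):
            member = PurePosixPath(name)
            if member.parent == PurePosixPath("word/media"):
                # Drop the leading "word/" so the link reads media/<file>.
                links.append(f"![](media/{member.name})")
    return links
```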

It would also make it possible to automate this example:

result = md.convert("example.jpg")

@afourney
Member

Yes, better handling of images is on my to-do list. The original purpose of the library was to support text-only LLMs, so the original focus was on extracting image metadata (e.g., tags, xmp, iptc, captions, etc.). But there's clear value in saving the images to disk and supporting them directly.

@gagb added the enhancement and open for contribution labels on Dec 17, 2024
@dradoudine

It would be good to have an option to export media (images, etc.) to a specific folder, as pandoc does.

@masquare

FYI, and related to this issue: I've created PR #306, which describes images in PPTX files using LLMs.

6 participants