PPTX: Extract images #56

pstoeckle · 2024-12-16T09:27:55Z

Currently, the script only emits placeholders for images, e.g.,

![Content Placeholder 16](ContentPlaceholder16.jpg)

It would be nice if the tool exported the images (to the current folder or to a folder passed as an argument).
That way, the images would also show up in the Markdown preview.

AlbanOtt2 · 2024-12-16T14:39:44Z

Note: This issue also exists for Word (.docx) documents.

A potential improvement would be to export images in a format like ![](media/image1.png).
This would allow users to simply unzip their Word document and retrieve the image from where Word stores it: media/image1.png.
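To make the unzip idea concrete, here is a minimal sketch using only the Python standard library. It is not part of markitdown: the function name `extract_media` and the `out_dir` parameter are hypothetical, and it assumes the usual OOXML layout in which embedded media sits under `word/media/` (Word) or `ppt/media/` (PowerPoint) inside the archive.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Assumption: OOXML packages keep embedded media under these archive folders.
MEDIA_PREFIXES = ("word/media/", "ppt/media/")


def extract_media(package_path, out_dir="media"):
    """Copy embedded media out of a .docx/.pptx and return Markdown image links.

    Hypothetical helper for illustration; not a markitdown API.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    with zipfile.ZipFile(package_path) as package:
        for name in sorted(package.namelist()):
            # Skip everything that is not a file inside a media folder.
            if not name.startswith(MEDIA_PREFIXES) or name.endswith("/"):
                continue
            # Keep only the file name so nothing is written outside out_dir.
            filename = PurePosixPath(name).name
            (out / filename).write_bytes(package.read(name))
            links.append(f"![]({out.as_posix()}/{filename})")
    return links
```

Calling `extract_media("report.docx")` would write the images to a local `media` folder and return links of the form `![](media/image1.png)`, which could then replace the placeholders in the generated Markdown. Mapping each link back to the right position in the document is left out of this sketch.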

It would also make it possible to automate this example:

markitdown/README.md

Line 50 in 81e3f24

result = md.convert("example.jpg")

afourney · 2024-12-16T18:33:27Z

Yes, better handling of images is on my to-do list. The original purpose of the library was to support text-only LLMs, so the original focus was on extracting image metadata (e.g., tags, xmp, iptc, captions, etc.). But there's clear value in saving the images to disk and supporting them directly.

dradoudine · 2025-01-07T09:55:19Z

It would be good to have an option to export media (images, etc.) to a specific folder, as pandoc does.

masquare · 2025-01-29T08:53:19Z

FYI, and related to this issue: I've created PR #306, which describes images in PPTX files using LLMs.

gagb added the "enhancement" (New feature or request) and "open for contribution" (Invites open-source developers to contribute to the project.) labels on Dec 17, 2024

gagb mentioned this issue on Dec 17, 2024 in #58 (Open): Both the image in docx and pdf will not be converted to base64 encoded content
