Excludes Image objects when assembling plaintext content to write. #25

Open
wants to merge 1 commit into
base: master

@jtkiley jtkiley commented Aug 8, 2014

Fixes #24.

Obviously, this is the simple fix. When I looked at stopping Image from inheriting from Paragraph, I didn't get errors (and without this change, I still got the image hex in files). I'm still a little fuzzy on the finer points of the RTF spec and the reader's logic, so I probably need to clear that up before working on Image.
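
For illustration, here is a minimal sketch of the kind of filter described here. It does not use pyth's real classes: `Paragraph`, `Image`, and `assemble_plaintext` below are hypothetical stand-ins, and the only assumption carried over from the discussion is that `Image` inherits from `Paragraph`, so it has to be checked for explicitly.

```python
# Stand-in classes for illustration only; these are NOT pyth's real definitions.
class Paragraph:
    def __init__(self, text=""):
        self.text = text


class Image(Paragraph):
    """Assumed to subclass Paragraph, as described in this pull request."""

    def __init__(self, data=b""):
        # Mimic the symptom: the image payload ends up as hex "text".
        super().__init__(data.hex())
        self.data = data


def assemble_plaintext(blocks):
    """Join paragraph text, skipping Image objects entirely (hypothetical helper)."""
    parts = []
    for block in blocks:
        # An Image also passes isinstance(block, Paragraph), so test for it first.
        if isinstance(block, Image):
            continue
        parts.append(block.text)
    return "\n\n".join(parts)


doc = [Paragraph("Before."), Image(b"\x89PNG"), Paragraph("After.")]
print(assemble_plaintext(doc))
```

The explicit `isinstance` check is what makes this a narrow fix: it leaves the class hierarchy untouched and only changes what the writer emits.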

@watercrossing
Contributor

That will do the trick, even though it's a bit hackish... I don't know if people would like this, but it might be useful to include a snippet in the text file, such as {Image stripped, 123 bytes} or some other information, explaining that an image used to be here?

@jtkiley
Author

jtkiley commented Aug 19, 2014

I agree that it's a specific and not-at-all pretty fix. I'm just not familiar enough with pyth and the finer points of the RTF format to intelligently make changes to the design.

As for the snippet, I do a lot of content analysis, and I use pyth to process RTFs into plain text. It's probably my specific research use case, but I'm wary of adding text into a document. Also, the images in my documents are an artifact of the data provider (not the original data). It may be a good option, though. If I were looking at documents with "real" embedded images, being able to capture that fact might lead to interesting results. I would guess that a lot of use cases would similarly be interested in at least knowing about images.
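
One way to reconcile the two positions would be an opt-in placeholder that is off by default, so existing plain-text output stays free of added text. The sketch below is only an illustration under that assumption: the `image_placeholder` flag, the `render_plaintext` function, and the stand-in `Image` class are hypothetical and are not part of pyth.

```python
# Hypothetical sketch; none of these names come from pyth itself.
class Image:
    def __init__(self, data):
        self.data = data  # raw image bytes


def render_plaintext(blocks, image_placeholder=False):
    """Render a list of str/Image blocks as plain text.

    With image_placeholder=False (the default), images are dropped silently.
    With image_placeholder=True, each image becomes a short marker line.
    """
    parts = []
    for block in blocks:
        if isinstance(block, Image):
            if image_placeholder:
                parts.append("{Image stripped, %d bytes}" % len(block.data))
            continue
        parts.append(block)
    return "\n\n".join(parts)


blocks = ["First paragraph.", Image(b"\x01\x02\x03"), "Second paragraph."]
print(render_plaintext(blocks))
print(render_plaintext(blocks, image_placeholder=True))
```

Keeping the default at `False` preserves the content-analysis use case, while callers who care about the presence of images can ask for the marker.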

Successfully merging this pull request may close these issues.

Hex (encoded images) in \pict control groups is not removed.