-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hex (encoded images) in \pict
control groups is not removed.
#24
Comments
Alright, so this is from this pull request, which I perhaps didn't think hard enough about: https://github.com/brendonh/pyth/pull/19/files The stated purpose of that PR is to make it easier to filter out image data, by identifying it as such in the document object. But then none of the writers actually filter it out. Sigh. I figure the right fix is to make Image a top-level type (instead of a Paragraph subclass), and then update the writers to ignore it. Or something. |
Having looked again, it looks like the images are getting recognized. I see image objects inside of paragraphs, so it may be as easy just having the writers ignore it. |
Right. So there are two bugs:
Someone should fix that! ;-) |
@brendonh You should have highlighted me, since I wrote that original image support... I am not sure making an Image a top level class is the correct approach, since images actually appear in the flow of the paragraphs, and currently pyth checks religiously that a Document only contains a Paragraph. Before my patch #19 Image data would just be interpreted as plain text, and so the output of all the writers hasn't changed. I currently think adding functionality to the writers to ignore/handle the images is the best way forward. |
Ugh you are right on both counts. Okay, I'll do something about it. |
When I read an RTF and write it out as plain text (both with pyth), all of the hex for embedded images is included in the document. As expected, the
\pict
control group itself is gone.At the moment, I'm preprocessing these files to wipe out the
pict
group (hex included) before using pyth, but, of course, it would be nice to avoid that. I'm not familiar enough with RTF versions to know if this is part of the 1.5 spec or a later one. However, these files run perfectly otherwise.I can send you an example, if needed.
The text was updated successfully, but these errors were encountered: