support array of content streams and parse them as a single stream #36
The page object structure allows either a single stream or an array of streams in the `Contents` key. In the latter case, we need to concatenate the contents of the individual streams before parsing them.
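A minimal sketch of that concatenation, assuming the stream bodies have already been decoded and are available as `io.Reader` values. The names `concatStreams` and `concatToString` are hypothetical and not part of this library. The newline inserted between parts is an assumption made here so that the last token of one stream cannot fuse with the first token of the next.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// concatStreams returns a reader that yields the given content streams in
// order, as if they were a single stream. A newline is inserted between
// parts so that tokens at the boundary stay separate (an assumption of
// this sketch, not something taken from the patch).
func concatStreams(parts ...io.Reader) io.Reader {
	readers := make([]io.Reader, 0, 2*len(parts))
	for i, p := range parts {
		if i > 0 {
			readers = append(readers, strings.NewReader("\n"))
		}
		readers = append(readers, p)
	}
	return io.MultiReader(readers...)
}

// concatToString is a hypothetical convenience wrapper for demonstration:
// it joins already decoded stream bodies and returns the result as a string.
func concatToString(bodies ...string) (string, error) {
	parts := make([]io.Reader, len(bodies))
	for i, b := range bodies {
		parts[i] = strings.NewReader(b)
	}
	data, err := io.ReadAll(concatStreams(parts...))
	return string(data), err
}

func main() {
	out, err := concatToString("BT /F1 12 Tf", "(Hello) Tj ET")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Using `io.MultiReader` avoids copying every stream into one buffer up front; the interpreter can keep reading tokens from a single reader regardless of how many streams the page has.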
Seems to fix:
Potentially fixes:
The issue in #15 seems to be a broken xref table in the PDF, which can be fixed with mutools.
As far as my tests are representative, it now also works with
    tok := b.readToken()
    if tok == io.EOF {
        break
    }
    s := strm
Maybe there is a more Go-like way of doing this without repeating code.
I'm open to suggestions.
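One possible way to avoid the repetition, sketched under assumptions: normalize the value of `Contents` into a slice first, so that a single loop handles both the single-stream and the array case. The types `object`, `stream`, and `array` and the function `contentStreams` are hypothetical stand-ins, not this library's actual API.

```go
package main

import "fmt"

// object, stream and array are hypothetical stand-ins for a PDF library's
// value types; they are not the types used in this code base.
type object interface{}

type stream struct{ body string }

type array []object

// contentStreams normalizes the value of a page's Contents key into a
// slice of streams. A single stream becomes a one-element slice; array
// elements that are not streams are skipped; anything else yields nil.
func contentStreams(contents object) []stream {
	switch v := contents.(type) {
	case stream:
		return []stream{v}
	case array:
		out := make([]stream, 0, len(v))
		for _, elem := range v {
			if s, ok := elem.(stream); ok {
				out = append(out, s)
			}
		}
		return out
	default:
		return nil
	}
}

func main() {
	single := contentStreams(stream{body: "BT ET"})
	multi := contentStreams(array{stream{body: "BT"}, stream{body: "ET"}})
	fmt.Println(len(single), len(multi)) // prints "1 2"
}
```

With a helper like this, the token-reading loop only needs to exist once and can range over the returned slice.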
I can confirm that it fixed the issue. Great job, @romanpickl!
Can this be merged?
Applies patch: ledongthuc#36
Support array of content streams.
This is based on some work in #32 (`Contents` key), but, as far as I understand the spec, the streams have to be concatenated before they are interpreted, i.e. they have to be interpreted as a single stream. Otherwise some streams might be invalid on their own, as they can depend on state from an earlier stream, e.g. a BT operator opened in the previous stream.
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf (page 78)
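To illustrate why the streams cannot be interpreted one by one, here is a small sketch with a hypothetical helper, `textObjectsBalanced`, that checks whether BT and ET operators pair up. It uses a naive whitespace tokenizer (it would misread BT or ET inside string literals), so it is an illustration only, not a content stream parser.

```go
package main

import (
	"fmt"
	"strings"
)

// textObjectsBalanced reports whether every BT has a matching ET and no ET
// appears outside a text object. Tokens are split on whitespace only, which
// is a simplification made for this sketch.
func textObjectsBalanced(content string) bool {
	open := false
	for _, tok := range strings.Fields(content) {
		switch tok {
		case "BT":
			if open {
				return false // BT inside an open text object
			}
			open = true
		case "ET":
			if !open {
				return false // ET without a preceding BT
			}
			open = false
		}
	}
	return !open
}

func main() {
	first := "BT /F1 12 Tf 72 720 Td" // opens a text object, never closes it
	second := "(Hello) Tj ET"         // closes a text object it did not open

	fmt.Println(textObjectsBalanced(first))                 // false
	fmt.Println(textObjectsBalanced(second))                // false
	fmt.Println(textObjectsBalanced(first + "\n" + second)) // true
}
```

Each part fails the check on its own, while the concatenation passes, which matches the reading of the spec described above.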
This draft includes some duplication that could be refactored, but I don't have full visibility into the code base (yet).
In addition, I think that other functions like `walkTextBlocks` might have to be updated as well.