Support DCTDecodeFilter, which is no-op, actually #15
I'd suggest adding a tiny PDF in a testdata dir, with such an embedded image, plus a decoding test, but I see the lib does not have any tests at all, so maybe that's not wanted...
@@ -839,6 +839,8 @@ func applyFilter(rd io.Reader, name string, param Value) io.Reader {
	case 12:
		return &pngUpReader{r: zr, hist: make([]byte, 1+columns), tmp: make([]byte, 1+columns)}
	}
case "DCTDecode":
// DCTDecode indicates that the Image XObject data is a full JPEG encoded image, so we return the original reader as is, and leave it up to the caller to decode the image.
Good idea, thanks. Done.
Well, @bradfitz suggested a test too, so maybe this change can also be the one that introduces the first test.
@josharian can we humbly ping you to have a look at this PR please?
Tests are definitely welcome. I looked at the change, and it seems fine to me, but I really know ~0 about PDFs, and I'm not at a point now where I want to learn enough to take ownership of this repo. Sorry.
Comment was suggested and worded by @mpl.
func TestReaderExtractXObjectDCTDecode(t *testing.T) {
	const (
		testscan = "testdata/testscan.pdf"
50K seems a bit large for such a basic test imho. It does not have to be a real scan coming from your scanner, as long as it is structured the same. Is it difficult to forge a PDF with an image of less than 1K in it?
Examples of tiny images: camlistore.org/pkg/images/testdata
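To give a feel for how small such a fixture image can be, here is a minimal sketch that encodes an 8x8 single-colour image with Go's standard `image/jpeg` package and prints the resulting size. The helper name `tinyJPEG` is hypothetical and not part of the library under review, and the sketch only produces the JPEG bytes; wrapping them in a PDF Image XObject is a separate step that is not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/jpeg"
)

// tinyJPEG encodes a w x h single-colour image as a JPEG and returns the bytes.
// Hypothetical helper for illustration only; not part of the reviewed library.
func tinyJPEG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	fill := image.NewUniform(color.RGBA{R: 200, G: 30, B: 30, A: 255})
	draw.Draw(img, img.Bounds(), fill, image.Point{}, draw.Src)

	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 75}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := tinyJPEG(8, 8)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Printf("JPEG size: %d bytes\n", len(data))
}
```

Most of the output is fixed header tables, so shrinking the image further does not save much; a flat colour mainly keeps the entropy-coded part short.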
	if err != nil {
		t.Fatalf("could not decode embedded JPEG: %v", err)
	}
}
We should probably check that the decoded image is what we expect. We'd usually check it pixel by pixel, but that might be overkill here, so it's probably OK to, e.g., check the hash of its bytes against a hardcoded hash of what we know to be the initial image.
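A rough sketch of that hash check, using only the standard library: decode the JPEG, copy it into an RGBA buffer, and take the SHA-256 of the pixel bytes. The helpers `pixelSum` and `sampleJPEG` are hypothetical names, and the in-memory image stands in for the real fixture. In an actual test the expected value would be a hardcoded constant computed once from the known image; here both sides are computed at run time so the example stays self-contained.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"image"
	"image/draw"
	"image/jpeg"
)

// pixelSum decodes JPEG data and returns the hex SHA-256 of its RGBA pixels.
// Hypothetical helper for illustration only.
func pixelSum(jpegData []byte) (string, error) {
	img, err := jpeg.Decode(bytes.NewReader(jpegData))
	if err != nil {
		return "", err
	}
	b := img.Bounds()
	// Normalise to RGBA so the hash does not depend on the decoder's image type.
	rgba := image.NewRGBA(image.Rect(0, 0, b.Dx(), b.Dy()))
	draw.Draw(rgba, rgba.Bounds(), img, b.Min, draw.Src)
	sum := sha256.Sum256(rgba.Pix)
	return hex.EncodeToString(sum[:]), nil
}

// sampleJPEG builds a small greyscale JPEG that stands in for the test fixture.
func sampleJPEG() ([]byte, error) {
	src := image.NewGray(image.Rect(0, 0, 8, 8))
	for i := range src.Pix {
		src.Pix[i] = uint8(i * 4) // simple gradient, values 0..252
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, src, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := sampleJPEG()
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	want, err := pixelSum(data) // a real test would hardcode this value
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	got, err := pixelSum(data) // stands in for the bytes extracted from the PDF
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("hashes match:", got == want)
}
```

Hashing decoded pixels ties the expected value to the decoder's output; hashing the raw extracted stream bytes against the original JPEG file is an alternative that avoids decoding altogether.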
@fawick to answer your question about forking: given that nobody seems to be willing to own that repo, then yes, it probably will come to this. I doubt it'll be on go4.org though. As far as Camlistore is concerned, simply vendoring github.com/fawick/pdf (which is in effect similar to what you were proposing initially) is probably the way to go. Let's finish the review on here first though.
DCTDecode means that the blob is a raw jpeg which can be read right away.
cf. https://blog.idrsolutions.com/2011/07/extract-raw-jpeg-images-from-a-pdf-file/
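As a minimal sketch of what "can be read right away" means for a caller: the filter step hands back the stream reader unchanged, and the caller passes it straight to `jpeg.Decode`. The functions `passthroughFilter`, `decodeStream`, and `blankJPEG` are hypothetical stand-ins written for this example; the library's own `applyFilter` takes an extra `param Value` argument and handles other filters, none of which is reproduced here.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
	"io"
)

// passthroughFilter mirrors the idea in the diff: for "DCTDecode" the stream
// is already a complete JPEG, so the reader is returned untouched.
// Hypothetical stand-in, not the library's applyFilter.
func passthroughFilter(rd io.Reader, name string) (io.Reader, error) {
	switch name {
	case "DCTDecode":
		return rd, nil
	default:
		return nil, fmt.Errorf("unsupported filter %q in this sketch", name)
	}
}

// decodeStream applies the filter and decodes the result as a JPEG,
// returning the bounds of the decoded image.
func decodeStream(stream []byte, filter string) (image.Rectangle, error) {
	rd, err := passthroughFilter(bytes.NewReader(stream), filter)
	if err != nil {
		return image.Rectangle{}, err
	}
	img, err := jpeg.Decode(rd)
	if err != nil {
		return image.Rectangle{}, err
	}
	return img.Bounds(), nil
}

// blankJPEG encodes an all-black greyscale image; it stands in for the
// raw bytes of an Image XObject stream.
func blankJPEG(w, h int) ([]byte, error) {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, image.NewGray(image.Rect(0, 0, w, h)), nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	stream, err := blankJPEG(4, 3)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	bounds, err := decodeStream(stream, "DCTDecode")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("decoded image is %dx%d\n", bounds.Dx(), bounds.Dy())
}
```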