Support DCTDecodeFilter, which is no-op, actually #15

fawick · 2017-07-09T10:14:05Z

DCTDecode means that the blob is a raw JPEG, which can be read right away.

cf. https://blog.idrsolutions.com/2011/07/extract-raw-jpeg-images-from-a-pdf-file/
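To illustrate the claim above, here is a minimal sketch, assuming the stream bytes are already in hand. `makeJPEG` and `decodeDCT` are hypothetical helpers, not part of this library: the first fabricates stand-in stream bytes, the second hands them straight to the standard `image/jpeg` decoder with no filter step in between.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// makeJPEG (hypothetical) stands in for the bytes of a DCTDecode stream:
// it encodes a small gray gradient as a complete JPEG file in memory.
func makeJPEG(w, h int) ([]byte, error) {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetGray(x, y, color.Gray{Y: uint8(16 * (x + y))})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeDCT (hypothetical) treats the stream bytes as a JPEG file and
// decodes them directly, without any intermediate filter.
func decodeDCT(stream []byte) (image.Image, error) {
	return jpeg.Decode(bytes.NewReader(stream))
}

func main() {
	stream, err := makeJPEG(8, 8)
	if err != nil {
		panic(err)
	}
	img, err := decodeDCT(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(img.Bounds().Dx(), img.Bounds().Dy())
}
```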

mpl · 2017-08-01T16:00:33Z

I'd suggest adding a tiny PDF in a testdata dir, with such an embedded image, plus a decoding test, but I see the lib does not have any tests at all, so maybe that's not wanted...

mpl · 2017-08-01T16:39:55Z

read.go

@@ -839,6 +839,8 @@ func applyFilter(rd io.Reader, name string, param Value) io.Reader {
 		case 12:
 			return &pngUpReader{r: zr, hist: make([]byte, 1+columns), tmp: make([]byte, 1+columns)}
 		}
+	case "DCTDecode":


// DCTDecode indicates that the Image XObject data is a full JPEG encoded image, so we return the original reader as is, and leave it up to the caller to decode the image.

Good idea, thanks. Done.

mpl · 2017-08-01T16:45:51Z

Well, @bradfitz suggested a test too, so maybe this change can also be the one that introduces the first test.

mpl · 2017-08-01T16:48:07Z

@josharian can we humbly ping you to have a look at this PR please?

josharian · 2017-08-01T17:01:23Z

Tests are definitely welcome. I looked at the change, and it seems fine to me, but I really know ~0 about PDFs, and I'm not at a point now where I want to learn enough to take ownership of this repo. Sorry.

@mpl

Comment was suggested and worded by @mpl.

fawick · 2017-08-02T05:26:32Z

Okay, I will supply a test tonight.

@bradfitz, @mpl Given all the open PRs, is it worth considering forking this package and maintaining it under go4.org?

mpl · 2017-08-11T16:27:10Z

read_test.go

+
+func TestReaderExtractXObjectDCTDecode(t *testing.T) {
+	const (
+		testscan = "testdata/testscan.pdf"


50K seems a bit large for such a basic test, imho. It does not have to be a real scan coming from your scanner, as long as it is structured the same. Is it difficult to forge a PDF with an image of less than 1K in it?
Examples of tiny images: camlistore.org/pkg/images/testdata

mpl · 2017-08-11T16:29:54Z

read_test.go

+	if err != nil {
+		t.Fatalf("could not decode embedded JPEG: %v", err)
+	}
+}


We should probably check that the decoded image is what we expect. We'd usually check it pixel by pixel, but that might be overkill here, so it's probably OK to, e.g., check the hashsum of its bytes against a hardcoded hashsum of what we know to be the initial image.

mpl · 2017-08-11T16:40:13Z

@fawick to answer your question about forking: given that nobody seems willing to own that repo, yes, it will probably come to this. I doubt it'll be on go4.org though. As far as Camlistore is concerned, simply vendoring github.com/fawick/pdf (which is in effect similar to what you were proposing initially) is probably the way to go.

Let's finish the review on here first though.

mpl suggested changes Aug 1, 2017

fawick force-pushed the master branch from 0c47e6b to 3f4d78b Compare August 2, 2017 05:19

Support DCTDecodeFilter, which is no-op, actually

570434f

Comment was suggested and worded by @mpl.

fawick force-pushed the master branch from 3f4d78b to 570434f Compare August 2, 2017 05:22

fawick added 2 commits August 2, 2017 08:25

Added small PDF with scanned page

2469587

Add test for decoding embedded JPEG

5190dcf

mpl reviewed Aug 11, 2017

Merge branch 'rsc:master' into master

8a58b60
