This repository has been archived by the owner on Jul 7, 2020. It is now read-only.

Support DCTDecodeFilter, which is no-op, actually #15

Open
wants to merge 4 commits into
base: master
5 changes: 5 additions & 0 deletions read.go
@@ -839,6 +839,11 @@ func applyFilter(rd io.Reader, name string, param Value) io.Reader {
		case 12:
			return &pngUpReader{r: zr, hist: make([]byte, 1+columns), tmp: make([]byte, 1+columns)}
		}
	case "DCTDecode":

// DCTDecode indicates that the Image XObject data is a full JPEG encoded image, so we return the original reader as is, and leave it up to the caller to decode the image.

Author

Good idea. thanks. Done.

		// DCTDecode indicates that the Image XObject data is a full JPEG
		// encoded image, so we return the original reader as is, and leave it
		// up to the caller to decode the image.
		return rd
	}
}
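
To illustrate the pass-through behaviour this hunk adds, here is a minimal, self-contained sketch. It is not the package's code: `applyFilterSketch` and `filterBytes` are hypothetical names, the sketch returns an error where the real `applyFilter` panics, and it ignores filter parameters such as the predictor.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// applyFilterSketch is a hypothetical, simplified stand-in for applyFilter.
// Unlike the real function it returns an error instead of panicking and
// ignores filter parameters such as Predictor.
func applyFilterSketch(rd io.Reader, name string) (io.Reader, error) {
	switch name {
	case "FlateDecode":
		zr, err := zlib.NewReader(rd)
		if err != nil {
			return nil, err
		}
		return zr, nil
	case "DCTDecode":
		// Pass-through: the stream already holds a complete JPEG file,
		// so decoding is left to the caller.
		return rd, nil
	default:
		return nil, fmt.Errorf("unknown filter %s", name)
	}
}

// filterBytes (hypothetical helper) runs data through the named filter and
// returns everything the resulting reader yields.
func filterBytes(name string, data []byte) ([]byte, error) {
	r, err := applyFilterSketch(bytes.NewReader(data), name)
	if err != nil {
		return nil, err
	}
	return io.ReadAll(r)
}

func main() {
	// SOI and EOI markers only: a placeholder, not a decodable JPEG.
	in := []byte{0xFF, 0xD8, 0xFF, 0xD9}
	out, err := filterBytes("DCTDecode", in)
	if err != nil {
		panic(err)
	}
	fmt.Println("unchanged:", bytes.Equal(in, out))
}
```

A caller that receives such a reader for an image XObject would typically hand it straight to `jpeg.Decode`, as the test in this PR does.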

41 changes: 41 additions & 0 deletions read_test.go
@@ -0,0 +1,41 @@
package pdf_test

import (
	"image/jpeg"
	"testing"

	"rsc.io/pdf"
)

func TestReaderExtractXObjectDCTDecode(t *testing.T) {
	const (
		testscan = "testdata/testscan.pdf"

50K seems a bit large for such a basic test, imho. It does not have to be a real scan coming from your scanner, as long as it is structured the same. Is it difficult to forge a PDF with an image of less than 1K in it?
Examples of tiny images: camlistore.org/pkg/images/testdata
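
As a rough sketch of how a sub-1K image for such a fixture could be produced with the standard library alone: the helper names `tinyJPEG` and `jpegSize` are hypothetical, the 8x8 grayscale gradient is an arbitrary choice, and wrapping the bytes into a PDF image XObject is not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
)

// tinyJPEG (hypothetical helper) encodes an 8x8 grayscale gradient as JPEG.
func tinyJPEG() ([]byte, error) {
	img := image.NewGray(image.Rect(0, 0, 8, 8))
	for i := range img.Pix {
		img.Pix[i] = uint8(i * 4) // simple gradient, values 0..252
	}
	var buf bytes.Buffer
	// nil options select the encoder's default quality.
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// jpegSize (hypothetical helper) reports the dimensions stored in the
// JPEG header without decoding the pixel data.
func jpegSize(data []byte) (int, int, error) {
	cfg, err := jpeg.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return 0, 0, err
	}
	return cfg.Width, cfg.Height, nil
}

func main() {
	data, err := tinyJPEG()
	if err != nil {
		panic(err)
	}
	w, h, err := jpegSize(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%dx%d JPEG, %d bytes\n", w, h, len(data))
}
```

An image this small should encode to well under 1K, since the output is mostly the fixed-size quantization and Huffman tables.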

	)
	f, err := pdf.Open(testscan)
	if err != nil {
		t.Fatalf("could not open %v: %v", testscan, err)
	}
	x := f.Page(1).Resources().Key("XObject")
	if x.Kind() != pdf.Dict || len(x.Keys()) == 0 {
		t.Fatalf("no xobject dict on page 1")
	}
	k := x.Key(x.Keys()[0])
	if k.IsNull() || k.Kind() != pdf.Stream || k.Key("Subtype").Name() != "Image" {
		t.Fatalf("first xobject child is not an image stream")
	}
	defer func() {
		if r := recover(); r != nil {
			s, ok := r.(string)
			if ok && s == "unknown filter DCTDecode" {
				t.Fatalf("DCTDecode filter handling is not implemented")
			}
			panic(r) // re-panic everything else
		}
	}()
	rc := k.Reader()
	defer rc.Close()
	_, err = jpeg.Decode(rc)
	if err != nil {
		t.Fatalf("could not decode embedded JPEG: %v", err)
	}
}

We should probably check that the decoded image is what we expect. We'd usually check it pixel by pixel, but that might be overkill here, so it's probably OK to, e.g., check the hash of its bytes against a hardcoded hash of what we know to be the original image.
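
One way the hash check could look, as a sketch only: because the DCTDecode stream is passed through unchanged, this variant hashes the raw stream bytes rather than decoded pixels, which is an assumption about what "its bytes" should mean here. `streamDigest` and `digestOf` are hypothetical helpers, SHA-256 is an arbitrary choice, and the expected digest is computed in place where a real test would hardcode it.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// streamDigest (hypothetical helper) returns the hex-encoded SHA-256 of
// everything read from r.
func streamDigest(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// digestOf (hypothetical helper) is a convenience wrapper for in-memory data.
func digestOf(data []byte) (string, error) {
	return streamDigest(bytes.NewReader(data))
}

func main() {
	// Stand-in for the bytes of the known original JPEG.
	original := []byte("stand-in for the original JPEG bytes")

	// In a real test this would be a hardcoded constant.
	want, err := digestOf(original)
	if err != nil {
		panic(err)
	}

	// In a real test the reader would come from the PDF stream.
	got, err := streamDigest(bytes.NewReader(original))
	if err != nil {
		panic(err)
	}
	fmt.Println("match:", got == want)
}
```

Hashing the stream only verifies that extraction is byte-exact; it would not replace the `jpeg.Decode` call if the test should also confirm the data is a valid JPEG.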

Binary file added testdata/testscan.pdf
Binary file not shown.