File Formats

unpaper relies on the libav library for file input and output, but not every image file format that libav supports is supported by unpaper.

The restriction comes from the pixel formats that unpaper has to handle. At the time of writing, the supported pixel formats are gray8, rgb24, monoblack, monowhite, y400a (also known as ya8) and pal8 (with caveats; see the notes under Output Formats).

If you have unsupported files that you think should be supported (for instance because they are generated by some scanning tool or hardware), please open an issue.

Output Formats

unpaper can still only write output files in one of the supported PNM formats. Support for other formats will likely follow in future major versions.

As it stands, the output format tries to match the pixel format of the source material: a gray8 or ya8 input produces a pgm file, an rgb24 input produces a ppm file, and both monoblack and monowhite inputs produce a pbm file.

Because of the way palettes are implemented, an input file in pal8 format produces ppm output by default. At the time of writing, this includes all grayscale TIFF files read with libav versions preceding 11.
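The mapping described above can be summarised as a small lookup table. This is only an illustrative sketch in Python: the names OUTPUT_FOR_PIXEL_FORMAT and default_output_format are hypothetical and are not part of unpaper, which is written in C.

```python
# Hypothetical lookup table mirroring the prose above; not unpaper's code.
OUTPUT_FOR_PIXEL_FORMAT = {
    "monoblack": "pbm",
    "monowhite": "pbm",
    "gray8": "pgm",
    "ya8": "pgm",    # also known as y400a
    "y400a": "pgm",
    "rgb24": "ppm",
    "pal8": "ppm",   # palette input is written as RGB by default
}


def default_output_format(pixel_format: str) -> str:
    """Return the PNM sub-format used by default for a pixel format."""
    try:
        return OUTPUT_FOR_PIXEL_FORMAT[pixel_format]
    except KeyError:
        raise ValueError(f"unsupported pixel format: {pixel_format}") from None
```

A grayscale TIFF that libav reports as pal8 would therefore end up as a ppm file under this table, which matches the caveat above.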

Input Formats

PNM Family

The PNM family is the original set of file formats supported by unpaper. It includes the pbm (Portable Bit Map), pgm (Portable Gray Map) and ppm (Portable Pixel Map) formats, and it is used by Linux scanning tools such as scanimage and scanadf.

Input support is limited to the most common sub-formats:

  • black and white (pbm);
  • grayscale up to 8-bit (pgm);
  • RGB colour, up to 24-bit (ppm).
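As an illustration of how the three sub-formats can be told apart, the sketch below inspects the two-byte magic number at the start of a PNM file. The magic numbers are those of the PNM family (P1/P4 for pbm, P2/P5 for pgm, P3/P6 for ppm); the function name sniff_pnm is hypothetical, and this is not how unpaper itself detects formats, since it delegates that to libav.

```python
from typing import Optional

# Magic numbers of the PNM family: plain (ASCII) and raw (binary) variants.
PNM_MAGIC = {
    b"P1": "pbm", b"P4": "pbm",
    b"P2": "pgm", b"P5": "pgm",
    b"P3": "ppm", b"P6": "ppm",
}


def sniff_pnm(data: bytes) -> Optional[str]:
    """Return 'pbm', 'pgm' or 'ppm' for PNM data, or None otherwise."""
    return PNM_MAGIC.get(data[:2])
```

For example, passing the first bytes of a file opened in binary mode to sniff_pnm gives a quick indication of which of the three sub-formats it claims to be.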

Both grayscale and RGB colour images allow a MAXVAL property that defines the depth of the image. Images at a lower depth than the supported maximum (8-bit for grayscale, 24-bit for RGB) will be upconverted. Images at a higher depth are not supported.

While the PNM family supports YUV images, these are not supported by unpaper and there are currently no plans to support them.
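To make the upconversion concrete, the sketch below rescales a single sample from an arbitrary MAXVAL to the 8-bit range. The rounding rule used here is an assumption chosen for illustration; the exact arithmetic performed by unpaper and libav may differ, and the name upconvert_sample is hypothetical.

```python
def upconvert_sample(value: int, maxval: int) -> int:
    """Rescale one sample from the range 0..maxval to 0..255.

    Assumes round-to-nearest scaling; this is an illustrative choice,
    not necessarily what unpaper or libav do internally.
    """
    if not 0 < maxval <= 255:
        # Depths above 8 bits per sample are not supported.
        raise ValueError(f"unsupported MAXVAL: {maxval}")
    if not 0 <= value <= maxval:
        raise ValueError(f"sample {value} outside 0..{maxval}")
    # Integer arithmetic with rounding to the nearest value.
    return (value * 255 + maxval // 2) // maxval
```

With this rule a sample equal to MAXVAL always maps to 255 and 0 always maps to 0, so pure white and pure black are preserved whatever the source depth.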

TIFF

The TIFF format consists of a long list of sub-formats that are not compatible with each other, so a given TIFF input file may or may not be supported, depending on the versions of unpaper and libav in use.

At the time of writing, libav 9 and 10 treat all 8-bit grayscale files as pal8, which is then handled as 24-bit RGB. This is fixed in version 11 of libav.

Version 11 of libav also introduces support for images at 8-bit plus alpha, as well as 16-bit plus alpha and 48-bit RGB, which are not yet supported by unpaper.

At the time of writing, notable features missing from libav master are JPEG and LZMA compression, 4-bit grayscale images, and multi-page files.

PDF Generation

PDF support is not currently available and not planned. You can generate PDFs out of the processed images by using tools such as pnm2tiff, tiffcp and tiff2pdf.
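As a rough sketch of the last two steps of that workflow, the helper below builds the command lines for merging single-page TIFF files with tiffcp and converting the result with tiff2pdf. It assumes the processed PNM images have already been converted to TIFF, and it only constructs argument lists without running anything. The function name pdf_commands and all file names are hypothetical.

```python
from typing import List, Sequence


def pdf_commands(tiff_pages: Sequence[str],
                 combined_tiff: str,
                 pdf_path: str) -> List[List[str]]:
    """Build argv lists for tiffcp and tiff2pdf (nothing is executed)."""
    if not tiff_pages:
        raise ValueError("at least one TIFF page is required")
    # tiffcp takes the source files first and the destination last.
    merge = ["tiffcp", *tiff_pages, combined_tiff]
    # tiff2pdf writes to the file named by -o.
    convert = ["tiff2pdf", "-o", pdf_path, combined_tiff]
    return [merge, convert]
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a system where the libtiff tools are installed.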