Understanding the PDF File Format – Images

Images are not stored inside a PDF file as TIFF, PNG or JPG files. They are stored as binary pixel data along with the colorspace used by that data. This allows a lot of flexibility. For example, a CMYK image can be stored as a block of binary data (4 bytes for each pixel at 8 bits per component) and specified as using the DeviceCMYK colorspace. The actual image data can be compressed in different ways to best suit the data (DCT for colour images, CCITT or JBIG2 for black and white 1-bit images). The image is scaled to fit its slot on the page, so it can often be of a higher resolution than the page itself needs.
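As a rough illustration of the "block of binary data" idea, the sketch below works out how many bytes the uncompressed samples of an image occupy. The function name and the 633 x 413 dimensions are chosen for illustration only; the one PDF rule it relies on is that each row of samples is padded to a whole number of bytes.

```python
def raw_image_size(width, height, components, bits_per_component):
    """Bytes of uncompressed sample data, with each row padded to a whole byte."""
    bits_per_row = width * components * bits_per_component
    row_bytes = (bits_per_row + 7) // 8  # round up to the next full byte
    return row_bytes * height

# 8-bit CMYK: 4 components, so 4 bytes for each pixel.
cmyk_bytes = raw_image_size(633, 413, components=4, bits_per_component=8)

# 1-bit black and white: 8 pixels share a byte, rows are padded.
mono_bytes = raw_image_size(633, 413, components=1, bits_per_component=1)

print(cmyk_bytes, mono_bytes)  # 1045716 33040
```

These are the sizes before any compression filter is applied, which is why the choice of filter matters so much for the final file size.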

There are 2 commands for drawing images (ID and Do). The ID command allows the binary image data to be embedded in the command stream as an inline image (written as a BI ... ID ... EI sequence). This is not as flexible as the Do command, which draws an image stored in a separate PDF object of type XObject. So the Do command tends to be far more common. It allows better data compression, offers more functionality and you can edit the image object without having to alter the command stream.

Each image has a name (like Im4). In the stream, you would see the command

/Im4 Do

which draws the image at this point using the current transformation matrix.
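To make the role of the matrix concrete, here is a minimal sketch of a toy content stream reader. It is not a real PDF parser: it assumes a whitespace-separated stream with no strings, arrays or inline image data, and it only understands the q, Q, cm and Do operators (the draw operator is spelled `Do` in the PDF specification). The sample stream and the function names are made up for illustration. An image is drawn into a 1 x 1 unit square, so the matrix in force when `Do` runs gives its size on the page in points.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

def concat(m, ctm):
    """Return m x ctm, which is how the cm operator updates the current matrix."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)

def image_draws(content):
    """Yield (name, width_pt, height_pt) for each /Name Do in a simple stream."""
    ctm, stack, operands = IDENTITY, [], []
    for token in content.split():
        if token == "q":        # save graphics state
            stack.append(ctm)
        elif token == "Q":      # restore graphics state
            ctm = stack.pop()
        elif token == "cm":     # concatenate a matrix onto the current one
            ctm = concat(tuple(float(t) for t in operands[-6:]), ctm)
        elif token == "Do":     # draw the named XObject
            a, b, c, d, _, _ = ctm
            yield operands[-1].lstrip("/"), math.hypot(a, b), math.hypot(c, d)
        else:                   # anything else is treated as an operand
            operands.append(token)
            continue
        operands = []           # operands are consumed by each operator

# Hypothetical stream: scale the unit square to 303.84 x 198.24 points.
stream = "q 303.84 0 0 198.24 72 500 cm /Im4 Do Q"
draws = list(image_draws(stream))

# Effective resolution, assuming Im4 is 633 pixels wide.
name, width_pt, height_pt = draws[0]
dpi = 633 * 72 / width_pt
print(name, round(width_pt, 2), round(height_pt, 2), round(dpi))
```

The same image object can be drawn at any size just by changing the matrix, which is why the pixel data can be of a higher resolution than the slot it fills.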

The actual image Im4 is defined in a separate object, which is listed in the XObject entry of the Resources dictionary. In this case it is object 20 (referenced as 20 0 R).

/XObject<</Im4 20 0 R/Im3 21 0 R>>
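The lookup from image name to object can be sketched as a small dictionary. The helper below is hypothetical and only handles the simple case shown here, where every entry is a name followed by an indirect reference of the form "number generation R".

```python
import re

def parse_xobject_names(resources_text):
    """Map XObject names to (object number, generation number) pairs."""
    pattern = re.compile(r"/(\w+)\s+(\d+)\s+(\d+)\s+R")
    return {name: (int(num), int(gen))
            for name, num, gen in pattern.findall(resources_text)}

xobjects = parse_xobject_names("/XObject<</Im4 20 0 R/Im3 21 0 R>>")
print(xobjects["Im4"])  # (20, 0)
```

A viewer does the equivalent lookup each time it meets an image name in the command stream, then reads the referenced object to get the pixel data.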

Object 20 contains the information on the image and the compressed binary pixel data.

20 0 obj <<

/Filter/DCTDecode

/Type/XObject

/Length 33555

/Height 413

/BitsPerComponent 8

/ColorSpace 17 0 R

/Subtype/Image

/Width 633

>>

stream (binary pixel data follows)
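The entries of the image dictionary can be read into a simple key/value map, as in the sketch below. It uses a cleaned-up copy of the dictionary for object 20 and a deliberately naive regular expression that only copes with flat dictionaries whose values are names, integers or indirect references. The helper name and the informal filter descriptions are illustrative; the filter names themselves are the standard PDF ones.

```python
import re

# Cleaned-up copy of the dictionary for object 20.
IMAGE_DICT = ("<< /Filter/DCTDecode /Type/XObject /Length 33555 /Height 413 "
              "/BitsPerComponent 8 /ColorSpace 17 0 R /Subtype/Image /Width 633 >>")

# Standard PDF filter names with informal descriptions.
FILTERS = {
    "/DCTDecode": "JPEG (DCT) compression",
    "/CCITTFaxDecode": "CCITT fax compression for 1-bit images",
    "/JBIG2Decode": "JBIG2 compression for 1-bit images",
    "/FlateDecode": "zlib/deflate compression",
}

def parse_flat_dict(text):
    """Parse a flat dictionary whose values are names, integers or references."""
    pairs = re.findall(r"/(\w+)\s*(/\w+|\d+\s+\d+\s+R|\d+)", text)
    return {key: int(value) if value.isdigit() else value
            for key, value in pairs}

info = parse_flat_dict(IMAGE_DICT)

# An image XObject has Type XObject and Subtype Image.
is_image = info.get("Type") == "/XObject" and info.get("Subtype") == "/Image"

print(is_image, info["Width"], info["Height"],
      FILTERS.get(info["Filter"], "unknown filter"))
```

Note that the ColorSpace value stays as the reference "17 0 R": the number of components per pixel is only known once that object has been read as well.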

 

This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms.


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.
