Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Understanding the PDF File format – images


Images are not stored inside a PDF file as TIFF, PNG or JPG files. Instead, they are stored as binary pixel data together with the colorspace that data uses. This allows a lot of flexibility. For example, a CMYK image can be stored as a block of binary data (4 bytes for each pixel at 8 bits per component) and specified as using a CMYK colorspace such as DeviceCMYK. The image data can also be compressed in whichever way best suits it (DCT for colour images, CCITT or JBIG2 for black-and-white 1-bit images). Because the image is scaled to fit its slot on the page when it is drawn, the stored image can often be of a higher resolution than that slot would suggest.
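To make the "block of binary data" idea concrete, here is a minimal Java sketch (the class and method names are hypothetical) that works out how many bytes of decoded pixel data an image needs from its width, height, bits per component and colorspace. It assumes only the three device colorspaces and ignores Indexed, ICCBased and other colorspaces. It relies on the PDF rule that each row of samples starts on a byte boundary, which is why 1-bit rows get padded. Compression filters such as DCTDecode shrink what is stored in the file, but this is the size the data decodes back to.

```java
import java.util.Map;

/** Hypothetical helper: size of the decoded sample data for a PDF image. */
final class DecodedImageSize {

    // Number of colour components for each device colorspace (others not handled).
    private static final Map<String, Integer> COMPONENTS = Map.of(
            "DeviceGray", 1,
            "DeviceRGB", 3,
            "DeviceCMYK", 4);

    /** Bytes per row: each row of samples starts on a byte boundary. */
    static long bytesPerRow(int width, int bitsPerComponent, String colorSpace) {
        Integer n = COMPONENTS.get(colorSpace);
        if (n == null) {
            throw new IllegalArgumentException("Unsupported colorspace: " + colorSpace);
        }
        long bitsPerRow = (long) width * n * bitsPerComponent;
        return (bitsPerRow + 7) / 8; // round up to a whole byte
    }

    /** Total decoded bytes for the whole image. */
    static long decodedSize(int width, int height, int bitsPerComponent, String colorSpace) {
        return bytesPerRow(width, bitsPerComponent, colorSpace) * height;
    }

    public static void main(String[] args) {
        // 8-bit CMYK: 4 bytes per pixel, 633 * 4 = 2532 bytes per row.
        System.out.println(decodedSize(633, 413, 8, "DeviceCMYK")); // prints 1045716
        // 1-bit black and white: 633 bits per row, padded to 80 bytes.
        System.out.println(decodedSize(633, 413, 1, "DeviceGray")); // prints 33040
    }
}
```

The 1-bit case shows why CCITT and JBIG2 are used for black-and-white images: the same pixel grid needs far less data before any compression is applied.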

There are two operators for drawing images: ID and Do (PDF operators are case-sensitive). ID, part of the BI ... ID ... EI inline-image sequence, embeds the binary image data directly in the content stream. This is not as flexible as Do, which draws an image stored in a separate PDF object: an XObject with the subtype Image (Do is also used to draw Form XObjects). So Do tends to be far more common. It allows better data compression, offers more functionality (for example, one copy of the image data can be drawn many times, even on different pages), and lets you edit the image object without having to alter the content stream.
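As a rough illustration of the inline approach, the following Java sketch (a hypothetical helper limited to uncompressed 8-bit greyscale) builds an inline image. BI starts the image, the dictionary follows using abbreviated keys, ID marks the start of the raw data and EI ends it, so the pixels sit in the middle of the drawing commands rather than in a separate object that could be shared.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds an inline image (BI ... ID ... EI) for a content stream. */
final class InlineImageSketch {

    private static void ascii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    /** Uncompressed 8-bit DeviceGray inline image: one byte per pixel. */
    static byte[] grayInlineImage(int width, int height, byte[] samples) {
        if (samples.length != width * height) {
            throw new IllegalArgumentException("expected " + (width * height) + " samples");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Inline images use abbreviated keys: /W /H /BPC /CS, and /G for DeviceGray.
        // ID is followed by exactly one white-space byte, then the image data.
        ascii(out, "BI /W " + width + " /H " + height + " /BPC 8 /CS /G\nID ");
        out.write(samples, 0, samples.length); // pixel data lives inside the content stream
        ascii(out, "\nEI\n");
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 2 x 2 checkerboard: black, white / white, black.
        byte[] pixels = {0, (byte) 255, (byte) 255, 0};
        byte[] op = grayInlineImage(2, 2, pixels);
        // ISO-8859-1 maps each byte to one char, so the binary bytes survive printing.
        System.out.println(new String(op, StandardCharsets.ISO_8859_1));
    }
}
```

Even in this tiny case, the image data cannot be referenced from anywhere else: drawing it a second time means repeating the whole BI ... EI sequence.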

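Each image has a name (like Im4). In the content stream, you would see the command

/Im4 Do

which draws the image at this point using the current transformation matrix (CTM). The image is painted into a 1 x 1 unit square, so the CTM sets its position and size on the page.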
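Because an image is painted into a 1 x 1 unit square, the usual pattern is to save the graphics state (q), use cm to scale and position that square, draw the image with Do, then restore the state (Q). The Java sketch below (hypothetical names, illustrative numbers) builds that operator sequence. It also computes the image's effective resolution when drawn, which shows why the stored image can have more pixels than its slot on the page: 72 PDF points make one inch.

```java
import java.math.BigDecimal;

/** Hypothetical helper: content-stream operators for placing an image XObject. */
final class DrawImageOps {

    /** Formats a number without a trailing ".0", as PDF writers usually do. */
    static String num(double d) {
        return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
    }

    /**
     * Draws XObject {@code name} in a box width x height points with its
     * lower-left corner at (x, y). cm scales the unit square to that box.
     */
    static String drawImage(String name, double x, double y, double width, double height) {
        return "q " + num(width) + " 0 0 " + num(height) + " " + num(x) + " " + num(y)
                + " cm /" + name + " Do Q";
    }

    /** Effective horizontal resolution in dpi when pixelWidth pixels span widthPoints. */
    static double effectiveDpi(int pixelWidth, double widthPoints) {
        return pixelWidth * 72.0 / widthPoints;
    }

    public static void main(String[] args) {
        System.out.println(drawImage("Im4", 72, 600, 211, 137.5));
        // prints q 211 0 0 137.5 72 600 cm /Im4 Do Q
        System.out.println(effectiveDpi(633, 211)); // prints 216.0
    }
}
```

Here a 633-pixel-wide image squeezed into 211 points (about 2.9 inches) is shown at 216 dpi; the same image object could be drawn larger elsewhere at a lower effective resolution without changing its data.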

The actual image Im4 is defined in a separate object, which is listed in the /XObject entry of the page's Resources dictionary. In this case it is object 20 (referenced as 20 0 R):

/XObject <</Im4 20 0 R /Im3 21 0 R>>
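A PDF reader has to turn the name used with Do into an object reference by looking it up in this dictionary. The Java sketch below does that with a regular expression. It is a deliberately simplified, hypothetical helper: real parsers tokenize the file, handle #xx escapes in names and nested dictionaries, and then resolve the object number through the cross-reference table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps XObject names to object numbers. */
final class XObjectResources {

    // A name followed by an indirect reference: /Name objNum genNum R
    private static final Pattern ENTRY =
            Pattern.compile("/([^\\s/<>\\[\\]()]+)\\s*(\\d+)\\s+(\\d+)\\s+R");

    /** Simplified: assumes a flat dictionary of indirect references only. */
    static Map<String, Integer> parse(String xobjectDict) {
        Map<String, Integer> refs = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(xobjectDict);
        while (m.find()) {
            refs.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return refs;
    }

    public static void main(String[] args) {
        Map<String, Integer> refs = parse("<</Im4 20 0 R/Im3 21 0 R>>");
        System.out.println(refs); // prints {Im4=20, Im3=21}
        // "/Im4 Do" in the content stream therefore means: draw object 20.
    }
}
```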

Object 20 contains the information about the image and the compressed binary pixel data:

20 0 obj
<<
/Filter /DCTDecode
/Type /XObject
/Length 33555
/Height 413
/BitsPerComponent 8
/ColorSpace 17 0 R
/Subtype /Image
/Width 633
>>
stream
(33555 bytes of compressed binary pixel data follow)
endstream
endobj
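Going the other way, a writer has to produce the same structure. The following Java sketch (a hypothetical class that writes a single object; a real file also needs a cross-reference table and trailer) serialises an image XObject. The point to note is that /Length is the number of bytes actually stored in the stream after encoding, not width x height, and the end-of-line before endstream is not counted in it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: serialises one image XObject. */
final class ImageXObjectWriter {

    private static void ascii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    /**
     * encoded must already be encoded for filter (or be raw samples if filter is null).
     * colorSpace is written verbatim, e.g. "/DeviceRGB" or an indirect ref like "17 0 R".
     */
    static byte[] write(int objNum, int width, int height, int bitsPerComponent,
                        String colorSpace, String filter, byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ascii(out, objNum + " 0 obj\n<< /Type /XObject /Subtype /Image"
                + " /Width " + width + " /Height " + height
                + " /BitsPerComponent " + bitsPerComponent
                + " /ColorSpace " + colorSpace
                + (filter == null ? "" : " /Filter /" + filter)
                + " /Length " + encoded.length + " >>\nstream\n");
        out.write(encoded, 0, encoded.length);
        // The EOL before endstream is not part of /Length.
        ascii(out, "\nendstream\nendobj\n");
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Uncompressed 2 x 1 RGB image: one red pixel, one blue pixel (6 bytes).
        byte[] rgb = {(byte) 255, 0, 0, 0, 0, (byte) 255};
        byte[] obj = write(20, 2, 1, 8, "/DeviceRGB", null, rgb);
        System.out.println(new String(obj, StandardCharsets.ISO_8859_1));
    }
}
```

For example, write(20, 633, 413, 8, "17 0 R", "DCTDecode", data) would give a dictionary with the same entries as object 20 above, assuming data already holds the 33555 bytes of DCT-encoded pixel data.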

 

This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so visit our series index!

7 Replies to “Understanding the PDF File format – images”

  1. Can “/Resources” point to an array of indirect object references that are composed of dictionaries? For example, if I have a Font dictionary indirectly referenced as 20 0 R, and I have another dictionary (“<< /XObject <> >>”) referenced indirectly as 21 0 R, can my resources look like this: “/Resources [ 20 0 R 21 0 R ]”?

    1. Thanks for the feedback, Mark!

      I have another question, because I can’t get an image to actually show in my file. If I use a Java File object to open a .jpeg file and pass that object as a parameter to a new FileInputStream object, which in turn is passed as a parameter to a new BufferedInputStream object, can I use all the bytes from the BufferedInputStream’s internal byte buffer as the stream for an XObject? That’s what I’m currently doing, but Adobe Reader won’t render the image. I saved the same image as a PDF using Word, opened it in a simple text editor, and noticed that the image stream data was half the length of mine. The length of my image stream, by the way, is the same number of bytes as the file itself.

  2. You cannot directly embed an image in an XObject. You need to provide the raw data (with an encoding), ColorSpace and other details. When you save a Doc as PDF, this is done for you automatically.

    It is a lot simpler to add an image with a library like iText.

    You might find these 2 posts helpful to understand XObjects:

    https://blog.idrsolutions.com/2010/04/understanding-the-pdf-file-format-how-are-images-stored/
    https://blog.idrsolutions.com/2010/09/understanding-the-pdf-file-format-images/

  3. Since both the PNG and PDF formats use the DEFLATE algorithm, I’m wondering whether they are compatible. That is, is it possible to select PNG compression parameters that will later allow me to copy the binary data directly when making a PDF file? Of course there is much more to do to create a valid PDF file, but I would like to avoid decompressing and recompressing the image data.

    1. Your problem with all these strategies is that there is no header on the data and you need to factor in the ColorSpace. You do not want to be creating CMYK PNGs, for example.


IDRsolutions Ltd 2019. All rights reserved.