Understanding the PDF file format – PDF Object streams

PDF Object streams are a very useful feature added to the PDF specification, and they introduce a new type of PDF object. Until they arrived, a PDF object consisted of a text header (which was never compressed) and, for objects such as images, a binary part (which could be compressed). If you open a PDF file in a text editor, you will see something like this.

TEXT containing information about the PDF object

15 0 obj<</Type/XObject/Subtype/Image/Width 153/Height 66/ColorSpace/DeviceRGB/BitsPerComponent 8/Interpolate false/SMask 16 0 R/Filter/FlateDecode/Length 3609>>

binary data

stream
(the compressed image bytes follow here and are not readable as text)

This is still allowed and you can continue to do it; the PDF file format has always been very good at backward compatibility. What Object streams allow you to do is put lots of PDF objects together inside a single binary stream. The binary stream still has a text header, which tells the PDF parser how to find and extract the PDF objects, but all the PDF objects themselves can be compressed. This makes the PDF smaller, potentially more secure and possibly faster to load.
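To make the "text header plus compressed objects" idea concrete, here is a minimal Python sketch. It relies on details from the PDF specification rather than from this article: the header is a dictionary with /Type /ObjStm, /N (the number of objects) and /First (the byte offset of the first object), and the decompressed data begins with N pairs of integers giving each object's number and its offset relative to /First. The function names are our own, purely for illustration, and a real parser would read /N and /First from the dictionary instead of receiving them as arguments.

```python
import zlib


def build_object_stream(objects):
    """Pack {object_number: body_bytes} into (header_text, compressed_bytes)."""
    index_parts = []
    body = b""
    for number, data in objects.items():
        # Each index entry is "object_number offset", offset relative to /First.
        index_parts.append(f"{number} {len(body)}")
        body += data + b"\n"
    index = (" ".join(index_parts) + "\n").encode("ascii")
    compressed = zlib.compress(index + body)
    header = (
        f"<</Type/ObjStm/N {len(objects)}/First {len(index)}"
        f"/Filter/FlateDecode/Length {len(compressed)}>>"
    )
    return header, compressed


def parse_object_stream(n, first, compressed):
    """Return {object_number: body_bytes} from a Flate-compressed object stream."""
    data = zlib.decompress(compressed)
    # The first `first` bytes hold the index: n pairs of integers.
    numbers = [int(token) for token in data[:first].split()]
    pairs = list(zip(numbers[0::2], numbers[1::2]))[:n]
    result = {}
    for i, (number, offset) in enumerate(pairs):
        start = first + offset
        # An object runs until the next object's offset, or the end of the data.
        end = first + pairs[i + 1][1] if i + 1 < len(pairs) else len(data)
        result[number] = data[start:end].strip()
    return result


if __name__ == "__main__":
    header, compressed = build_object_stream(
        {11: b"<</Type/Font/Subtype/Type1>>", 12: b"[0 0 612 792]"}
    )
    print(header)  # the only part still readable in a text editor
    print(parse_object_stream(2, header and 11, compressed))
```

Note that the objects inside the stream carry no "obj" and "endobj" keywords, and that objects which are themselves streams (such as the image shown earlier) cannot be placed inside an Object stream, so they remain as ordinary objects in the file.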

The only minor downside is that developers like me can no longer open the PDF files in a text editor and find the objects – they are now hidden away inside compressed binary data! But there are lots of tools to allow us to see inside the PDF, some of which we have highlighted in Useful PDF tools – pdfedit.

This feature was introduced in PDF version 1.5, so you will need a tool that supports Object streams. It is the major reason many PDF files do not open in Sun’s PDF-Renderer, for example. But it is now supported by most mainstream PDF tools.
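If you want a quick way to tell whether a file is likely to need Object stream support, a rough check can be sketched in Python. This is a heuristic under stated assumptions, not a real parser: it reads the version from the %PDF-x.y header (which a file can override elsewhere) and looks for the /ObjStm name, which appears in the uncompressed header of each Object stream. Both function names are hypothetical.

```python
import re


def pdf_header_version(data):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    match = re.search(rb"%PDF-(\d+)\.(\d+)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def looks_like_it_uses_object_streams(data):
    """Heuristic only: look for the /ObjStm name in the raw bytes of the file."""
    return b"/ObjStm" in data


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py some_file.pdf
    with open(sys.argv[1], "rb") as handle:
        content = handle.read()
    print("Header version:", pdf_header_version(content))
    print("Object streams likely:", looks_like_it_uses_object_streams(content))
```

A match does not guarantee that a given tool will fail on the file, and a false positive is possible if the same bytes happen to occur inside binary data, so treat the result as a hint about where to look next.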

Do you have a favourite PDF feature or something you would like explained? Why not contact us and we will try to cover it.

This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips, so click here to visit our series index!


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.
