Java PDF Blog

What are PDF Object streams?

PDF object streams are a very useful feature, added in version 1.5 of the PDF specification, which introduces a new type of PDF object. Until they arrived, a PDF object that carried data consisted of a binary part (which could be compressed) and a text header (which could not). If you open a PDF file in a text editor, you will see something like this.

TEXT containing information about the PDF object

15 0 obj<</ColorSpace/DeviceRGB/BitsPerComponent 8/Interpolate false/SMask 16 0 R/Filter/FlateDecode/Length 3609>>

binary data

stream

This is still allowed and you can continue to do it: the PDF file format has always been very good at backward compatibility. What object streams allow you to do is put lots of PDF objects together inside a single binary stream. The binary stream still has a text header, telling the PDF parser how to find and extract the PDF objects, but the PDF objects themselves can all be compressed. This makes the PDF smaller, potentially more secure and possibly faster to load.
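To make the layout concrete, here is a minimal Java sketch of how a parser might unpack an object stream once it has the raw stream bytes. It relies on two entries in the stream's text header: /N (the number of objects inside) and /First (the byte offset at which the first object starts). The decompressed data begins with N pairs of integers, each an object number followed by that object's offset relative to /First. The class name, method names and sample data are made up for illustration; the sketch assumes the stream uses FlateDecode with no predictor and that the offsets are in increasing order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ObjectStreamSketch {

    /** Inflates FlateDecode (zlib) data, the usual filter on an object stream. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported Flate data");
                }
                out.write(buffer, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Splits decompressed object stream data; n is /N and first is /First. */
    static Map<Integer, String> split(byte[] decoded, int n, int first) {
        // The index holds N pairs: object number, then offset relative to /First
        String index = new String(decoded, 0, first, StandardCharsets.ISO_8859_1).trim();
        String[] tokens = index.split("\\s+");
        if (tokens.length < 2 * n) {
            throw new IllegalArgumentException("Index is shorter than /N pairs");
        }
        int[] ids = new int[n];
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = Integer.parseInt(tokens[2 * i]);
            offsets[i] = Integer.parseInt(tokens[2 * i + 1]);
        }
        Map<Integer, String> objects = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            int start = first + offsets[i];
            // Each object runs up to the start of the next one, or the end of the data
            int end = (i + 1 < n) ? first + offsets[i + 1] : decoded.length;
            objects.put(ids[i],
                    new String(decoded, start, end - start, StandardCharsets.ISO_8859_1).trim());
        }
        return objects;
    }

    /** Demo helper only: compresses sample data so the sketch can be run without a PDF. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical object stream content: objects 11 and 12, with 12 at offset 15
        String index = "11 0 12 15 ";
        String content = index + "<</Type/Font>> <</Length 42>>";
        byte[] compressed = deflate(content.getBytes(StandardCharsets.ISO_8859_1));

        Map<Integer, String> objects = split(inflate(compressed), 2, index.length());
        objects.forEach((id, body) -> System.out.println(id + " -> " + body));
    }
}
```

Note that the objects inside the stream are stored without the usual obj and endobj keywords, which is why the index of object numbers is needed. A real parser would also have to handle other filters, predictors and encryption, none of which this sketch attempts.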

The only minor downside is that developers like me can no longer open the PDF files in a text editor and find the objects – they are now hidden away inside compressed binary data! But there are lots of tools to allow us to see inside the PDF, some of which we have highlighted in this blog post on how to view PDF objects.

This feature was introduced in PDF version 1.5, so you will need a PDF library which can support this later format. It is the major reason many PDF files do not open in Sun’s old Java PDF-Renderer, for example. But it is supported by most mainstream PDF tools, including ours.
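As a rough illustration of the version requirement, the Java sketch below reads the %PDF-x.y header at the start of a file and reports whether the declared version is 1.5 or later. The class and method names are hypothetical. It is only a first check under a simplifying assumption: a PDF can also declare a later version in its document catalog, which this sketch does not look at, so a file whose header says 1.4 may still contain object streams.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfVersionSketch {

    // Matches a header such as %PDF-1.5 (single-digit major and minor version)
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns true if the header in the first bytes of the file declares PDF 1.5 or later. */
    static boolean headerAllowsObjectStreams(byte[] fileStart) {
        // Only the start of the file is scanned; the header is normally on the first line
        int length = Math.min(fileStart.length, 1024);
        String text = new String(fileStart, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = HEADER.matcher(text);
        if (!matcher.find()) {
            return false; // no recognisable PDF header
        }
        int major = Integer.parseInt(matcher.group(1));
        int minor = Integer.parseInt(matcher.group(2));
        return major > 1 || (major == 1 && minor >= 5);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(headerAllowsObjectStreams(sample));
    }
}
```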