There is more than one PDF file specification

One of the really annoying things about working with PDF is that there is actually more than one PDF file specification.

First of all, there is the PDF Reference produced by Adobe. This freely available document is very long, detailed and often rather cryptic. Sometimes you need to hunt through all sorts of minor appendices to find a specific case, and sometimes you find that the information is missing altogether.

Today I was debugging some code to read the hint stream in a linearized PDF file. Linearization allows a PDF viewer to display pages before the whole document has finished loading. The hint stream consists of a set of variable-bit-length values split into sub-tables. It seems that while the values are packed together into a bitstream, the sub-tables themselves must be byte aligned. I could not find any mention of this in the spec; it was an intuitive guess on my part to try it.
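To make the byte-alignment point concrete, here is a minimal Python sketch of a bit reader for this kind of data. It assumes values are packed most-significant-bit first, and it models each sub-table as a hypothetical (count, bit_width) pair; the names `BitReader` and `read_sub_tables` are illustrative only, not taken from JPedal or from the specification. The one call that matters for the behaviour described above is `align_to_byte()` between sub-tables.

```python
class BitReader:
    """Reads unsigned integers of arbitrary bit width, MSB first, from bytes."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # absolute position in bits from the start of data

    def read_bits(self, n: int) -> int:
        """Read an n-bit unsigned value, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.bit_pos >> 3]
            bit = (byte >> (7 - (self.bit_pos & 7))) & 1
            value = (value << 1) | bit
            self.bit_pos += 1
        return value

    def align_to_byte(self) -> None:
        """Skip leftover bits so the next read starts on a byte boundary."""
        self.bit_pos = (self.bit_pos + 7) & ~7


def read_sub_tables(data: bytes, layouts):
    """Read consecutive sub-tables described by hypothetical (count, bit_width) pairs.

    Assumption (from the behaviour observed in the post): each sub-table
    starts on a byte boundary, even though values inside it are bit-packed.
    """
    reader = BitReader(data)
    tables = []
    for count, width in layouts:
        tables.append([reader.read_bits(width) for _ in range(count)])
        reader.align_to_byte()  # pad to the next byte before the next sub-table
    return tables
```

For example, with the bytes `0b10101100, 0b11100000` and two sub-tables of 3-bit values (two entries, then one), the aligned reader returns `[[5, 3], [7]]`. Reading the third value straight on without aligning would instead pick up the two padding bits plus one bit of the next byte and return `1`, which is exactly the kind of subtle misread that makes this bug hard to spot.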

The second PDF file specification is simply whatever works in Acrobat. We had an interesting case last week of a file which did not work in our JPedal reader but opened in Acrobat. When we hunted it down, the reason turned out to be that the specification says all PDF files end with the characters %%EOF, but this file ended with %%EO and still worked. So it seems that, in this case, the official specification is a guideline rather than a rule.
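One way a reader might cope with this is to keep a strict check that follows the letter of the specification alongside a more forgiving one. The Python sketch below is illustrative only: the function names are hypothetical, the 1024-byte search window is an assumed value rather than anything stated in this post, and accepting a truncated `%%EO` simply mirrors the file described above. It is not how JPedal or Acrobat actually implement the check.

```python
def strict_has_eof(pdf: bytes) -> bool:
    """Spec-style check: the last line of the file must be %%EOF."""
    return pdf.rstrip(b"\r\n").endswith(b"%%EOF")


def lenient_has_eof(pdf: bytes, window: int = 1024) -> bool:
    """Forgiving check modelled on the behaviour described in the post.

    Assumptions: %%EOF may appear anywhere in the last `window` bytes
    (trailing junk is tolerated), and a truncated "%%EO" at the very end
    is also accepted.
    """
    tail = pdf[-window:]
    if b"%%EOF" in tail:
        return True
    # Tolerate the truncated marker seen in the problem file.
    return tail.rstrip().endswith(b"%%EO")
```

A reader could try `strict_has_eof` first and fall back to `lenient_has_eof` only when the strict test fails, logging a warning, so well-formed files are still validated properly while files like the one above still open.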

So, if you are working with PDF files, always keep a copy of the specification handy (a printed copy also doubles up as an excellent doorstop or monitor stand). But also make sure you have a copy of Acrobat handy, and be prepared to find that the rules are interpreted rather loosely.

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so visit our series index!


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.
