How can a PDF file be broken?

People often ask how a PDF file can be broken. I recently looked at a file that gives a nice example, so here is an explanation.

A PDF file contains lots of pointers (byte offsets into the file) which the reader follows when it loads the file. If the file is linearized, its linearization dictionary has an /H entry giving the offset and length of the hint stream. Here is an example from the start of this PDF file.

%PDF-1.5%
915 0 obj<<
/Linearized 1
/H [5844 584]
/O 954
/E 61736
/N 7
/L 180360
/T 161941>>
endobj
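To make the /H entry concrete, here is a minimal Python sketch of how a parser might pull the two values out of that dictionary. The helper name `read_hint_entry` and the regular expression are assumptions made for illustration, not the code any particular PDF library uses, and a real parser would tokenize the dictionary properly rather than pattern-match it.

```python
import re
from typing import Optional, Tuple

# The linearization dictionary quoted above, as raw bytes.
HEADER = (
    b"%PDF-1.5\n"
    b"915 0 obj<<\n/Linearized 1\n/H [5844 584]\n/O 954\n"
    b"/E 61736\n/N 7\n/L 180360\n/T 161941>>\nendobj\n"
)

def read_hint_entry(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, length) from the /H entry, or None if it is absent.

    Hypothetical helper: it reads only the first pair of integers in the array.
    """
    match = re.search(rb"/H\s*\[\s*(\d+)\s+(\d+)", data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

offset, length = read_hint_entry(HEADER)
print(offset, hex(offset), length)  # 5844 0x16d4 584
```

The first number is the byte offset the reader is told to jump to, and the second is how many bytes the hint stream object is supposed to occupy.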

So far so good. We go to offset 5844 (which is 16D4 in hex) and read the object which starts there. Except that the H object does not start there; it starts at 1671 hex. Here it is viewed in a hex editor.

[Image: hex dump of the PDF file]
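The size of the error is easy to work out from the two offsets. This short Python sketch just does the hex arithmetic; the variable names are illustrative only.

```python
declared = 5844            # first value of /H in the linearization dictionary
actual = int("1671", 16)   # where the object really starts, per the hex editor
hint_length = 584          # second value of /H

print(hex(declared))              # 0x16d4
print(actual)                     # 5745
print(declared - actual)          # 99
print(declared - actual < hint_length)  # True
```

So the pointer overshoots the real start by 99 bytes, which is less than the declared 584-byte length, meaning it lands inside the object it was supposed to point at.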

The offset given puts us in the middle of a binary data stream, which is going to be very confusing for any PDF parser. So in this case we abandon any attempt to open the PDF in linearized mode. If you open the file in Adobe's viewer, it will tell you the file is broken and try to fix it.
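A parser can guard against this kind of bad pointer by checking that the offset really lands on an object header (`<number> <generation> obj`) before trusting it. The following Python sketch shows one way to do that, under stated assumptions: the function names are hypothetical, the sample bytes are a small synthetic stand-in (not the actual file discussed here), and a production parser would do considerably more validation.

```python
import re

# An indirect object begins with "<object number> <generation> obj".
OBJ_HEADER = re.compile(rb"\d+\s+\d+\s+obj")

def offset_starts_object(data: bytes, offset: int) -> bool:
    """True if an object header begins exactly at `offset` (hypothetical helper)."""
    if not 0 <= offset < len(data):
        return False
    # Reject offsets that land part-way through the object number.
    if offset > 0 and data[offset - 1:offset].isdigit():
        return False
    return OBJ_HEADER.match(data, offset) is not None

def choose_mode(data: bytes, hint_offset: int) -> str:
    """Use linearized loading only when the hint pointer checks out."""
    return "linearized" if offset_starts_object(data, hint_offset) else "fallback"

# Synthetic stand-in: a header followed by one stream object full of binary data.
prefix = b"%PDF-1.5\n"
hint_obj = (
    b"12 0 obj\n<</Length 30>>stream\n"
    + b"\x00\x9c\x1f" * 10
    + b"\nendstream\nendobj\n"
)
sample = prefix + hint_obj

good = len(prefix)   # true start of the object
bad = good + 40      # lands inside the binary stream data

print(choose_mode(sample, good))  # linearized
print(choose_mode(sample, bad))   # fallback
```

Falling back to a normal, non-linearized load is the cautious choice here, because nothing read from a bad offset can be trusted.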

Wrong pointers in PDF files are a major cause of PDF files being ‘broken’, because they point to random data where a specific object is expected.

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.