Interesting PDF bugs – X marks the spot (or not)

I had an intriguing PDF bug to hunt down today. Every PDF file has a pointer near its end that leads to the data structures inside the PDF. This is the startxref value, the byte offset of the cross-reference table, and it sits within the last 1000 bytes or so of the file. Here is an example:

0000113537 00000 n
trailer
<< /Size 74 /Root 33 0 R /Info 1 0 R /ID [ <0e208555874758e6f5945d00d352d90e>
<0e208555874758e6f5945d00d352d90e> ] >>
startxref
113735
%%EOF

Now here is the file I have been looking at:

0000221725 00000 n

trailer
 << /Size 4
    /Root 1 0 R
 >>
startref
66913784
%%EOF

Notice the subtle difference? The keyword is startref, not startxref: the x is missing. The file still opens in Acrobat, though, so while the spec specifies one value, in practice alternatives are tolerated. So much for clear standards!
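A parser that wants to open files like this has to accept both spellings of the keyword. Here is a minimal sketch in Python of one way to do that. It is not how Acrobat or any particular library does it; the function name find_xref_offset and the 1024-byte tail size are assumptions made for illustration. It scans the tail of the file for the last startxref (or the malformed startref) and reports which spelling it found.

```python
import re

# Matches the spec-compliant keyword "startxref" and, as a tolerance,
# the malformed "startref" seen in the file above.
_START_RE = re.compile(rb"start(x?)ref\s+(\d+)")


def find_xref_offset(data: bytes, tail_size: int = 1024):
    """Return (offset, is_compliant) for the last pointer in the tail, or None.

    tail_size is an assumed search window, not a value taken from the post.
    """
    tail = data[-tail_size:]
    matches = list(_START_RE.finditer(tail))
    if not matches:
        return None
    last = matches[-1]  # the final pointer in the file is the one that counts
    return int(last.group(2)), last.group(1) == b"x"


if __name__ == "__main__":
    good = b"trailer\n<< /Size 74 /Root 33 0 R >>\nstartxref\n113735\n%%EOF\n"
    bad = b"trailer\n << /Size 4\n    /Root 1 0 R\n >>\nstartref\n66913784\n%%EOF\n"
    print(find_xref_offset(good))
    print(find_xref_offset(bad))
```

Returning the is_compliant flag alongside the offset lets the caller log a warning for the broken file instead of silently accepting it.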

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

2 thoughts on “Interesting PDF bugs – X marks the spot (or not)”

  1. The standard is still right. It’s just that there are so many crappy PDF creators out there that Acrobat (and Reader) over the years have implemented ways to deal with well-known errors in PDF files. That does not make them correct, but it gives the user a way to open the file.
    Some of these problems were problems with Adobe’s own PDF creation software (remember the Y2K bug in Distiller that reported a year of 19100 instead of 2000?), but most of them are bugs in third-party PDF creators. That’s why it’s so hard to write a PDF processor that deals with the PDF files that open in Acrobat and Reader – as you just found out again 🙂

    I have a few ideas about what to do with people who implement PDF creators without fully understanding the PDF spec, but I’ll refrain from disclosing those ideas here 🙂

    Thanks for sharing your experience.

  2. I think you will find there is a queue waiting to deal with those PDF creators!

    It is irritating because you keep having to add code to deal with odd corner cases which should just fail.
