I had an intriguing PDF bug to hunt down today. Every PDF file stores a pointer near the end of the file that gives the byte offset of the cross-reference table, which in turn locates the data structures inside the PDF. This is the startxref value, found in roughly the last 1000 bytes of the file. Here is an example:
0000113537 00000 n trailer << /Size 74 /Root 33 0 R /Info 1 0 R /ID [ <0e208555874758e6f5945d00d352d90e> <0e208555874758e6f5945d00d352d90e> ] >> startxref 113735 %%EOF
Now here is the end of the file I have been looking at:
0000221725 00000 n trailer << /Size 4 /Root 1 0 R >> startref 66913784 %%EOF
Notice the subtle difference? The keyword is startref, not startxref. This file still opens in Acrobat, though, so while the spec specifies one keyword, alternatives are tolerated in practice. So much for clear standards!
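To illustrate the kind of corner-case code this forces on a PDF reader, here is a minimal Python sketch of a lenient lookup. It is not how Acrobat or any particular library does it; the function name find_xref_offset, the lenient flag and the 1024-byte tail size are all assumptions made for this example.

```python
import re

# Assumption for this sketch: only the tail of the file is scanned.
TAIL_BYTES = 1024

# Accept the spec keyword "startxref" and, leniently, the misspelled "startref".
_POINTER_RE = re.compile(rb"(startxref|startref)\s+(\d+)\s+%%EOF")


def find_xref_offset(data: bytes, lenient: bool = True):
    """Return (offset, keyword) read from the file tail, or None if not found."""
    tail = data[-TAIL_BYTES:]
    matches = list(_POINTER_RE.finditer(tail))
    if not matches:
        return None
    # Use the last match, since the pointer closest to the end is the one that counts.
    last = matches[-1]
    keyword = last.group(1).decode("ascii")
    if keyword != "startxref" and not lenient:
        # Strict mode: reject anything but the keyword the spec defines.
        return None
    return int(last.group(2)), keyword


if __name__ == "__main__":
    good = b"trailer << /Size 74 /Root 33 0 R >>\nstartxref\n113735\n%%EOF\n"
    bad = b"trailer << /Size 4 /Root 1 0 R >>\nstartref\n66913784\n%%EOF\n"
    print(find_xref_offset(good))
    print(find_xref_offset(bad))
    print(find_xref_offset(bad, lenient=False))
```

Returning the keyword alongside the offset lets the caller log that a non-standard file was accepted, rather than silently hiding the problem.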
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip.
The standard is still right. It’s just that there are so many crappy PDF creators out there that, over the years, Acrobat (and Reader) have implemented ways to deal with well-known errors in PDF files. That does not make those files correct, but it gives the user a way to open them.
Some of these problems came from Adobe’s own PDF creation software (remember the Y2K bug in Distiller that reported a year of 19100 instead of 2000?), but most of them are bugs in third-party PDF creators. That’s why it’s so hard to write a PDF processor that handles every PDF file that opens in Acrobat and Reader, as you just found out again 🙂
I have a few ideas about what to do with people who implement PDF creators without fully understanding the PDF spec, but I’ll refrain from disclosing those ideas here 🙂
Thanks for sharing your experience.
I think you will find there is a queue waiting to deal with those PDF creators!
It is irritating because you keep having to add code to deal with odd corner cases which should just fail.