I had a really intriguing PDF issue today. I was sent a PDF which does not open in JPedal but opens in Acrobat. It was allegedly created in Quartz. So I sat down to debug it…
Here is the start of the uncompressed xref table from the PDF file
The xref table from the broken PDF lists 47 objects, and the first entry (object 0) is ignored. Object 1 is then listed at offset 89923, and the file does not open. So I had a look at offset 89923, and it contains object zero!
0 0 obj <</Filter /FlateDecode /Length 1619>>stream
It is not object 1, it is object 0… Altering the offsets to start with 0 and not 1 fixed the file and it now opens. So we have to validate the first value and not trust the xref table to be correct. Be warned…
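The check itself is small. Here is a minimal Python sketch of the idea, not JPedal's actual code: `object_at` and `xref_entry_matches` are hypothetical helper names, and the sketch assumes the whole file is already in memory as bytes and that the xref entry has been parsed into an object number and a byte offset.

```python
import re

# An indirect object starts with "<number> <generation> obj".
OBJ_HEADER = re.compile(rb"\s*(\d+)\s+(\d+)\s+obj")


def object_at(data, offset):
    """Return (object number, generation) found at offset, or None.

    Hypothetical helper: it only looks at the object header and does
    not parse the object body.
    """
    if offset < 0 or offset >= len(data):
        return None
    match = OBJ_HEADER.match(data, offset)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def xref_entry_matches(data, expected_num, offset):
    """True if the object at offset carries the number the xref claims."""
    found = object_at(data, offset)
    return found is not None and found[0] == expected_num
```

For the broken file above, the entry for object 1 would fail this check, because the header found at its offset reads `0 0 obj`.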
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips in our series index.
> So we have to validate the first value and not trust the xref table to be correct. Be warned…
That’s the wrong conclusion. The PDF in question was simply broken. In a regular attempt to open a PDF, the xref table must be trusted.
The appropriate response would be to signal to the UI user / API caller that the PDF is broken and offer the option to repair the file. If a repair is requested, one can then consider hacks like re-interpreting the xref table as described in the article.
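As a rough illustration of that strict-by-default flow, here is a minimal Python sketch. Everything in it is an assumption made for the example: `BrokenXrefError` and `load_offsets` are hypothetical names, the xref table is taken to be already parsed into a dict of object number to offset, and the repair step rebuilds the table by scanning for `N G obj` headers at the start of a line rather than shifting the object numbers as the article did.

```python
import re


class BrokenXrefError(Exception):
    """Raised when xref offsets do not point at the objects they claim."""


_HEADER_AT = re.compile(rb"\s*(\d+)\s+\d+\s+obj")
# Assumes object headers start a line that follows a "\n".
_HEADER_SCAN = re.compile(rb"(?m)^(\d+)\s+\d+\s+obj")


def load_offsets(data, xref, repair=False):
    """Return a verified {object number: offset} map.

    Strict by default: a mismatch raises BrokenXrefError so the caller
    can tell the user and ask whether to repair. With repair=True the
    table is rebuilt by scanning the file for object headers.
    """
    bad = []
    for num, offset in xref.items():
        in_range = 0 <= offset < len(data)
        match = _HEADER_AT.match(data, offset) if in_range else None
        if match is None or int(match.group(1)) != num:
            bad.append(num)
    if not bad:
        return dict(xref)
    if not repair:
        raise BrokenXrefError("xref entries do not match objects: %r" % bad)
    # Later definitions overwrite earlier ones, as with appended updates.
    return {int(m.group(1)): m.start() for m in _HEADER_SCAN.finditer(data)}
```

A caller would first call `load_offsets(data, xref)`, catch `BrokenXrefError`, report the problem, and only call it again with `repair=True` if the user or API caller asks for a repair.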
Accept that broken PDFs are actually very common and we are expected to just deal with them.