Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Intriguing PDF xref issue


I had a really intriguing PDF issue today. I was sent a PDF which would not open in JPedal but did open in Acrobat. It had allegedly been created with Quartz. So I sat down to debug it…

Here is the start of an uncompressed xref table from a well-formed PDF file:

xref
0 271
0000000000 65535 f
0000000015 00000 n
0000000102 00000 n
0000000178 00000 n
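It contains 271 entries (objects 0 to 270). Entry 0 is the head of the free list (marked f, with generation 65535) and does not point at a real object, so it is ignored. Object 1 then starts at byte offset 15, object 2 at offset 102, and so on.

Here is the start of the xref table from the file that would not open: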
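To make the layout concrete, here is a minimal Java sketch that parses one classic xref subsection from text. The header line gives the first object number and the entry count, and each entry holds a 10-digit byte offset, a 5-digit generation number and an `n` (in use) or `f` (free) flag. The class and method names are hypothetical, not JPedal's API, and the sketch assumes a single subsection supplied as a string; a real parser works on raw bytes and must handle several subsections as well as xref streams.

```java
import java.util.ArrayList;
import java.util.List;

public class XrefSketch {

    /** One entry of a classic (uncompressed) xref table. */
    record XrefEntry(int objectNumber, long offset, int generation, boolean inUse) {}

    /**
     * Parses a single xref subsection such as "xref\n0 4\n0000000000 65535 f\n...".
     * Hypothetical helper: assumes exactly one subsection and whitespace-separated fields.
     */
    static List<XrefEntry> parseSubsection(String text) {
        String[] lines = text.strip().split("\\R");
        if (!lines[0].strip().equals("xref")) {
            throw new IllegalArgumentException("expected 'xref' keyword");
        }
        // Subsection header: first object number, then number of entries.
        String[] header = lines[1].strip().split("\\s+");
        int firstObject = Integer.parseInt(header[0]);
        int count = Integer.parseInt(header[1]);

        List<XrefEntry> entries = new ArrayList<>();
        // Stop early if fewer entry lines are present than the header announces.
        for (int i = 0; i < count && i + 2 < lines.length; i++) {
            String[] fields = lines[i + 2].strip().split("\\s+");
            long offset = Long.parseLong(fields[0]);       // byte offset from start of file
            int generation = Integer.parseInt(fields[1]);  // generation number
            boolean inUse = fields[2].equals("n");         // 'n' = in use, 'f' = free
            entries.add(new XrefEntry(firstObject + i, offset, generation, inUse));
        }
        return entries;
    }

    public static void main(String[] args) {
        // The first four entries of the well-formed table above (header adjusted to 4).
        String xref = """
                xref
                0 4
                0000000000 65535 f
                0000000015 00000 n
                0000000102 00000 n
                0000000178 00000 n
                """;
        for (XrefEntry e : parseSubsection(xref)) {
            System.out.println(e);
        }
    }
}
```

Running `main` prints four entries, with entry 0 marked as not in use. Using a `record` keeps each parsed entry immutable; Java 16 or later is assumed.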

xref
0 47
0000000000 65535 f
0000089923 00000 n
0000089809 00000 n
0000088105 00000 n
0000087885 00000 n

This xref table from the broken PDF contains 47 entries, and entry 0 is again the free entry that is ignored. According to the table, object 1 starts at offset 89923, yet the file does not open. So I had a look at offset 89923, and it contains object zero!

0 0 obj <</Filter /FlateDecode /Length 1619>>stream

It is not object 1; it is object 0! Shifting the numbering so that the offsets start with object 0 rather than object 1 fixed the file, and it now opens. So we have to validate the first value rather than trusting the xref table to be correct. Be warned…
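The check described above can be sketched in Java: for each in-use entry, read the `N G obj` header at the recorded offset and compare N with the object number the table claims. `XrefValidator`, `objectNumberAt` and `detectShift` are hypothetical names for illustration, not JPedal's API. The sketch assumes the whole file is in memory and treats a constant offset-by-N numbering (as in this file) as the only damage it can recognise; anything else is reported as an error.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XrefValidator {

    // Matches an indirect object header such as "12 0 obj" at the start of the input.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("\\s*(\\d{1,9})\\s+(\\d{1,5})\\s+obj");

    /** Returns the object number in the "N G obj" header at offset, or -1 if there is none. */
    static int objectNumberAt(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return -1;
        }
        // 32 bytes is plenty for a header; ISO-8859-1 maps each byte to one char.
        int end = Math.min(pdf.length, offset + 32);
        String head = new String(pdf, offset, end - offset, StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(head);
        return m.lookingAt() ? Integer.parseInt(m.group(1)) : -1;
    }

    /**
     * For each in-use entry, compares the object number the xref table claims
     * (the entry index) with the number actually found at its offset.
     * Returns 0 if they all agree, or the common difference (found - claimed)
     * if every entry is off by the same amount (-1 for the file in this post).
     * Throws if an entry points at no header or the differences disagree.
     */
    static int detectShift(long[] offsets, boolean[] inUse, byte[] pdf) {
        Integer shift = null;
        for (int entry = 0; entry < offsets.length; entry++) {
            if (!inUse[entry]) {
                continue; // free entries (like entry 0) point at nothing
            }
            int found = objectNumberAt(pdf, (int) offsets[entry]);
            int delta = found - entry;
            if (found < 0 || (shift != null && shift != delta)) {
                throw new IllegalStateException("xref entry " + entry + " does not match the file");
            }
            shift = delta;
        }
        return shift == null ? 0 : shift;
    }

    public static void main(String[] args) {
        // Hypothetical miniature file reproducing the bug: the table says object 1
        // is at the offset where "0 0 obj" actually starts.
        String text = "%PDF-1.4\n0 0 obj\n<<>>\nendobj\n1 0 obj\n<<>>\nendobj\n";
        byte[] pdf = text.getBytes(StandardCharsets.ISO_8859_1);
        long[] offsets = {0, text.indexOf("0 0 obj"), text.indexOf("1 0 obj")};
        boolean[] inUse = {false, true, true};

        int shift = detectShift(offsets, inUse, pdf);
        System.out.println("shift = " + shift); // prints "shift = -1"
        // A shift of -1 means the object at entry i is really object i - 1.
    }
}
```

A shift of -1 corresponds to the fix described above: renumbering so the offsets start with object 0. Whether to apply that remapping automatically or only in an explicit repair step is a design choice; the sketch only detects it.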

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so visit our series index!

2 Replies to “Intriguing PDF xref issue”

  1. > So we have to validate the first value and not trust the xref table to be correct. Be warned…

    That’s the wrong conclusion. The PDF in question was simply broken. When opening a PDF normally, the xref table must be trusted.

    The appropriate response would be to signal to the UI user or API caller that the PDF is broken and to offer the option of repairing the file. If a repair is requested, one can then consider hacks such as re-interpreting the xref table as described in the article.

IDRsolutions Ltd 2020. All rights reserved.