Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Interesting PDF bugs – How wrong can the references be?


I have been looking at a customer PDF file today which highlights how ‘elastic’ the PDF file specification can be.

In theory, every object is pointed to by a reference, so if you go to the byte offset given for object 100, you would see

100 0 obj

......

endobj

The data starts with the object number and generation number, followed by the obj keyword. So far so good 🙂
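As a minimal sketch of that expectation, the check below tests whether the bytes at a given offset really begin with the expected header. The class and method names are hypothetical, and it assumes the tokens are separated by single spaces, which real files do not guarantee.

```java
import java.nio.charset.StandardCharsets;

public class ObjectHeaderCheck {

    // Returns true if the bytes at 'offset' begin with "<objNum> <gen> obj".
    // Assumption: tokens are separated by exactly one space.
    public static boolean startsWithHeader(byte[] data, int offset, int objNum, int gen) {
        byte[] header = (objNum + " " + gen + " obj").getBytes(StandardCharsets.US_ASCII);
        if (offset < 0 || offset + header.length > data.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[offset + i] != header[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "100 0 obj\n<< >>\nendobj\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithHeader(data, 0, 100, 0)); // true
        System.out.println(startsWithHeader(data, 4, 100, 0)); // false
    }
}
```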

I was looking at a file today which gave me this data for object 100

0 endobj

100 0 obj

end

In other words, the offset is set 8 bytes too early in the byte stream, so you get the end of the previous object before the correct data for object 100. Most people would regard this as a PDF bug, but the file opens in Acrobat (of course it would), which is very forgiving and does lots of error checking.

The real problem is not fixing this one file, but fixing the issue without adding code that slows down or breaks the billions of PDF files out there which work correctly. That is also why you need a very large library of PDF files to regression test any code changes. I have just bought some new i7 PCs with SSD drives to help run continuous testing against our collection of PDF files!

After a morning of coding, I now have a code tweak which can handle this (and does not slow down our library in any way), but it is a good example of the issues which you can find in badly created PDF files.
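The actual tweak is not shown here, so the following is only one possible approach, not the code in our library. It tries the recorded offset first, so well-formed files pay no extra cost, and only scans a short distance forward when the header is missing. The class name, method names and the 64-byte window are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class TolerantObjectLookup {

    // Hypothetical limit on how far past the recorded offset we are willing to scan.
    static final int MAX_SCAN = 64;

    // PDF white-space characters: NUL, tab, line feed, form feed, carriage return, space.
    static boolean isWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    static boolean matchesAt(byte[] data, int pos, byte[] header) {
        if (pos < 0 || pos + header.length > data.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[pos + i] != header[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the position of "<objNum> <gen> obj", or -1 if it is not found
    // at the recorded offset or within MAX_SCAN bytes after it.
    public static int locate(byte[] data, int offset, int objNum, int gen) {
        if (offset < 0) {
            return -1;
        }
        byte[] header = (objNum + " " + gen + " obj").getBytes(StandardCharsets.US_ASCII);

        // Fast path: a correct file matches here and never reaches the scan.
        if (matchesAt(data, offset, header)) {
            return offset;
        }

        // Slow path: look a little further on. Requiring white space before the
        // match stops "100 0 obj" from matching inside "2100 0 obj".
        for (int pos = offset + 1; pos <= offset + MAX_SCAN; pos++) {
            if (matchesAt(data, pos, header) && isWhitespace(data[pos - 1])) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "0 endobj\n100 0 obj\n<< >>\nendobj\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(locate(data, 0, 100, 0)); // 9
    }
}
```

Keeping the scan window small is a deliberate trade-off: it handles an offset that is slightly too early without risking a match on an unrelated object much later in the file.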

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.

