Small images can cause big problems in PDF files

I debugged a file this week that showed some interesting features you can find inside a PDF file. The actual problem was that our viewer displayed a question mark on an image where none should appear. Drilling down, it turned out that the question mark really was drawn in the PDF, but the user was never meant to see it because a white box was then drawn over it.

So the first point is that the data in some PDF files can be rather messy. Because the user only sees the final rendered output, it is quite common to hide ‘old’ items by just drawing over them rather than actually erasing them from the data stream. This can cause issues with extraction because you need to check whether objects were actually visible. It also makes the file bigger.
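The visibility check mentioned above can be sketched roughly as follows. This is a simplified illustration in Python, not the code from our viewer: the `Rect` and `PaintedItem` types and the `visible_items` function are hypothetical names, and the sketch assumes every item has an axis-aligned bounding box and ignores clipping, transparency and partial overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in page coordinates (hypothetical helper)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


@dataclass(frozen=True)
class PaintedItem:
    """One thing painted on the page, in the order it appears in the stream."""
    name: str
    bbox: Rect
    opaque: bool  # assumption: True means it fills its whole bbox with solid paint


def visible_items(items):
    """Return the items that are not fully covered by something painted later."""
    visible = []
    for index, item in enumerate(items):
        covered = any(
            later.opaque and later.bbox.contains(item.bbox)
            for later in items[index + 1:]
        )
        if not covered:
            visible.append(item)
    return visible


# The situation from this post: a question mark, then a white box on top of it.
page = [
    PaintedItem("question mark", Rect(100, 100, 110, 115), opaque=False),
    PaintedItem("white box", Rect(95, 95, 120, 120), opaque=True),
]
print([item.name for item in visible_items(page)])
```

Only the white box survives the check, so an extraction tool using something like this would not report the hidden question mark. A real implementation would also need to handle the cases this sketch leaves out.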

The reason we did not hide the question mark was that the PDF stream used a trick: it created a 1×1 pixel white image and then scaled it up to cover the question mark. I had written some heuristics to skip tiny images, and they had not allowed for this case (they do now). It would have been more efficient to draw a white box, but the PDF world is full of such little ‘tricks’. So you need to look not just at the PDF commands but at how they are being used to achieve an overall effect.
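To make the pitfall concrete, here is a minimal Python sketch of the two ways of judging whether an image is "tiny". It is not our actual heuristic: the function names and the `min_size` threshold are assumptions for illustration. It relies on the fact that a PDF image is painted into a 1×1 unit square, which the current transformation matrix `[a b c d e f]` then maps onto the page, so the size on the page comes from the matrix and not from the pixel dimensions.

```python
import math


def rendered_size(ctm):
    """Width and height on the page of an image drawn with matrix [a b c d e f].

    The unit square's edges (1, 0) and (0, 1) map to the vectors (a, b) and
    (c, d), so their lengths give the rendered width and height.
    """
    a, b, c, d, _e, _f = ctm
    return math.hypot(a, b), math.hypot(c, d)


def tiny_by_pixels(pixel_width, pixel_height):
    """Naive check: looks only at the image data (hypothetical example)."""
    return pixel_width <= 1 and pixel_height <= 1


def tiny_on_page(ctm, min_size=1.0):
    """Safer check: looks at the area actually covered on the page.

    `min_size` is an assumed threshold in user-space units.
    """
    width, height = rendered_size(ctm)
    return width < min_size and height < min_size


# A 1x1 pixel white image scaled up to a 200 x 50 box at position (95, 95).
white_patch_ctm = (200, 0, 0, 50, 95, 95)

print(tiny_by_pixels(1, 1))            # the naive check would drop this image
print(rendered_size(white_patch_ctm))  # but it covers a large area on the page
print(tiny_on_page(white_patch_ctm))   # so the page-based check keeps it
```

The naive check discards the white patch and the question mark shows through, while the page-based check keeps it.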

So watch out for these little ‘gotchas’. If you are using our PDF viewer, the fix is in today’s release.

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips in our series index.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.