Why we need to see your PDF files…


What makes writing a PDF parser especially interesting (i.e. complex) is that the specification is often ambiguous and that a PDF file is a very complex structure. To display a PDF file, the parser has to correctly scan the PDF object data structure, correctly decode and assemble all the data, and then parse the stream of PostScript-like drawing commands. There could be issues at any of these levels.
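To give a feel for the very first of those levels, here is a minimal sketch in Java of the kind of check that tells you whether a file even has the basic outer structure in place: a `%PDF-` header near the start and a `startxref` entry near the end pointing at the cross-reference data. The class and method names (`PdfQuickCheck`, `headerVersion`, `startXrefOffset`, `describe`) are made up for this illustration, and the choice to search only the first and last 1024 bytes is an assumption of the sketch. It is not how our parser works, and a file that passes can still fail at any later level.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of any real library.
class PdfQuickCheck {

    /** Returns the version text after "%PDF-" (e.g. "1.7"), or null if no header is found. */
    static String headerVersion(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes match byte offsets.
        String head = new String(data, 0, Math.min(data.length, 1024), StandardCharsets.ISO_8859_1);
        int i = head.indexOf("%PDF-");
        if (i < 0) {
            return null;
        }
        int end = i + 5;
        while (end < head.length() && (isAsciiDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return head.substring(i + 5, end);
    }

    /** Returns the number that follows the last "startxref" keyword, or -1 if it is missing or unreadable. */
    static long startXrefOffset(byte[] data) {
        int from = Math.max(0, data.length - 1024);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        int i = tail.lastIndexOf("startxref");
        if (i < 0) {
            return -1;
        }
        int p = i + "startxref".length();
        while (p < tail.length() && Character.isWhitespace(tail.charAt(p))) {
            p++;
        }
        int start = p;
        while (p < tail.length() && isAsciiDigit(tail.charAt(p))) {
            p++;
        }
        if (p == start) {
            return -1; // keyword present but no digits after it
        }
        try {
            return Long.parseLong(tail.substring(start, p));
        } catch (NumberFormatException e) {
            return -1; // absurdly long number, treat as unreadable
        }
    }

    /** Summarises which of the basic structural checks pass. */
    static String describe(byte[] data) {
        String version = headerVersion(data);
        if (version == null) {
            return "no %PDF- header found";
        }
        long xref = startXrefOffset(data);
        if (xref < 0) {
            return "header " + version + ", but no usable startxref";
        }
        if (xref >= data.length) {
            return "header " + version + ", startxref points past end of file";
        }
        return "header " + version + ", startxref at byte " + xref;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PdfQuickCheck.java some-file.pdf
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describe(data));
    }
}
```

A check like this only tells you where the problem is not. Everything after it (decoding the objects and parsing the page content) still needs the actual file.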

Occasionally we have to tweak our parser to fix bugs in our code, to handle things we had not considered, to cope with PDFs that do something permissible but not clear from the spec, or even to handle PDFs that do not actually follow the specification. Most writers of PDF creation tools produce PDFs according to their own interpretation of the PDF specification and, if the file opens in Acrobat, they leave it at that. If it does not open in our parser, it is obviously our fault, not theirs.

Over time we have become very adept at tweaking our code to allow for all the little idiosyncrasies of various PDF tools. We have lots of interesting internal flags in our source code, and the excellent tracing in IntelliJ IDEA (my preferred Java IDE) allows us to follow the flow through code we know very well. It is normally a quick fix followed by a regression test.

Sometimes people send screenshots or simply say the file does not open. Unfortunately, it is very hard to help in these cases. Send us the file and we can quickly find the issue. A screenshot is like giving a car mechanic a picture of your car and asking what is wrong. Let the mechanic open up the bonnet and hear the engine and you’ll get a quick answer.


One Reply to “Why we need to see your PDF files…”

  1. Even though our customers often have confidential PDFs, we have still been able to send effective test cases. If you have the full version of Acrobat, there is a Redaction tool that can remove all but the offending page, and even most of the text on that page. If this stripped-down, redacted PDF still shows the error when you open it in JPedal, it’s just as good as the original. Foxit PDF Editor can do the same job.


IDRsolutions Ltd 2022. All rights reserved.