What makes writing a PDF parser especially interesting (ie complex) is that the specification is often ambiguous and that PDF is a very complex structure. To Display a PDF file requires the parser to correctly scan the PDF object data structure, to correctly decode and assemble all the data, and then parse the stream of Postscript commands. There could be issues at any level.
Occasionally we have to tweak our parser to allow for bugs in our code, things we had not considered, areas where the PDF does something which is permissible but not clear from the spec or even cases where the PDF does not actually follow the specification. Most PDF creation tool writers create a PDF according to their interpretation of the PDF specification and if it opens in Acrobat, they leave it at that. If it does not open in our parser, it is obviously our fault, not theirs.
Over time we have become very adept at tweaking our code to allow for all the little idiosyncracies of various PDF tools – we have lots of interesting internal flags in our source code and Intellij IDEA(my preferred Java IDE) excellent tracing allows us to follow the flow through code we know very well. It is normally a quick fix and regression test.
Sometimes, people send screenshots or say the file does not open. Unfortunately, it is very hard to help in this case. Send us the file and we can quickly find the issue. Screenshots are generally like giving a car mechanic a picture of your car and asking what is wrong – let him open up the bonnet and hear the engine and you’ll get a quick answer.
Are you a Developer working with PDF files?
Our developers guide contains a large number of technical posts to help you understand the PDF file Format.
Find out more about our software for Developers
|Convert PDF to HTML5 or SVG|
|Convert AcroForms and XFA to HTML5|
|Java PDF SDK for working with PDF files|