What makes writing a PDF parser especially interesting (ie complex) is that the specification is often ambiguous and that PDF is a very complex structure. To Display a PDF file requires the parser to correctly scan the PDF object data structure, to correctly decode and assemble all the data, and then parse the stream of Postscript commands. There could be issues at any level.
Occasionally we have to tweak our parser to allow for bugs in our code, things we had not considered, areas where the PDF does something which is permissible but not clear from the spec or even cases where the PDF does not actually follow the specification. Most PDF creation tool writers create a PDF according to their interpretation of the PDF specification and if it opens in Acrobat, they leave it at that. If it does not open in our parser, it is obviously our fault, not theirs.
Over time we have become very adept at tweaking our code to allow for all the little idiosyncracies of various PDF tools – we have lots of interesting internal flags in our source code and Intellij IDEA(my preferred Java IDE) excellent tracing allows us to follow the flow through code we know very well. It is normally a quick fix and regression test.
Sometimes, people send screenshots or say the file does not open. Unfortunately, it is very hard to help in this case. Send us the file and we can quickly find the issue. Screenshots are generally like giving a car mechanic a picture of your car and asking what is wrong – let him open up the bonnet and hear the engine and you’ll get a quick answer.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog our team post about anything interesting they learn about.