One of the big issues in the PDF world is the number of ‘dodgy’ PDF files which work despite being poorly constructed and not meeting the PDF file specification. I highlighted one case last week. Acrobat has a large number of undocumented repair functions (so the PDF file creators often do not even know that their files are not valid). Ronan Hannah wrote an excellent article on PlanetPDF about Acrobat and its repair capabilities.
Badly made PDF files are an issue for everyone. They make it much harder to write tools for working with PDF files, and the code becomes slower and more cumbersome when it has to allow for lots of ‘edge cases’. There is also no guarantee that these files will keep working in the future, because they rely on undocumented support which could be removed.
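To give a feel for what allowing for edge cases means in practice, here is a minimal sketch in Python. It is not a real validator and it is not how Acrobat or any particular library works internally. The function names `check_strict` and `check_lenient` are made up for this illustration. The lenient version assumes the tolerance described in Adobe's implementation notes to the PDF Reference, where the header only has to appear somewhere in the first 1024 bytes and the `%%EOF` marker somewhere in the last 1024 bytes.

```python
import re

# A PDF header looks like "%PDF-1.4": a fixed prefix plus a version number.
HEADER = re.compile(rb"%PDF-\d\.\d")


def check_strict(data: bytes) -> list[str]:
    """Strict reading: header at byte 0, %%EOF as the last line of the file."""
    problems = []
    if not HEADER.match(data):
        problems.append("header is not at the very start of the file")
    if not data.rstrip(b"\r\n").endswith(b"%%EOF"):
        problems.append("file does not end with %%EOF")
    return problems


def check_lenient(data: bytes) -> list[str]:
    """Tolerant reading (assumption: 1024-byte windows at both ends)."""
    problems = []
    if not HEADER.search(data[:1024]):
        problems.append("no header found in the first 1024 bytes")
    if b"%%EOF" not in data[-1024:]:
        problems.append("no %%EOF found in the last 1024 bytes")
    return problems


# A tiny stand-in for a well-formed file (only the parts these checks look at).
good = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"

# The same bytes wrapped in junk, as a sloppy generator or mail gateway might do.
dodgy = b"junk bytes\n" + good + b"trailing garbage"

print(check_strict(good))    # no problems reported
print(check_strict(dodgy))   # both strict checks fail
print(check_lenient(dodgy))  # the tolerant checks let it through
```

Even for these two trivial checks, the tolerant version has to scan windows of the file rather than look at fixed positions, and every later step (finding `startxref`, reading the cross-reference table) then has to cope with offsets that may be shifted by the junk. That is where the extra code and the slowdown come from.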
So what is the solution? There is no free PDF validation tool available, and there have been suggestions on Twitter to write one. This would definitely help. What I would like to see most, however, is the option to disable Acrobat’s repair capabilities, or at least to get it to pop up a window saying ‘This PDF fails the PDF specification. It needs repairing to be displayed.’ Once this starts popping up, PDF creators would feel under far more pressure to stick closely to the PDF file specification. It would be a start, and we could begin educating users not only about invalid PDFs but also about how to use the format to best effect. Lots of PDF files are valid but not very useful (i.e. just an image screenshot of a page inside a PDF wrapper). What do you think?
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips in our series index.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team post about anything interesting they learn.