Mark Stephens

Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Should ‘broken’ PDF files fail in Acrobat?

One of the big issues in the PDF world is the number of ‘dodgy’ PDF files which work despite being poorly constructed and not meeting the PDF file specification. I highlighted one case last week. Acrobat has a large number of undocumented repair functions (so the PDF file creators often do not even know that their files are not valid). Ronan Hannah wrote an excellent article on PlanetPDF about Acrobat and its repair capabilities.

Badly made PDF files are an issue for everyone. They make it much harder to write tools for working with PDF files (and they make the code slower and more cumbersome if you have to allow for lots of ‘edge cases’). There is also no guarantee that these PDF files will work in the future, as they rely on undocumented support which could be removed.

So what is the solution? There is no free PDF validation tool available, although there have been suggestions on Twitter to write one. This would definitely help. What I would like to see most, however, is the option to disable Acrobat’s repair capabilities, or at least to have it pop up a window saying ‘This PDF fails the PDF specification. It needs repairing to be displayed.’ Once this starts popping up, PDF creators would feel under far more pressure to stick closely to the PDF file specification. It would be a start, and we could begin educating users not only about invalid PDFs but also about how to use the format to best effect. Lots of PDF files are valid but not very useful (i.e. just an image screenshot of a page inside a PDF wrapper). What do you think?
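As a rough illustration of where a simple validator could start, here is a minimal Python sketch of three basic structural checks: the `%PDF-` header, the `%%EOF` marker, and a `startxref` offset that points at a cross-reference table or stream. The function name `check_pdf_structure` and the 1024-byte tail window are assumptions made for illustration and are not taken from any existing tool. Passing these checks is nowhere near proof that a file meets the specification; it only catches some obvious breakage.

```python
import re
from typing import List


def check_pdf_structure(data: bytes) -> List[str]:
    """Return a list of basic structural problems (empty if none were found)."""
    problems = []

    # A PDF file should begin with a header such as %PDF-1.4.
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header at start of file")

    # Assumption: only look at the last 1024 bytes for the trailer markers.
    tail = data[-1024:]
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker near end of file")

    # startxref is followed by the byte offset of the last xref section.
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    if not offsets:
        problems.append("missing startxref entry near end of file")
    else:
        offset = int(offsets[-1].decode("ascii"))
        target = data[offset:offset + 32]
        # The offset should land on an 'xref' table or an 'N G obj' xref stream.
        is_table = target.startswith(b"xref")
        is_stream = re.match(rb"\d+\s+\d+\s+obj", target) is not None
        if not (is_table or is_stream):
            problems.append("startxref offset does not point at an xref section")

    return problems
```

A tool built along these lines could print the returned list as a warning, which is the kind of feedback that would tell a PDF creator their output needs attention.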

4 Replies to “Should ‘broken’ PDF files fail in Acrobat?”

  1. Yeah, it’s a bit like HTML that way. I’ve been creating invalid PDF files and seeing how various common PDF viewers render them. The results generally differ across all of them, with Adobe X usually doing the best job (when it isn’t crashing).

    The best approach might be to introduce something like an opt-in “PDF strict” mode, as there are too many faulty PDF creators that will never be fixed.

  2. Hello,

    Do you have a software recommendation for repairing damaged PDF files?

    A friend just asked me to try and save his PDF file, which now fails to open in both Adobe and Power PDF Advanced.

    I tried just uncompressing the file with PDFtk Server, but I get a generic ‘cannot find file’ error (though it takes a lot longer to fail than if I just give it an incorrect path).

    Error: Unable to find file.
    Error: Failed to open PDF file:

    Opening the file, I can see some data. For instance, the first three lines are:

    455 0 obj

    468 0 obj
    <</Size 491/Filter/FlateDecode/Type/XRef/Index[455 36]/W[1 3 1]/Prev 4963879/Length 109/Root 456 0 R/Info 454 0 R/ID[]>>stream

    1. Hi Jean-Francois Perreault
      Acrobat includes lots of logic to try and fix broken files.
      So we recommend opening the file in Acrobat and re-saving it to solve the issue.
      If Acrobat can’t open it, it’s probably beyond repair.

