Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Should ‘broken’ PDF files fail in Acrobat?


One of the big issues in the PDF world is the number of ‘dodgy’ PDF files that work despite being poorly constructed and not conforming to the PDF file specification. I highlighted one case last week. Acrobat has a large number of undocumented repair functions, so the creators of these files often do not even know that their files are invalid. Ronan Hannah wrote an excellent article on PlanetPDF about Acrobat and its repair capabilities.

Badly made PDF files are an issue for everyone. They make it much harder to write tools for working with PDF files, and the code becomes slower and more cumbersome if it has to allow for lots of ‘edge cases’. There is also no guarantee that these files will keep working in the future, as they rely on undocumented support which could be removed.

So what is the solution? There is no free PDF validation tool available, and there have been suggestions on Twitter to write one. This would definitely help. What I would like to see most, however, is the option to disable Acrobat’s repair capabilities, or at least get it to pop up a window saying ‘This PDF fails the PDF specification. It needs repairing to be displayed.’ Once this starts popping up, PDF creators would feel under far more pressure to stick closely to the PDF file specification. It would be a start, and we could begin educating users not only about invalid PDFs but also about how to use the format to best effect. Lots of PDF files are valid but not very useful (e.g. just an image screenshot of a page inside a PDF wrapper). What do you think?
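As a rough illustration of what the very first layer of such a validation tool might look at, here is a minimal Python sketch. It is not a real validator and it is not how Acrobat decides whether to repair a file. It only checks three structural points: the `%PDF-` header at the start of the file, the `%%EOF` marker at the end, and whether the last `startxref` offset points at a cross-reference table or cross-reference stream. The names `basic_pdf_checks` and `build_sample` are made up for this example.

```python
import re


def basic_pdf_checks(data: bytes) -> list[str]:
    """Return a list of structural problems; an empty list means these checks passed.

    Hypothetical helper: it covers only three basic rules, not the whole specification.
    """
    problems = []

    # The file should start with a header of the form %PDF-M.N
    if not re.match(rb"%PDF-\d\.\d", data):
        problems.append("missing or misplaced %PDF-x.y header")

    # The last non-whitespace content should be the end-of-file marker.
    if not data.rstrip().endswith(b"%%EOF"):
        problems.append("file does not end with %%EOF")

    # The last startxref entry holds the byte offset of the cross-reference section.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    if not offsets:
        problems.append("no startxref entry")
    else:
        offset = int(offsets[-1])
        target = data[offset:offset + 32]
        # A classic table starts with 'xref'; a cross-reference stream is an
        # ordinary object, so it starts with 'N G obj'.
        is_table = target.startswith(b"xref")
        is_stream = re.match(rb"\d+\s+\d+\s+obj", target) is not None
        if not (is_table or is_stream):
            problems.append("startxref offset does not point at a cross-reference section")

    return problems


def build_sample() -> bytes:
    """Hand-built bytes that satisfy the three checks (not a complete, viewable PDF)."""
    body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 2 /Root 1 0 R >>\n"
    return body + xref + b"startxref\n" + str(len(body)).encode() + b"\n%%EOF\n"


if __name__ == "__main__":
    sample = build_sample()
    print(basic_pdf_checks(sample))            # the intact sample
    print(basic_pdf_checks(sample[:-7]))       # sample with the %%EOF line cut off
    print(basic_pdf_checks(b"junk" + sample))  # sample with bytes prepended
```

A viewer with repair logic can typically still open files that fail checks like these, for example by scanning the file for objects and rebuilding the cross-reference data, which is exactly why their creators never find out. A strict mode would report the problems instead.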

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips in our series index.

IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.


4 Replies to “Should ‘broken’ PDF files fail in Acrobat?”

  1. Yeah, it’s a bit like HTML that way. I’ve been creating invalid PDF files and seeing how various common PDF viewers render them. The result generally differs across all of them, with Adobe X usually doing the best job (when it isn’t crashing).

    The best approach might be to introduce something like an opt-in “PDF strict”, as there are too many faulty PDF creators that will never be fixed.

  2. Hello,

    Do you have a software recommendation for repairing damaged PDF files?

    A friend just asked me to try and save his PDF file, which now fails to open in both Adobe and Power PDF Advanced.

    I tried just uncompressing the file with pdftk server, but I get a generic “cannot find file” error (though it takes a lot more time to fail than if I just give it an incorrect path).

    Error: Unable to find file.
    Error: Failed to open PDF file:
    myfile.pdf

    Opening the file, I can see some data. For instance, the first few lines are:

    %PDF-1.6
    %¦éÏÄ
    455 0 obj
    <>
    endobj

    468 0 obj
    <</Size 491/Filter/FlateDecode/Type/XRef/Index[455 36]/W[1 3 1]/Prev 4963879/Length 109/Root 456 0 R/Info 454 0 R/ID[]>>stream

    1. Hi Jean-Francois Perreault,
      Acrobat includes lots of logic to try and fix broken files.
      So we recommend opening the file in Acrobat and re-saving it to solve the issue.
      If Acrobat can’t open it, it’s probably beyond repair.
