Should ‘broken’ PDF files fail in Acrobat?

One of the big issues in the PDF world is the number of ‘dodgy’ PDF files which work despite being poorly constructed and not meeting the PDF file specification. I highlighted one case last week. Acrobat has a large number of undocumented repair functions (so the PDF file creators often do not even know that their files are not valid). Ronan Hannah wrote an excellent article on PlanetPDF about Acrobat and its repair capabilities.

Badly made PDF files are an issue for everyone. They make it much harder to write tools for working with PDF files (and they make the code slower and more cumbersome, because you have to allow for lots of ‘edge cases’). There is also no guarantee that these PDF files will work in the future, as they rely on undocumented support which could be removed.

So what is the solution? There is no free PDF validation tool available, although there have been suggestions on Twitter to write one. This would definitely help. What I would like to see most, however, is the option to disable Acrobat’s repair capabilities, or at least to have it pop up a window saying ‘This PDF fails the PDF specification. It needs repairing to be displayed.’ Once this starts popping up, PDF creators would feel under far more pressure to stick closely to the PDF file specification. It would be a start, and we could begin educating users not only about invalid PDFs but also about how to use the format to best effect. Lots of PDF files are valid but not very useful (i.e. just an image screenshot of a page inside a PDF wrapper). What do you think?
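To give a flavour of what even a very basic validation tool might check, here is a minimal sketch in Python. It is nowhere near a full check against the specification: it only looks at three structural points (the `%PDF-` header at byte 0, the `%%EOF` marker at the end, and whether the last `startxref` offset points at a cross-reference table or stream). The function name `check_pdf_structure` and the wording of the problem messages are made up for this illustration.

```python
import re
from typing import List


def check_pdf_structure(data: bytes) -> List[str]:
    """Return a list of basic structural problems found in raw PDF bytes.

    Illustrative sketch only: an empty list does NOT mean the file is valid.
    """
    problems = []

    # A well-formed file starts with "%PDF-x.y" at byte 0. Lenient viewers
    # may still accept a header that appears a little later in the file.
    if not re.match(rb"%PDF-\d\.\d", data):
        if re.search(rb"%PDF-\d\.\d", data[:1024]):
            problems.append("header is not at byte 0")
        else:
            problems.append("no %PDF header found")

    # The file should end with the %%EOF marker (trailing whitespace ignored).
    tail = data[-1024:]
    if not tail.rstrip().endswith(b"%%EOF"):
        problems.append("file does not end with %%EOF")

    # The last "startxref" gives the byte offset of the cross-reference data.
    matches = list(re.finditer(rb"startxref\s+(\d+)", tail))
    if not matches:
        problems.append("no startxref keyword found near the end of the file")
    else:
        offset = int(matches[-1].group(1))
        target = data[offset:offset + 32]
        # A classic table starts with "xref"; a cross-reference stream
        # starts with an object header such as "468 0 obj".
        is_table = target.startswith(b"xref")
        is_stream = re.match(rb"\d+\s+\d+\s+obj", target) is not None
        if not (is_table or is_stream):
            problems.append(
                "startxref offset does not point at a cross-reference section"
            )

    return problems
```

You could run it with something like `check_pdf_structure(open("myfile.pdf", "rb").read())`. A file which passes all three checks can still be invalid in many other ways, so this only shows the kind of test a stricter tool would start with.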

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips in our series index.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

4 thoughts on “Should ‘broken’ PDF files fail in Acrobat?”

  1. Anders Simonsen

    Yeah, it’s a bit like HTML that way. I’ve been creating invalid PDF files and seeing how various common PDF viewers render them. The result generally differs on all of them, with Adobe X usually doing the best job (when it isn’t crashing).

    The best approach might be to introduce something like an opt-in “PDF strict”, as there are too many faulty PDF creators that will never be fixed.

  2. Often you do not even know the file is invalid because Acrobat repairs it as it loads it. It would be nice to have a stricter spec.

  3. Jean-Francois Perreault

    Hello,

    Do you have a software recommendation for repairing damaged PDF files?

    A friend just asked me to try and save his PDF file, which now fails to open in both Adobe and Power PDF Advanced.

    I tried just uncompressing the file with PDFtk Server, but I get a generic “cannot find file” error (though it takes a lot more time to fail than if I just give it an incorrect path).

    Error: Unable to find file.
    Error: Failed to open PDF file:
    myfile.pdf

    Opening the file, I can see some data, for instance the first three lines are

    %PDF-1.6
    %¦éÏÄ
    455 0 obj
    <>
    endobj

    468 0 obj
    <</Size 491/Filter/FlateDecode/Type/XRef/Index[455 36]/W[1 3 1]/Prev 4963879/Length 109/Root 456 0 R/Info 454 0 R/ID[]>>stream

    • Hi Jean-Francois Perreault
      Acrobat includes lots of logic to try and fix broken files.
      So we recommend opening the file in Acrobat and re-saving it to solve the issue.
      If Acrobat can’t open it, it’s probably beyond repair.
