As I have said many times before, one of the ‘issues’ with the PDF spec is that some files can have a huge number of errors and still open. Adobe Acrobat has a large number of built-in fixing tools, so often the best solution is to open and recieve the PDF file. I have been looking at a good example today….
In theory the first line of a PDF file should be the %PDF identifier. Here is what the PDF Spec says
However, this is what I found in this PDF file.
Some random data has been appended to the file. This is a problem because the PDF file contains a large number of tables which use offsets from the start of the file (assuming that to be %PDF). How to handle these sorts of cases is not formally defined and different tools will handle it in different ways – we do not currently allow for it for example. It really depends on what sort of ‘rubbish’ files the developers of a library have met.
Generally the best solution with these files is to open and resave in Adobe Acrobat. This has some very powerful tools to fix and repair PDF files. Interestingly, the PDF I have been looking at drops from a size of 318K to 278k and now works in all PDF tools.
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years worth of PDF knowledge and tips, so click here to visit our series index!
Are you a Developer working with PDF files?
Our developers guide contains a large number of technical posts to help you understand the PDF file Format.
Find out more about our software for Developers
|Convert PDF to HTML5 or SVG|
|Convert AcroForms and XFA to HTML5|
|Java PDF SDK for working with PDF files|