I continue to be impressed at the sheer variations we come across with PDF files, even after 10 years of writing a PDF parser….
Today we had a PDF file created with a tool call PdfGenLib. Here is a sample of what it produces…
The interesting thing is that it appears to be hard-coded to generate a 6 digit number for each PDF object reference (ie 000001 0 R) rather than 1 0 R. The PDF reference does not say that you cannot do this, but it does not add anything to the PDF file except to make it larger. So no other tools do this.
And it means that some clever code I wrote last week to allow for the first object in the PDF being object 0 0 R (and not object 1 0 R) needed to be modified. Still it means I will never be short of work 😉
Do you have any similar experiences with ‘strange’ PDF files?
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years worth of PDF knowledge and tips, so click here to visit our series index!
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog our team post about anything interesting they learn about.