Find out what’s really inside your PDF files

If you’ve ever opened a PDF in a text editor, you will know how difficult it can be to discern what is going on inside, especially if the file contains a lot of objects and streams.

There are a few tools out there for the budding PDF developer who wants to look at the internal structure of a PDF. One of the major ones is, of course, Adobe Acrobat, but there are also some other great free and commercial tools for the job.

Previously (in 2010!) Mark mentioned PDF CanOpener as a useful tool, although its price tag ($195) may be considered hefty by those who are just curious about PDFs.

The tool I have recently found, and find rather useful, is called PDFXplorer by O2 Solutions. It is a small, Windows-only freeware application that lets you explore the internal structure of a PDF laid out as a tree, turning this:

What you see if you open a PDF up in a text editor

To this:

PDFXplorer’s display

It lists each object’s attributes in a neatly laid out table, has a good navigation tab that lets you move about the PDF easily, and also allows you to view and save streams and text data within the PDF file.
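If you are not on Windows, a rough idea of what such a tool does can be sketched in a few lines of Python. This is only a minimal sketch and not how PDFXplorer works internally: the function names `list_objects` and `extract_stream` are made up for illustration, and the naive regular-expression scan assumes a simple, unencrypted PDF that does not use object streams. It may also be fooled by binary data that happens to look like an object marker.

```python
import re
import zlib

# Naive scan for "N G obj ... endobj". It ignores the cross-reference table
# and object streams, so it only suits simple, unencrypted files.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def list_objects(data):
    """Return (object number, generation, raw body) for each indirect object."""
    return [(int(m.group(1)), int(m.group(2)), m.group(3).strip())
            for m in OBJ_RE.finditer(data)]


def extract_stream(body):
    """Return the stream bytes of an object body, or None if it has no stream.

    Data is inflated when the dictionary mentions /FlateDecode; any other
    filter is returned undecoded.
    """
    match = STREAM_RE.search(body)
    if match is None:
        return None
    raw = match.group(1)
    dictionary = body[:match.start()]
    if b"/FlateDecode" in dictionary:
        # decompressobj tolerates a stray end-of-line byte or missing trailer.
        return zlib.decompressobj().decompress(raw)
    return raw
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `list_objects` gives a flat list you can print or filter. A real explorer would follow the cross-reference table instead of scanning the whole file.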

This comes in very handy when, for example, you want to know what embedded JavaScript is present within the PDF and which object it is associated with, or when you want to extract certain kinds of images from the PDF.
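The JavaScript case can be approximated by hand as well. In a PDF, a JavaScript action is a dictionary with `/S /JavaScript` and a `/JS` entry holding the script. The sketch below reports which indirect objects mention either key. The name `find_javascript_objects` is hypothetical, and the same caveats apply: it is a plain byte scan that will miss scripts hidden inside compressed object streams or written with hex-escaped names.

```python
import re

# Naive "N G obj ... endobj" scan; suitable only for simple, unencrypted PDFs.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# /JS holds the script (or a reference to it); /S /JavaScript marks the action type.
JS_KEY_RE = re.compile(rb"/JS\b|/S\s*/JavaScript\b")


def find_javascript_objects(data):
    """Return the numbers of indirect objects whose body mentions a JavaScript action."""
    return [int(m.group(1)) for m in OBJ_RE.finditer(data)
            if JS_KEY_RE.search(m.group(3))]
```

The returned object numbers are the ones to look up in a viewer such as PDFXplorer, or to search for in a text editor as `N 0 obj`.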

Overall, you should consider making use of a tool like this if you develop with the PDF format or want to view the internals of PDF files. I am still on the lookout for even more useful PDF tools, especially ones that run on other platforms. Do you know of any?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips, so visit our series index!

About Lyndon Armitage

Lyndon is a Developer at IDR Solutions. He currently focuses mostly on the JavaScript in the Viewer and PDF to HTML5 Converter and also the Android PDF Viewer. He gave a short talk at the GlassFish UnConference before JavaOne 2012. Outside of IDR Solutions he has a keen interest in AI and Games Programming and runs a blog that he periodically updates.
