Mark Stephens
Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Viewing PDF objects

Most of the time, you can just use a PDF file without thinking about what lies ‘under the bonnet’. But sometimes you want to find out about the actual objects inside a PDF file.

I need to do this quite often to debug JPedal (Java PDF viewer and PDF to Image converter) and BuildVu (PDF to HTML5/SVG converter). It is very useful if you want to know which colours are used, how the PDF might print, or whether it contains any useful text. It also lets you see how big the images inside the PDF really are and whether they can be easily extracted.

I used to open the PDF in a text editor, but this is not an ideal solution. Not only is the raw file quite hard to decipher, but if the PDF is encrypted or contains compressed data and objects, you cannot read them at all.
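To show why compressed content defeats a text editor, here is a minimal Java sketch. The class and method names (FlateStreamPeek, firstStreamBody, inflate, deflate) are hypothetical, and this is not how JPedal or Acrobat does it. It pulls the raw bytes out of a /FlateDecode stream and inflates them with java.util.zip.Inflater, because FlateDecode data is zlib-compressed. The sketch assumes an unencrypted file and a stream with no /DecodeParms predictor. It also finds the end of the data by searching for "endstream" rather than reading /Length, which is a simplification.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Sketch (hypothetical names): extract and inflate a /FlateDecode stream.
 * Assumes no encryption and no /DecodeParms predictor.
 */
public class FlateStreamPeek {

    /** Returns the raw bytes between the first "stream" keyword (plus its EOL) and "endstream". */
    public static byte[] firstStreamBody(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        int start = s.indexOf("stream");
        int end = s.indexOf("endstream", start);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("no stream found");
        }
        start += "stream".length();
        // The "stream" keyword is followed by CRLF or LF before the data begins
        if (s.startsWith("\r\n", start)) {
            start += 2;
        } else if (s.startsWith("\n", start)) {
            start += 1;
        }
        // Simplification: a real parser would use the /Length entry instead of searching
        return Arrays.copyOfRange(pdf, start, end);
    }

    /** Decompresses zlib (FlateDecode) data; bytes after the zlib end marker are ignored. */
    public static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Data ended before the zlib end marker, or a preset dictionary is required
                    throw new DataFormatException("truncated or unsupported Flate data");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Compresses bytes with zlib; used here only to build a demo stream. */
    public static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny compressed content-stream object in memory
        byte[] content = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        byte[] packed = deflate(content);
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        pdf.write(("4 0 obj\n<< /Length " + packed.length + " /Filter /FlateDecode >>\nstream\n")
                .getBytes(StandardCharsets.ISO_8859_1));
        pdf.write(packed); // this is the part a text editor shows as gibberish
        pdf.write("\nendstream\nendobj\n".getBytes(StandardCharsets.ISO_8859_1));

        // Extract and inflate it again to recover the readable operators
        byte[] decoded = inflate(firstStreamBody(pdf.toByteArray()));
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

Running main() prints the decompressed page operators that were unreadable in the raw bytes. An encrypted file would need decrypting before this step, which is exactly where the text-editor approach gives up.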

So I was really pleased to find a little feature hidden inside the Advanced menu option of Acrobat 9.0 to ‘Browse Internal PDF Structure’.

This allows you to see the actual PDF objects much more clearly. You will still need your trusty and well-thumbed copy of the Adobe Acrobat PDF specification, but it has saved me a lot of time whenever I need to see what is happening inside a PDF file.
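For comparison, here is roughly what the text-editor route amounts to. This rough Java sketch uses a hypothetical class, ObjectLister. It scans the raw bytes for "N G obj" headers and reports the first /Type name it finds in each object. That could belong to a nested dictionary, so treat it as a hint only. Objects packed inside compressed object streams (/Type /ObjStm, introduced in PDF 1.5) are invisible to a scan like this, and a match inside binary stream data could produce a false hit. Those gaps are why a proper structure browser is far more useful.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Rough sketch (hypothetical name): list "N G obj" headers in an uncompressed PDF.
 * Objects inside compressed object streams (/Type /ObjStm) are not found.
 */
public class ObjectLister {

    // Object number, generation number, then the "obj" keyword
    private static final Pattern HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");
    // First /Type name in the object body (may belong to a nested dictionary)
    private static final Pattern TYPE = Pattern.compile("/Type\\s*/(\\w+)");

    public static List<String> listObjects(byte[] pdf) {
        // ISO-8859-1 keeps a one-to-one mapping between bytes and chars
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> result = new ArrayList<>();
        Matcher m = HEADER.matcher(s);
        while (m.find()) {
            int bodyStart = m.end();
            int bodyEnd = s.indexOf("endobj", bodyStart);
            if (bodyEnd < 0) {
                bodyEnd = s.length();
            }
            Matcher t = TYPE.matcher(s.substring(bodyStart, bodyEnd));
            String type = t.find() ? t.group(1) : "-";
            result.add(m.group(1) + " " + m.group(2) + " obj  /Type " + type);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ObjectLister file.pdf
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        for (String line : listObjects(pdf)) {
            System.out.println(line);
        }
    }
}
```

On a modern, compressed PDF this list will often be short or misleading, which illustrates the point about encrypted and compressed objects.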

This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 21 years’ worth of PDF knowledge and tips, so visit our series index!

IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.

8 Replies to “Viewing PDF objects”

  1. Thanks, this was very helpful! In Acrobat XI, this is hidden in the Edit … Preflight … Options … menu.

  2. In Adobe Acrobat DC (2015 Release (classic) | Version 2015.006.30279, running under Windows 7 Enterprise) this useful feature is hidden under Tools > PDF Standards > Preflight.
    What is less-than-obvious (at least to me) is where the Mac context menu in the first illustration in this article came from. It pops up when you click the Options pull-down. The second illustration comes from clicking Browse Internal PDF Structure and drilling down several hierarchy levels by opening flipper triangles.
    My experience is that the window shown in the second illustration defaults to opening way too narrow, causing many columns to be cut off. Also, expanding image streams results in very slow scrolling – that’s a lot of hex to look at. (A page image picked out at random from a 450 pp. scanned engineering document is 134k lines of 16 bytes each, and there’s no apparent copy facility. Never mind that the output format – two columns of address offset in both decimal and hex, the stream in hex [16 columns of 8-bit bytes separated by spaces], and then the stream in ASCII – sure wasn’t intended to be copied or machine-parsed.)

  3. On my version I use

    Advanced – Preflight – Options – Browse Internal PDF structure. There is also an option to browse the font data there.

    Like you I find the format is not great.

    The data is also not always strictly true. For example, you may find an AP object even if the PDF does not really contain one, because the browser generates the dynamic content which Adobe would use.

IDRsolutions Ltd 2019. All rights reserved.