Java PDF Blog

Improvements to Our PDF Inspector

In the last two releases of JPedal, we added some exciting new features to our PDF Inspector tool that can help you debug those problematic PDF files!

Cross-Reference Stream Viewer

Cross-reference streams (also known as XRef streams) were introduced in PDF 1.5 as a more compact way to store object offsets in the file. While this can reduce file size considerably, cross-reference streams are difficult for us humans to debug, because the entries are stored as binary data that is usually compressed rather than as readable text.

If you are familiar with cross-reference streams, you might know that there are different types of entries in the stream. While a traditional cross-reference table has only ‘free’ and ‘in use’ object entries (types 0 and 1 in a cross-reference stream), a cross-reference stream adds a third type, type 2, for objects stored inside compressed object streams, namely ‘objstm’.

Furthermore, the size of each entry in a cross-reference stream can differ from one PDF to the next, because the /W array in the stream dictionary sets the width in bytes of each of the three fields in an entry. It would therefore be nice if there were a tool that could display the different types of entries in a cross-reference stream according to the widths defined in the file. Thankfully, the JPedal Inspector does just that!

In this example file, we can see that the cross-reference stream has a /W array of [1 2 1], and the widths of the columns on the left are sized accordingly. This makes it easy for us to read. For example, the first object in the stream is type 1, which means it is “in use but is not compressed”. It resides at offset 0010 in the file and has a generation of zero.
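To make the role of the /W array concrete, here is a minimal sketch of how already-decompressed cross-reference stream data could be split into entries. It is not JPedal's implementation: the class and method names are hypothetical, the second sample entry is invented for illustration, and we assume the 0010 shown by the Inspector is hexadecimal (byte offset 16). Handling of /Index, predictors, and the decompression itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JPedal): splits decoded xref stream bytes into entries. */
final class XRefStreamSketch {

    /** Reads a big-endian unsigned integer of the given byte width starting at pos. */
    static long readField(byte[] data, int pos, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }

    /** Returns one long[]{type, field2, field3} per entry, using the widths from /W. */
    static List<long[]> decode(byte[] data, int[] w) {
        int entrySize = w[0] + w[1] + w[2];
        if (entrySize <= 0) {
            throw new IllegalArgumentException("/W must describe at least one byte per entry");
        }
        List<long[]> entries = new ArrayList<>();
        for (int pos = 0; pos + entrySize <= data.length; pos += entrySize) {
            // A zero-width type field means the type defaults to 1 (in use).
            long type = (w[0] == 0) ? 1 : readField(data, pos, w[0]);
            long field2 = readField(data, pos + w[0], w[1]);
            long field3 = readField(data, pos + w[0] + w[1], w[2]);
            entries.add(new long[] {type, field2, field3});
        }
        return entries;
    }

    /** Turns one decoded entry into a readable description based on its type. */
    static String describe(long[] e) {
        switch ((int) e[0]) {
            case 0: return "free, next free object " + e[1] + ", generation " + e[2];
            case 1: return "in use, offset " + e[1] + ", generation " + e[2];
            case 2: return "compressed, object stream " + e[1] + ", index " + e[2];
            default: return "unknown type " + e[0];
        }
    }

    public static void main(String[] args) {
        int[] w = {1, 2, 1}; // the /W array from the example file
        byte[] data = {
            0x01, 0x00, 0x10, 0x00, // the entry described above: type 1, offset 0x0010, generation 0
            0x02, 0x00, 0x05, 0x03  // invented type 2 entry: object stream 5, index 3
        };
        for (long[] entry : decode(data, w)) {
            System.out.println(describe(entry));
        }
        // prints "in use, offset 16, generation 0"
        // then   "compressed, object stream 5, index 3"
    }
}
```

Because the field widths come from the file, the same loop works whether a PDF uses [1 2 1], [1 3 2], or any other combination.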

Clicking on an object in the list allows you to see it in COS syntax. If it is a type 2 compressed entry, then JPedal automatically decompresses it and displays it.

Search for Commands

Everything you see on the page in a PDF comes from a content stream (well not everything, but most!), which is just a sequence of drawing commands. This includes images, shapes, and text.

When debugging a PDF file you might want to search for text on the page to see how it is drawn by the content stream. This is now possible in the latest version of the JPedal Inspector! Just click the search icon in the command window and type in what you are looking for.

If you are familiar with PDF text drawing commands, you might know about the TJ operator, which allows for individual glyph positioning, for example:

[(H) 50 (e) -20 (l) 0 (l) 0 (o) 150 ( ) 0 (W) -30 (o) -10 (r) 20 (l) 0 (d)] TJ

If you try to find the word “Hello” with a basic search tool, it will not work, because the letters are split across separate string operands with positioning numbers in between. The JPedal Inspector can find it because it parses the text commands correctly and stores “Hello World” internally.
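The idea can be sketched in a few lines of Java. This is only an illustration, not how JPedal does it: the class and method names are hypothetical, only literal strings in parentheses are handled (no hex strings, and escapes such as \n are not interpreted), and the bytes are treated as plain characters, whereas a real PDF maps them through the font's encoding.

```java
/** Hypothetical helper (not JPedal's implementation): joins the string operands of a TJ array. */
final class TjTextSketch {

    static String extractText(String tjArray) {
        StringBuilder text = new StringBuilder();
        int depth = 0; // nesting depth of literal-string parentheses
        for (int i = 0; i < tjArray.length(); i++) {
            char c = tjArray.charAt(i);
            if (depth == 0) {
                // Outside a string: numbers are positioning adjustments, so skip them.
                if (c == '(') {
                    depth = 1;
                }
            } else if (c == '\\' && i + 1 < tjArray.length()) {
                // Keep the escaped character as-is (enough for \( \) and \\ only).
                text.append(tjArray.charAt(++i));
            } else if (c == '(') {
                depth++;
                text.append(c);
            } else if (c == ')') {
                depth--;
                if (depth > 0) {
                    text.append(c);
                }
            } else {
                text.append(c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String command =
            "[(H) 50 (e) -20 (l) 0 (l) 0 (o) 150 ( ) 0 (W) -30 (o) -10 (r) 20 (l) 0 (d)] TJ";
        String text = extractText(command);
        System.out.println(text);                      // prints "Hello World"
        System.out.println(command.contains("Hello")); // prints "false"
        System.out.println(text.contains("Hello"));    // prints "true"
    }
}
```

Searching the raw command fails while searching the reassembled text succeeds, which is the difference between a basic search tool and one that understands text operators.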

Learn more about our PDF Inspector and what other debugging features it has to offer.

As developers who have been working with the PDF format for more than two decades, we can help you understand it better!