This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 17 years' worth of PDF knowledge and tips, so click here to visit our series index!
At IDR Solutions I spend a lot of time working on the Java PDF Library, and much of that work involves data structures, in this case stacks. A stack is an important data structure that is widely used in software development.
In this article, I will be talking about what stacks are, how they work and how PDF files use them. I will also talk about the advantages and disadvantages of stacks (both as part of the PDF specification and in implementing this in a Viewer).
What are stacks?
A stack is a data structure that stores data in LIFO (Last In, First Out) order. When you retrieve an item, you get the last item you pushed onto the stack. When an item is popped it is removed from the stack, exposing the item underneath as the next one available. It is the opposite of a queue, where new data has to ‘join the back’ and the oldest item is retrieved first.
Stacks have been used since the earliest days of programming (Assembler, scientific calculators and Forth!) as a way to store and manipulate data.
How do stacks work inside PDF files?
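The difference between LIFO and FIFO ordering described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, and it uses `ArrayDeque` from the standard library for both the stack and the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

public class LifoVsFifo {

    // Pushes "a", "b", "c" onto a stack and pops everything:
    // the last item pushed is the first one returned.
    static String drainStack() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }

    // Adds the same items to a queue and removes everything:
    // the first item added is the first one returned.
    static String drainQueue() {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.remove());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("stack order: " + drainStack()); // stack order: cba
        System.out.println("queue order: " + drainQueue()); // queue order: abc
    }
}
```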
The current graphics state within a PDF file is saved using a stack mechanism. At any point, the PDF command stream can save the current settings to the stack using the q operator. They can then be retrieved with the Q operator, which restores the graphics state to its previously saved value.
Elements of the graphics state include colorspace, text state, clipping path, strokes, blends, line properties, and more. These elements control the visual side of the PDF file. All of them are saved to the stack together, so the visual settings can be modified and later restored.
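To make the save and restore behaviour concrete, here is a minimal sketch of how q and Q might be processed. It is not how any particular PDF library works: it assumes a graphics state that holds nothing but the line width (set by the w operator), and it assumes the command stream is a simple whitespace-separated string. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GraphicsStateStackSketch {

    // Runs a tiny command stream and returns the line width in effect at the end.
    // Simplifying assumption: the whole graphics state is a single line width value.
    static double run(String commandStream) {
        Deque<Double> saved = new ArrayDeque<>();
        double lineWidth = 1.0; // default line width
        double operand = 0.0;   // most recent numeric operand seen

        for (String token : commandStream.trim().split("\\s+")) {
            switch (token) {
                case "q": // save the current state
                    saved.push(lineWidth);
                    break;
                case "Q": // restore the most recently saved state
                    // Throws NoSuchElementException if nothing was saved.
                    lineWidth = saved.pop();
                    break;
                case "w": // set the line width from the preceding operand
                    lineWidth = operand;
                    break;
                default:  // anything else is treated as a number
                    operand = Double.parseDouble(token);
            }
        }
        return lineWidth;
    }

    public static void main(String[] args) {
        // Width 2 is saved by q, changed to 5, then put back by Q.
        System.out.println(run("2 w q 5 w Q")); // 2.0
        // Without the Q, the change to 5 stays in effect.
        System.out.println(run("2 w q 5 w"));   // 5.0
    }
}
```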
In Java, you need to be careful to push a new copy of the data (a deep copy), not just a new reference to a shared object (a shallow copy). With a shallow copy the graphics state would not be restored, because there would only ever be one graphics state object and all modifications would be made to it.
Text state is one of the most important elements of the graphics state. It contains many elements within itself, such as font style, size, color, spacing, and more. These are saved and restored on the stack along with the rest of the graphics state.
This makes it very easy to write programs in PostScript, as the program does not need to track changes, only to save and then restore. It is common to see paired q and Q operators around sub-routines.
What are the advantages of using a stack?
Using a stack makes it easy to write clean, simple code with fewer bugs. The developer does not get ‘leaks’ where some values are not correctly reset. You get shorter, more readable, less buggy code (which is always a good thing!).
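The following sketch shows the difference in practice. The `GraphicsState` class here is hypothetical and has only one field; a real graphics state holds nested objects (such as a clipping path), and those would need to be copied as well for the copy to be truly deep.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DeepCopySketch {

    // Hypothetical mutable graphics state with a single field.
    static class GraphicsState {
        double lineWidth = 1.0;

        // Returns an independent copy of this state.
        GraphicsState copy() {
            GraphicsState c = new GraphicsState();
            c.lineWidth = this.lineWidth;
            return c;
        }
    }

    // Saves the state, changes the line width, restores the state,
    // and returns the line width seen after the restore.
    static double saveModifyRestore(boolean deepCopy) {
        Deque<GraphicsState> stack = new ArrayDeque<>();
        GraphicsState current = new GraphicsState();

        // "q": push either an independent copy or the same object.
        stack.push(deepCopy ? current.copy() : current);

        current.lineWidth = 5.0; // modify the current state

        // "Q": take back whatever was saved.
        current = stack.pop();
        return current.lineWidth;
    }

    public static void main(String[] args) {
        // Shallow: the saved entry is the same object, so the change survives.
        System.out.println("shallow: " + saveModifyRestore(false)); // shallow: 5.0
        // Deep: the saved entry is independent, so the original value returns.
        System.out.println("deep:    " + saveModifyRestore(true));  // deep:    1.0
    }
}
```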
If you want to read up on GraphicsState values, we recommend you dive into the PDF Reference guide.
What are the disadvantages of using a stack?
Unfortunately, it also makes it harder for the developers who are implementing a PDF library. All the values need to be correctly serialized and de-serialized (which can include a clipping path containing huge numbers of points). The code below demonstrates how the implementation is difficult for developers.
We have also found that you need to spend some time replicating the exact way the PDF stack works. For example, if a sub-routine leaves a value on the stack, is that value globally visible outside the routine? And how should you handle an unbalanced stack?
We will leave you to figure out those answers…
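import java.util.Stack;

public class StackExample {
    public static void main(String[] args) {
        Stack<String> stack = new Stack<>();
        String za = "Zain";
        stack.push(za); // the stack now holds a reference to "Zain"
        za = " IDR";    // reassigning the variable does not change the stack
        System.out.println("name = " + stack.pop());
    }
}
Output: name = Zain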
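On the unbalanced stack question raised earlier, a first step is simply detecting the imbalance. The sketch below is one possible check, not a recommendation for how a viewer should recover. It assumes the operators have already been extracted into a list, and the class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

public class QBalanceCheck {

    // Returns the number of q operators that were never matched by a Q,
    // or -1 as soon as a Q appears with nothing saved to restore.
    static int unbalancedDepth(List<String> operators) {
        int depth = 0;
        for (String op : operators) {
            if (op.equals("q")) {
                depth++;
            } else if (op.equals("Q")) {
                if (depth == 0) {
                    return -1; // restore without a matching save
                }
                depth--;
            }
            // All other operators are ignored by this check.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(unbalancedDepth(Arrays.asList("q", "w", "Q"))); // 0
        System.out.println(unbalancedDepth(Arrays.asList("q", "q", "Q"))); // 1
        System.out.println(unbalancedDepth(Arrays.asList("q", "Q", "Q"))); // -1
    }
}
```

What to do once an imbalance is found (ignore the extra Q, discard leftover saved states at the end of the page, or report an error) remains a design decision for the implementer.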