Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Learning about the PDF File Format


The PDF file format is very useful and well-documented, but it is also quite complicated and does not work the way most people imagine. It is structured very differently from a Word or Excel document.

Most of the time, this is not an issue. You can use PDF files without knowing anything about them and simply enjoy the benefits. There comes a time, though, when you may need to start to dabble, so this article is designed to give you some starting points.

What is a PDF file?

It is worth getting to grips first with the basic idea that a PDF file is essentially a set of linked objects. Each page has a page object, which may refer to font objects defining the fonts, XObjects storing image data and so on. Then you can look at all the different types of objects. The PDF file contains all these objects and their locations (the references) so that they can be read as needed. It only makes sense when it is decoded by a parser and all the elements are assembled together for the final output.
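
To make the idea of linked objects a little more concrete, here is a minimal Python sketch. The SAMPLE fragment is hand-written for illustration and is not a complete PDF file (it has no header, cross-reference table or trailer), and the helper name list_objects is made up for this example. It simply uses regular expressions to find each "N G obj ... endobj" block and the "N G R" references inside it, which is nowhere near what a real parser has to do.

```python
import re

# Hand-written fragment in PDF object syntax (illustrative only, not a full PDF file).
SAMPLE = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 4 0 R >> >> >>
endobj
4 0 obj
<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>
endobj
"""

# "N G obj ... endobj" marks one object; "N G R" is a reference to another object.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def list_objects(data: bytes) -> dict[int, list[int]]:
    """Map each object number to the object numbers it refers to."""
    links = {}
    for match in OBJ_RE.finditer(data):
        number = int(match.group(1))
        body = match.group(3)
        links[number] = [int(ref.group(1)) for ref in REF_RE.finditer(body)]
    return links


links = list_objects(SAMPLE)
for number, refs in links.items():
    print(f"object {number} refers to {refs}")
```

In this fragment the catalog (object 1) points to the page tree (object 2), which points to the page (object 3), which in turn points back to its parent and on to its font (object 4). A real file also stores objects inside compressed streams, which a simple text scan like this will not see.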

Who is in charge of the PDF file format?

The PDF Association is the overall governing body (and well worth joining if you work with PDF files). Adobe is an important member, but the organisation contains lots of other PDF vendors (both large and small). It also organises conferences and provides lots of resources online.

Is the PDF file format Open?

The PDF file format was originally produced by Adobe, but it is now an open specification (ISO 32000) and anyone can join the committees defining new features and versions.

How do I learn more about the format?

The definitive guide to the PDF format is the PDF reference manual. This is a very complete and comprehensive (and equally dull) volume which explains most of the internal workings of the PDF file format. It is not designed to tell you how to create or modify a PDF file, just to provide all the details. You will not find it an easy read, but the first two chapters do provide an excellent introduction to the PDF file format.

A slightly less technical introduction to the internals of a PDF file can be found on Wikipedia. This also gives you a detailed insight into the structure of the file.

First steps?

Once you have started to explore the internals of the PDF file format, you can open up a few PDF files. It is not recommended that you edit a file directly (even adding a space can break it), but you can open it in a text editor and view it. Much of the data is encrypted or compressed, so a more useful tool is RUPS. I explained how you can use this to examine the internals of a PDF file in another article.
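
As a rough illustration of why a text editor only gets you so far, the Python sketch below builds a single object with a compressed stream in memory and then decodes it again. The object number, the sample content and the helper name decode_flate_stream are all made up for this example. It assumes the stream uses the FlateDecode filter (zlib/deflate compression) with a direct /Length value and no encryption, which is only one of the cases you will meet in real files.

```python
import re
import zlib

# Illustrative page content: a few text-drawing operators.
content = (
    b"BT /F1 24 Tf 72 720 Td (Hello PDF) Tj ET\n"
    b"BT /F1 24 Tf 72 690 Td (Hello again) Tj ET\n"
)

# Store it the way many PDF files do: compressed, inside a stream object.
compressed = zlib.compress(content)
pdf_object = (
    b"5 0 obj\n<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)

# Find the /Length entry and the end of the "stream" keyword line.
STREAM_RE = re.compile(rb"/Length (\d+)[^>]*>>\s*stream\r?\n")


def decode_flate_stream(data: bytes) -> bytes:
    """Return the decompressed bytes of the first Flate stream in data."""
    match = STREAM_RE.search(data)
    if match is None:
        raise ValueError("no stream found")
    length = int(match.group(1))
    start = match.end()
    # Use /Length rather than searching for "endstream", since the
    # compressed bytes are binary and could contain anything.
    return zlib.decompress(data[start:start + length])


print(decode_flate_stream(pdf_object).decode("ascii"))
```

The stream length here is recorded as a byte count, and a full PDF file also records the byte position of each object, which is one reason why a stray edit in a text editor can leave a file unreadable.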

How do I work with PDF files directly?

To do much with a PDF file, you will need a third-party library to manipulate it. We always recommend using libraries and tools (there are lots of commercial and open source ones) to work with PDF files. If you want to see how complex it is to edit PDF files manually, have a look at our series on How to make your own PDF file.

Do you have any other recommended articles?

You may also find our series on Understanding the PDF File Format useful, especially the related post 10 things new PDF Developers need to know.

So if you have reached the point where you want to start to explore the PDF file format, I hope this has provided some useful starting points and please do post your own experiences or recommendations.

IDRsolutions Ltd 2022. All rights reserved.