Why can’t I just open and edit a PDF file?

People sometimes try to edit a PDF file by opening it in a text editor. This very rarely works, for three reasons.

Firstly, a PDF file is effectively a dump of PDF objects. The file contains a cross-reference table giving the exact byte offset of each object from the start of the file, and it also records the byte offset of the reference table itself. If you add or delete a single character, or even resave the file from an editor which converts line endings from one platform format to another, every offset after the change will be incorrect. You would need to update them all. To prove it, just try opening a PDF, type in a space, save it and then see what happens if you try to open it…
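To make the offset problem concrete, here is a minimal Python sketch. It assembles a tiny PDF-like file by hand, then checks whether the offsets stored in the file still point at the right bytes. The helper names build_minimal_pdf and offsets_are_valid are made up for this illustration, the two objects are placeholders, and the checker assumes objects are numbered sequentially from 1, which real files do not guarantee.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF-like file and compute its cross-reference table."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    body = b"%PDF-1.4\n"
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset of this object from file start
        body += obj

    xref_offset = len(body)  # byte offset of the reference table itself
    xref = b"xref\n0 %d\n" % (len(objects) + 1)
    xref += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for off in offsets:
        xref += b"%010d 00000 n \n" % off

    trailer = b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_offset,
    )
    return body + xref + trailer


def offsets_are_valid(data: bytes) -> bool:
    """Return True if startxref and every in-use entry point at the right bytes.

    Assumption for this sketch: objects are numbered 1, 2, 3, ... in order.
    """
    start = int(re.search(rb"startxref\s+(\d+)", data).group(1))
    if not data.startswith(b"xref", start):
        return False
    entries = re.findall(rb"(\d{10}) (\d{5}) n", data[start:])
    for number, (offset, _generation) in enumerate(entries, start=1):
        if not data.startswith(b"%d 0 obj" % number, int(offset)):
            return False
    return True


original = build_minimal_pdf()
# Simulate typing one space just after the header line in a text editor.
edited = original[:9] + b" " + original[9:]

print(offsets_are_valid(original))
print(offsets_are_valid(edited))
```

The single inserted space shifts every later byte by one, so the stored numbers no longer line up with the objects or with the reference table. Converting line endings has the same effect, only with a larger shift.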

Secondly, much of the data in a PDF file is stored inside binary streams, in which the data has been encrypted or compressed. If you view a PDF in a text editor you will see some text but lots of incomprehensible ‘garbage’. This is the binary data. You cannot edit it by hand, but you can easily break it just by adding a character.
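The compression half of this can be sketched with Python's standard zlib module, which produces the same kind of data a Flate-compressed PDF stream holds. The content string below is a made-up example of page operators, not taken from a real file, and encryption is not covered by this sketch.

```python
import zlib

# Hypothetical page content: PDF operators that draw one line of text, repeated.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 20

# Roughly what ends up stored inside a Flate-compressed stream.
compressed = zlib.compress(content)


def decodes_to_original(stream: bytes) -> bool:
    """Return True only if the stream decompresses back to the original content."""
    try:
        return zlib.decompress(stream) == content
    except zlib.error:
        return False


# Simulate typing one space into the middle of the 'garbage'.
middle = len(compressed) // 2
edited = compressed[:middle] + b" " + compressed[middle:]

print(compressed[:24])  # unreadable bytes rather than the operators above
print(decodes_to_original(compressed))
print(decodes_to_original(edited))
```

The readable operators only reappear after decompression, which is why a text editor shows noise, and a single extra byte is enough to stop the stream decoding back to what was stored.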

Finally, much of the PDF data only makes sense in connection with other data in the file. Text can only be interpreted by looking at the encoding on the font object, images have their data partly in XObjects and partly in ColorSpace objects, and so forth…
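The text case can be illustrated with a deliberately simplified Python sketch. The two dictionaries stand in for the encodings of two font objects; the names F1 and F2 and their mappings are invented for this example, and real font encodings are considerably more involved.

```python
# Hypothetical, simplified stand-ins for the code-to-character maps of two fonts.
font_encodings = {
    "F1": {1: "H", 2: "i"},
    "F2": {1: "O", 2: "K"},
}


def decode_text(codes: bytes, font_name: str) -> str:
    """Turn raw character codes into text using the named font's encoding."""
    mapping = font_encodings[font_name]
    # Codes the font does not define become the Unicode replacement character.
    return "".join(mapping.get(code, "\ufffd") for code in codes)


raw = bytes([1, 2])  # the same two bytes as they might appear in a content stream

print(decode_text(raw, "F1"))  # Hi
print(decode_text(raw, "F2"))  # OK
```

The bytes in the stream are identical in both calls. Only the font object decides what they mean, so editing the text without consulting the font is guesswork.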

Some file formats, such as HTML, JavaScript and most source code, can be easily manipulated in a text editor. The PDF file format is not one of these and is best accessed using a library which takes away all this complexity. Fortunately, there are lots of both free and commercial tools available for all the most popular languages. If you have a favourite, why not post a recommendation here?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, our series index collects 13 years’ worth of PDF knowledge and tips.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.