Problems editing PDF files

In a previous article, I wrote about the problems with editing the text in PDF files. PDF files are very different from other file formats such as Word or OpenOffice, which store the data as a set of objects that are then rendered as needed. The PDF file format was really designed for displaying the finished document.

A PDF file is more like a vector image file. It contains a set of pages, each holding the commands which draw that page so it looks perfect. Underneath there is very little structure, so editing can be a nightmare. Essentially what you have in a PDF is the PostScript-like draw commands that show the content, not the content itself.
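
To make that concrete, here is a minimal Python sketch that lists the draw commands in a tiny page content stream. The stream is hand-written for illustration and is not taken from a real file. The helper `list_operations` is a hypothetical name, and it assumes an uncompressed stream with exactly one operation per line, which real PDFs do not guarantee.

```python
# Hypothetical, hand-written content stream (illustration only).
# BT/ET begin and end a text object, Tf sets the font and size,
# Td moves the text position, Tj shows a string.
CONTENT_STREAM = """\
BT
/F1 12 Tf
72 720 Td
(Hello) Tj
40 0 Td
(World) Tj
ET
"""


def list_operations(stream):
    """Return (operator, operands) pairs.

    Assumes one operation per line, with the operator as the last
    space-separated token. Real streams are usually compressed and
    are not laid out this neatly.
    """
    operations = []
    for line in stream.splitlines():
        line = line.strip()
        if not line:
            continue
        operands, _, operator = line.rpartition(" ")
        operations.append((operator, operands))
    return operations


if __name__ == "__main__":
    for operator, operands in list_operations(CONTENT_STREAM):
        print(f"{operator:>3}  {operands}")
```

Nothing in the stream says that "Hello" and "World" belong to the same sentence. There are only two strings, each drawn at a position.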

Some manipulations are very easy with a PDF. Splitting a PDF into separate pages or drawing on top of a PDF is straightforward. PDF forms and JavaScript content are also easy to alter as they have a clear structure. It's also straightforward to change one image for another of the same size.

Where it becomes difficult is if you want to change the actual content on the page. Because the structure of words, paragraphs and text flow no longer exists, it is very difficult to alter the text, especially if you need to reflow it. You have to hack the PostScript-like command stream and guess what is going on. PDF files which look identical can be structured very differently internally.
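
As a rough illustration of that last point, the Python sketch below compares two hand-written content streams that would draw much the same words on the page. Both streams and the helper `shown_strings` are hypothetical. The regular expression is a simplification that handles only simple, unnested string literals.

```python
import re

# Stream A shows the whole phrase with a single Tj operator.
STREAM_A = "BT /F1 12 Tf 72 720 Td (Hello World) Tj ET"

# Stream B uses TJ with an array of fragments and spacing adjustments.
# The gap between the words is a number, not a space character.
STREAM_B = "BT /F1 12 Tf 72 720 Td [(Hel) -20 (lo) -250 (Wor) 15 (ld)] TJ ET"

# Matches a PDF string literal without nested parentheses
# (assumption: enough for these hand-written samples only).
STRING_LITERAL = re.compile(r"\(((?:[^()\\]|\\.)*)\)")


def shown_strings(stream):
    """Return the text fragments found in string literals, in order."""
    return STRING_LITERAL.findall(stream)


if __name__ == "__main__":
    for name, stream in (("A", STREAM_A), ("B", STREAM_B)):
        fragments = shown_strings(stream)
        print(name, fragments, "contains 'World':", "World" in stream)
```

A naive search-and-replace for "World" finds it in stream A but not in stream B, where the word is split into fragments and the space between the words is never stored as a character. An editor has to guess at the words before it can change them.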

The PDF file format is great for displaying content, securing it, allowing users to add comments and providing interaction via forms. It is less suited as an intermediate editable format, which is why there are lots of creation, display and splitting tools but only a few basic editing tools.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.