Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Problems editing PDF files

In a previous article, I wrote about the problems with editing the text in PDF files. PDF files are very different from other file formats such as Word or OpenOffice, which store the data as a set of objects that are then rendered as needed. The PDF file format was really designed for final display.

A PDF file is more like a vector image file. It contains a set of pages, each holding the commands that draw the page so it looks perfect. Underneath there is very little structure, so editing can be a nightmare. Essentially, what you have in a PDF is the PostScript-like drawing commands that show the content, not the content itself.
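To make that concrete, here is a rough sketch of what the drawing commands for a page can look like. The `SAMPLE` stream is hand-written for illustration (it is not taken from a real file), and the `ContentStreamOperators` class and its `operators` method are hypothetical names. The code assumes one operator per line, which real files do not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

class ContentStreamOperators {

    // Hypothetical, hand-written page content: pick a font, move, show text, move, show text.
    static final String SAMPLE = String.join("\n",
        "BT",
        "/F1 12 Tf",
        "72 720 Td",
        "(Hello) Tj",
        "40 0 Td",
        "(World) Tj",
        "ET");

    // Returns the last token of each non-empty line.
    // Assumption: the stream is laid out with exactly one operator per line.
    static List<String> operators(String stream) {
        List<String> ops = new ArrayList<>();
        for (String line : stream.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            ops.add(trimmed.substring(trimmed.lastIndexOf(' ') + 1));
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(operators(SAMPLE));
    }
}
```

Everything in the list is an instruction to position or paint something. There is no word, sentence or paragraph object to edit, only text fragments placed at coordinates.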

Some manipulations are very easy with a PDF. Splitting a PDF into separate pages or drawing on top of a page is simple. PDF forms and JavaScript content are also easy to alter as they have a clear structure. It's also straightforward to swap one image for another of the same size.

It becomes difficult when you want to change the actual content on the page. Because the structure of words, paragraphs and text flow no longer exists, it is very difficult to alter the text, especially if you need to reflow it. You have to hack the PostScript-like command stream and guess what is going on. PDF files which look identical can be structured very differently internally.
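As a small illustration of that last point, the sketch below pulls the text fragments out of two hand-written streams that a viewer might render in much the same way. Both streams, the `TextFragments` class and the `fragments` method are hypothetical. The regular expression is a deliberate simplification: it assumes literal strings with no escapes or nested parentheses, and it ignores hex strings and font encodings entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextFragments {

    // Hypothetical stream 1: the whole phrase shown as a single string.
    static final String ONE_STRING =
        "BT /F1 12 Tf 72 720 Td (Hello World) Tj ET";

    // Hypothetical stream 2: the same phrase as kerned pieces.
    // The -250 adjustment opens a gap where the space would be.
    static final String KERNED =
        "BT /F1 12 Tf 72 720 Td [(Hel) -20 (lo) -250 (Wor) 15 (ld)] TJ ET";

    // Matches simple literal strings such as (Hello).
    // Assumption: no backslash escapes and no nested parentheses.
    private static final Pattern LITERAL = Pattern.compile("\\(([^()\\\\]*)\\)");

    // Collects the literal strings in the order they appear in the stream.
    static List<String> fragments(String stream) {
        List<String> out = new ArrayList<>();
        Matcher m = LITERAL.matcher(stream);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fragments(ONE_STRING)); // one fragment containing the space
        System.out.println(fragments(KERNED));     // four fragments, no space character
    }
}
```

In the second stream the space between the words never appears as a character. An editor would have to infer it from the size of the gap, which is the kind of guesswork that makes changing or reflowing text so fragile.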

The PDF file format is great for displaying content, securing it, allowing users to add comments and providing interaction via forms. It is less suited as an intermediate editable format, which is why there are lots of creation, display and splitting tools but only a few basic editing tools.

IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.

IDRsolutions Ltd 2019. All rights reserved.