In a previous article, I wrote about the problems with editing the text in PDF files. PDF files are very different from other file formats such as Word or OpenOffice, which store the data as a set of objects that are then rendered as needed. The PDF file format was really designed for displaying the finished document.
A PDF file is more like a vector image file. It contains a set of pages, each described by the commands that draw it so it looks perfect; underneath there is very little structure, so editing can be a nightmare. Essentially, what you now have in a PDF is the drawing commands (in a PostScript-like operator language) that show the content, not the content itself.
Where it becomes difficult is when you want to change the actual content on the page. Because the structure of words, paragraphs and text flow no longer exists, it is very difficult to alter the text, especially if you need to reflow it. You have to hack the PostScript command stream and guess what is going on. PDF files which look identical can be structured very differently internally.
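To make this concrete, here is a minimal Python sketch. The two content streams are hand-written, hypothetical examples (real ones are usually compressed and far messier). Both would draw the same words at the same position, because a `TJ` array with no spacing numbers renders like a single `Tj` string. Even so, a simple byte search only finds the phrase in one of them. The `naive_text_runs` helper is also hypothetical and deliberately simplistic: it ignores escapes, hex strings, font encodings and positioning.

```python
import re

# Hypothetical, hand-written content streams. Both draw "Hello World"
# at the same spot, but the text is stored very differently.
STREAM_A = b"BT /F1 12 Tf 72 712 Td (Hello World) Tj ET"
STREAM_B = b"BT /F1 12 Tf 72 712 Td [(Hel) (lo W) (orld)] TJ ET"


def naive_text_runs(stream: bytes) -> list[str]:
    """Join the literal strings inside each TJ array or Tj operand.

    Simplistic on purpose: no escape handling, hex strings, font
    encodings or glyph positioning, which real PDF text handling needs.
    """
    runs = []
    # Match either a [ ... ] TJ array or a single ( ... ) Tj string.
    for array, single in re.findall(rb"\[(.*?)\]\s*TJ|\((.*?)\)\s*Tj", stream):
        if array:
            # Concatenate every (string) fragment inside the TJ array.
            pieces = re.findall(rb"\((.*?)\)", array)
            runs.append(b"".join(pieces).decode("latin-1"))
        else:
            runs.append(single.decode("latin-1"))
    return runs


if __name__ == "__main__":
    # A search-and-replace edit would find the phrase in A but not in B.
    print(b"Hello World" in STREAM_A)  # True
    print(b"Hello World" in STREAM_B)  # False
    print(naive_text_runs(STREAM_A))   # ['Hello World']
    print(naive_text_runs(STREAM_B))   # ['Hello World']
```

Even once the fragments are found, changing them does not reflow anything: each line is placed at fixed coordinates by operators such as `Td`, so longer replacement text simply runs past its original space.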
The PDF file format is great for displaying content, securing it, allowing users to add comments and providing interaction via forms. It is less suited to being an intermediate editable format, which is why there are lots of creation, display and splitting tools but only a few basic editing tools.
By Mark Stephens