In a previous article, I wrote about the problems with editing the text in PDF files. PDF files are very different from other file formats such as Word or OpenOffice, which store the data as a set of objects that are then rendered as needed. The PDF file format was really designed for displaying the finished file.
A PDF file is more like a vector image file. It contains a set of pages, each made up of commands which draw the page so it looks perfect. Underneath there are very few structures, so editing can be a nightmare. Essentially what you have in a PDF is the PostScript-style drawing commands that show the content, not the content itself.
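To make that concrete, here is a minimal sketch in Python. The content stream is a hand-written, simplified example: `BT`, `ET`, `Tf`, `Td` and `Tj` are real PDF text operators, but the sample text, font name and coordinates are made up, and `list_operators` is a hypothetical helper rather than part of any PDF library.

```python
# Hand-written, simplified page content stream (illustrative only).
# BT/ET begin and end a text object, Tf sets the font and size,
# Td moves the text position, Tj shows a string.
CONTENT_STREAM = """
BT
/F1 12 Tf
72 720 Td
(Hello) Tj
40 0 Td
(World) Tj
ET
"""


def list_operators(stream):
    """Return the operator names in a simplified content stream, in order.

    Assumes one operator per line with the operator as the last token.
    That holds for this sample but not for real PDF files in general.
    """
    operators = []
    for line in stream.strip().splitlines():
        tokens = line.split()
        if tokens:
            operators.append(tokens[-1])
    return operators


print(list_operators(CONTENT_STREAM))
# ['BT', 'Tf', 'Td', 'Tj', 'Td', 'Tj', 'ET']
```

Nothing in that list says "word", "sentence" or "paragraph". The page is just a sequence of positioning and drawing steps.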
It becomes difficult when you want to change the actual content on the page. Because the structure of words, paragraphs and text flow no longer exists, it is very difficult to alter the text, especially if you need to reflow it. You have to hack the PostScript-style command stream and guess what is going on. PDF files which look identical can be structured very differently internally.
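As a rough illustration of how two similar-looking pages can differ internally, the sketch below compares two hand-written, simplified content streams that both aim to show "Hello World". The offsets in the second stream are invented for the example, so whether the two would really render identically depends on the font metrics. `shown_strings` is a hypothetical helper that only handles plain `(...)` strings with no escapes.

```python
import re

# Two hand-written, simplified content streams (illustrative only).
# STREAM_A shows the text as a single string.
STREAM_A = "BT /F1 12 Tf 72 720 Td (Hello World) Tj ET"
# STREAM_B shows the same letters in fragments, each positioned separately.
# The gap between the words comes from a Td move, not from a space character.
STREAM_B = (
    "BT /F1 12 Tf 72 720 Td (Hel) Tj 18 0 Td (lo) Tj "
    "16 0 Td (Wor) Tj 20 0 Td (ld) Tj ET"
)


def shown_strings(stream):
    """Return the literal strings passed to the Tj operator, in order.

    Assumes simple (...) strings with no escapes or nested parentheses.
    """
    return re.findall(r"\(([^()]*)\)\s*Tj", stream)


print(shown_strings(STREAM_A))  # ['Hello World']
print(shown_strings(STREAM_B))  # ['Hel', 'lo', 'Wor', 'ld']
```

An editor working on the second stream has to guess from the positions that the four fragments form two words, and that there should be a space between "lo" and "Wor".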
The PDF file format is great for displaying content, securing it, allowing users to add comments and providing interaction via forms. It is less suited as an intermediate editable format, which is why there are lots of creation, display and splitting tools but only a few basic editing tools.