Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

A timely lesson on why it is a bad idea to edit a PDF file directly


At IDR Solutions last week, whilst working on some files, we came across a very good example of why it is a bad idea to edit a PDF file directly. Let me share the story…

One of our customers wanted to remove some annotations from a PDF file, so they deleted the /Annots entry from the Page object by editing the file directly. They then wondered why the file was so much slower to load and render.

In theory this looks like a minor edit to a file. But a PDF file is not an ordinary file. It is a data dump with a look-up table at the end (the cross-reference, or xref, table), which records the byte position of every object in the file. A PDF viewer can read just the look-up table and then jump straight to the objects it needs using random access. This is one reason why opening a PDF and moving around it is very fast.
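To make the look-up idea concrete, here is a minimal sketch in Java. It is not how any particular viewer is implemented. It builds a tiny PDF-like file in memory, reads the `startxref` value near the end, and uses the fixed-width table entries to jump to one object. The class and method names are made up for illustration, and the sketch assumes a single table section starting at object 0 with 20-byte entries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class XrefLookupSketch {

    // Builds a tiny PDF-like file in memory: two objects, a look-up table, a trailer.
    // All content is ASCII, so character positions equal byte positions.
    static byte[] buildSample() {
        StringBuilder sb = new StringBuilder("%PDF-1.4\n");
        int off1 = sb.length();
        sb.append("1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");
        int off2 = sb.length();
        sb.append("2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n");
        int xrefPos = sb.length();
        sb.append("xref\n0 3\n");
        sb.append("0000000000 65535 f \n");                       // entry 0 is always free
        sb.append(String.format(Locale.ROOT, "%010d 00000 n \n", off1));
        sb.append(String.format(Locale.ROOT, "%010d 00000 n \n", off2));
        sb.append("trailer\n<< /Size 3 /Root 1 0 R >>\n");
        sb.append("startxref\n").append(xrefPos).append("\n%%EOF\n");
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reads only the tail of the file to find where the look-up table starts.
    static int readStartXref(byte[] data) {
        int tailLen = Math.min(data.length, 1024);
        String tail = new String(data, data.length - tailLen, tailLen, StandardCharsets.ISO_8859_1);
        int marker = tail.lastIndexOf("startxref");
        String rest = tail.substring(marker + "startxref".length()).trim();
        return Integer.parseInt(rest.split("\\s+")[0]);
    }

    // Looks up one object's byte offset. Assumes one section starting at object 0
    // and entries of exactly 20 bytes, as written by buildSample().
    static int objectOffset(byte[] data, int xrefPos, int objNum) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        int afterXrefLine = text.indexOf('\n', xrefPos) + 1;       // skip "xref"
        int firstEntry = text.indexOf('\n', afterXrefLine) + 1;    // skip "0 3"
        int entry = firstEntry + 20 * objNum;
        return Integer.parseInt(text.substring(entry, entry + 10));
    }

    // Returns the line that starts at the given byte offset.
    static String lineAt(byte[] data, int offset) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.substring(offset, text.indexOf('\n', offset));
    }

    public static void main(String[] args) {
        byte[] pdf = buildSample();
        int xrefPos = readStartXref(pdf);
        int off = objectOffset(pdf, xrefPos, 2);
        System.out.println("object 2 is at byte " + off + ": " + lineAt(pdf, off));
    }
}
```

The point of the sketch is that finding object 2 never requires scanning objects that come before it. The reader goes from the end of the file to the table, and from the table directly to the recorded offset.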

However, if you edit the file so that one of the objects is now shorter, then all the objects which follow it will be in a different place from that specified in the look-up table. Most PDF tools will spot this. They will then load the entire file and work out for themselves what the correct look-up table positions should be, if this is possible. This is a much slower process. Sometimes, the act of manually editing the PDF file will make it totally unusable.
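The effect of a hand edit on the recorded positions can be sketched in a few lines of Java. This is an illustration with made-up object content and hypothetical method names, not real PDF parsing. It records where each object starts, removes the `/Annots` entry from the first object without updating the table, and then checks whether each recorded offset still points at the start of its object.

```java
class StaleOffsetSketch {

    // Hypothetical stand-in for a PDF body: three objects laid out back to back.
    static final String[] OBJECTS = {
        "1 0 obj\n<< /Type /Page /Annots 3 0 R >>\nendobj\n",
        "2 0 obj\n<< /Length 0 >>\nendobj\n",
        "3 0 obj\n[ ]\nendobj\n"
    };

    // Records where each object starts, as a writer would when building the look-up table.
    static int[] offsetsOf(String[] objects) {
        int[] offsets = new int[objects.length];
        int pos = 0;
        for (int i = 0; i < objects.length; i++) {
            offsets[i] = pos;
            pos += objects[i].length();
        }
        return offsets;
    }

    // True if the text at the recorded offset really begins with "<n> 0 obj".
    static boolean offsetIsValid(String body, int offset, int objNum) {
        return body.startsWith(objNum + " 0 obj", offset);
    }

    public static void main(String[] args) {
        int[] table = offsetsOf(OBJECTS);            // table written with the original file
        String original = String.join("", OBJECTS);

        // Hand edit: remove the /Annots entry but leave the table untouched.
        String edited = original.replace(" /Annots 3 0 R", "");

        for (int n = 1; n <= OBJECTS.length; n++) {
            System.out.println("object " + n
                + " original=" + offsetIsValid(original, table[n - 1], n)
                + " edited=" + offsetIsValid(edited, table[n - 1], n));
        }
    }
}
```

Object 1 still starts where the table says it does, because nothing in front of it changed. Objects 2 and 3 have both moved, so their recorded offsets now land in the middle of other data. A tool that notices this has to fall back to scanning the whole file.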

So if you need to edit a PDF file, please use a proper tool (like iText) which will allow you to delete objects and then properly update the look-up table in the PDF file. It will make your life much easier…


