Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

A timely lesson on why it is a bad idea to edit a PDF file directly


Last week at IDR Solutions, while working on some files, we came across a very good example of why it is a bad idea to edit a PDF file directly. Let me share the story…

One of our customers wanted to remove some annotations from a PDF file, so they deleted the /Annots entry from the Page object. They then wondered why the file was so much slower to load and render.

In theory, this looks like a minor edit to a file. But a PDF file is not an ordinary file. It is a data dump with a look-up table (the cross-reference table) at the end, which records the byte position of each object. A PDF viewer can read just the look-up table and then jump straight to the objects it needs using random access. This is one reason why opening a PDF and moving around it is very fast.
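To make the look-up idea concrete, here is a minimal Python sketch. It builds a toy, PDF-like file in memory and then finds one object by reading only the table at the end. The helper names (`build_pdf`, `read_xref`, `read_object`) are made up for this illustration, and the sketch assumes a single classic cross-reference section with no compressed streams or incremental updates, so it is nowhere near a real PDF parser.

```python
import re

def build_pdf(objects):
    """Assemble a toy PDF: header, numbered objects, look-up table, trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))              # byte position where this object starts
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)                       # byte position of the look-up table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"           # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def read_xref(data):
    """Return {object number: byte offset}, starting from the end of the file."""
    tail = data[-1024:]                       # the pointer to the table sits near the end
    xref_pos = int(re.search(rb"startxref\s+(\d+)", tail).group(1))
    lines = data[xref_pos:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("look-up table not found where startxref says it is")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":                      # 'n' marks an object that is in use
            table[first + i] = int(offset)
    return table

def read_object(data, offset):
    """Jump straight to an object and return its bytes up to 'endobj'."""
    end = data.index(b"endobj", offset) + len(b"endobj")
    return data[offset:end]

pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Annots [4 0 R] >>",
    b"<< /Type /Annot /Subtype /Text >>",
])
table = read_xref(pdf)
page = read_object(pdf, table[3])             # no scan through objects 1 and 2
print(page.decode("ascii"))
```

The point to notice is that `read_object` never walks through the file. It trusts the byte position in the table, which is exactly why those positions have to be right.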

However, if you edit the file so that one of the objects is now shorter, then all the objects which follow it will be in a different place from that specified in the look-up table. Most PDF tools will spot this. They will then fall back to reading the entire file and working out what the correct look-up table positions should be, if this is possible. This is a much slower process. Sometimes, the act of manually editing the PDF file will make it totally unusable.
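The effect of a hand edit can be shown with another small Python sketch. It keeps the recorded byte positions in a plain dictionary rather than a real cross-reference table, and the helper names (`build`, `stale_entries`, `rebuild_offsets`) are hypothetical. The rebuild step is only a rough stand-in for what a repairing tool might do, which in practice is considerably more involved.

```python
import re

def build(objects):
    """Concatenate numbered objects and record where each one starts."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num, body in enumerate(objects, start=1):
        offsets[num] = len(out)
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    return bytes(out), offsets

def stale_entries(data, offsets):
    """List object numbers whose recorded position no longer holds that object."""
    return [num for num, off in sorted(offsets.items())
            if not data.startswith(b"%d 0 obj" % num, off)]

def rebuild_offsets(data):
    """Slow path: scan the whole file for object headers to recover positions."""
    return {int(m.group(1)): m.start()
            for m in re.finditer(rb"(?m)^(\d+) 0 obj", data)}

original, offsets = build([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Annots [4 0 R] >>",
    b"<< /Type /Annot /Subtype /Text >>",
    b"<< /Producer (toy example) >>",
])

# The "minor edit": delete the /Annots entry by hand, making object 3 shorter.
edited = original.replace(b" /Annots [4 0 R]", b"")

stale = stale_entries(edited, offsets)
print("objects no longer where the table says:", stale)   # [4, 5]

repaired = rebuild_offsets(edited)
print("stale after full rescan:", stale_entries(edited, repaired))
```

Objects 1 to 3 still start where the table says, but everything after the shortened object has moved. The only way to recover is to read every byte of the file, which is the slowdown the customer saw.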

So if you need to edit a PDF file, please use a proper tool (like iText) which will allow you to delete objects and then properly update the look-up table in the PDF file. It will make your life much easier…


