PDF hacks and HTML5 – ‘hidden’ PDF text

While debugging our PDF to HTML5 converter, we have come across all sorts of interesting ‘PDF’ features which need converting to an HTML5 equivalent.

Today, I have been looking at a PDF page which had extra text in the HTML5 version. It turns out that the text is also in the PDF, but it is invisible: you can select it but you cannot see it. In the PDF, a white box has been drawn over it…

In general this is not a good way to delete PDF text (especially if it is sensitive or confidential!). The text is still there in the PDF and can be easily extracted.

The white box is also drawn in the HTML5 version, but because the shape is on the canvas layer (and the text is in a div on the separate text layer), the box does not cover the text and the text is not hidden.
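To make the paint-order problem concrete, here is a minimal sketch in Python of how covered text could be spotted before conversion. It is not our converter's code: the Box and PaintOp types, the find_covered_text function and the sample coordinates are all hypothetical, and it only handles the simple case of an opaque rectangle that completely covers a text run's bounding box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def covers(self, other: "Box") -> bool:
        # True only when 'other' lies entirely inside this box.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


@dataclass(frozen=True)
class PaintOp:
    """One drawing operation, listed in the order the page paints it."""
    kind: str            # "text" or "fill"
    box: Box
    text: str = ""
    opaque: bool = True


def find_covered_text(ops):
    """Return text ops that are fully covered by an opaque fill painted later."""
    covered = []
    for i, op in enumerate(ops):
        if op.kind != "text":
            continue
        later = ops[i + 1:]
        if any(o.kind == "fill" and o.opaque and o.box.covers(op.box)
               for o in later):
            covered.append(op)
    return covered


# Assumed sample page: two lines of text, then a white box over the second.
page = [
    PaintOp("text", Box(10, 10, 200, 24), "Visible heading"),
    PaintOp("text", Box(10, 40, 200, 54), "Confidential note"),
    PaintOp("fill", Box(5, 35, 210, 60)),
]

hidden = find_covered_text(page)
print([op.text for op in hidden])
```

A check like this only flags candidates. Partial overlaps, clipping, transparency and boxes that are not white would all need more careful handling.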

The practical fix is to draw the text onto the canvas as well, and we have a flag to do this. This is not totally satisfactory because text on the canvas behaves like a bitmap: it does not scale without pixellation.
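The effect of such a flag can be illustrated with a toy model of the two layers. This is only a sketch under stated assumptions: the text layer is assumed to sit above the canvas, the text_on_canvas parameter is a made-up stand-in for the real flag, and the tuples and coordinates are invented for the example.

```python
def covers(outer, inner):
    """True when box 'inner' lies entirely inside box 'outer'.
    Boxes are (x0, y0, x1, y1) tuples."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def visible_text(ops, text_on_canvas):
    """Return the labels of text runs a reader would actually see.

    ops is a list in paint order of ("text", label, box) or ("fill", box).
    With text_on_canvas=False, text lives in a separate layer assumed to be
    above the canvas, so canvas shapes can never cover it. With
    text_on_canvas=True, text shares the canvas with the shapes and a later
    fill that covers it hides it, as in the original PDF.
    """
    visible = []
    for i, op in enumerate(ops):
        if op[0] != "text":
            continue
        _, label, box = op
        hidden = text_on_canvas and any(
            later[0] == "fill" and covers(later[1], box)
            for later in ops[i + 1:]
        )
        if not hidden:
            visible.append(label)
    return visible


# Assumed sample page: a white box is painted over the second text run.
ops = [
    ("text", "Heading", (10, 10, 200, 24)),
    ("text", "Confidential", (10, 40, 200, 54)),
    ("fill", (5, 35, 210, 60)),
]

print(visible_text(ops, text_on_canvas=False))
print(visible_text(ops, text_on_canvas=True))
```

With the text in its own layer both runs show up, which is the extra text seen in the HTML5 output. With the text on the canvas, only the heading survives, matching what the PDF displays.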

As is often the case, the quality of the PDF affects what we can do in HTML5.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.