Three ways to convert PDF to HTML5: Text and Fonts

There are several ways to handle text and fonts in PDF files when converting to HTML5. Here are the top 3 and how they stack up against each other:

  1. Convert PDF fonts to web fonts and draw real, selectable text
  2. Convert PDF fonts to shapes and draw text as shapes (with no text selection)
  3. Convert PDF fonts to shapes and draw text as shapes, and also draw invisible, real text on top to allow text selection.

1. Convert PDF fonts to web fonts and draw real, selectable text:

If you require text to be selectable, there are 2 ways to achieve this (this option and option 3). The first is to convert PDF fonts into web-browser-compatible fonts and draw HTML text with those fonts applied. However, this is not a trivial process: the PDF file format's font handling was not designed with web browser compatibility in mind, and there are many caveats that make converting fonts accurately a nightmare. This is why it is very rare to see a PDF to HTML conversion tool that can retain fonts.
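As a rough illustration of the output side of this approach, here is a minimal Python sketch. It assumes the font has already been extracted from the PDF and converted to a WOFF file (that conversion is the hard part and is not shown), and that text positions have already been converted to CSS pixels measured from the top-left of the page. The helper names `font_face_css` and `text_run_html`, the font family name and the file path are all hypothetical, not part of any real converter.

```python
import html

# Illustrative sketch only: helper names, font name and paths are hypothetical.


def font_face_css(family: str, woff_url: str) -> str:
    """Return an @font-face rule for a font assumed to be already extracted
    from the PDF and converted to WOFF."""
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('{woff_url}') format('woff');\n"
        "}\n"
    )


def text_run_html(text: str, family: str, size_px: float,
                  x_px: float, y_px: float) -> str:
    """Return a selectable, absolutely positioned run of real text that uses
    the converted font. x_px/y_px are assumed to be CSS pixels from the
    top-left of the page (PDF coordinates start at the bottom-left)."""
    style = (
        f"position:absolute;left:{x_px:g}px;top:{y_px:g}px;"
        f"font-family:'{family}';font-size:{size_px:g}px;white-space:pre"
    )
    # Escape the text so characters like & and < stay literal text in HTML.
    return f'<span style="{style}">{html.escape(text)}</span>'


if __name__ == "__main__":
    print("<style>" + font_face_css("PdfFont1", "fonts/PdfFont1.woff") + "</style>")
    print(text_run_html("Text & fonts", "PdfFont1", 12, 72, 90))
```

Because the output is an ordinary `<span>`, the browser can select, copy and index the text; the accuracy of the result depends entirely on how well the font conversion step worked.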

Additionally, the PDF file format allows very fine control over text sizing, positioning and kerning in a very concise way. HTML was not designed for this level of control, which makes converting to real text a difficult trade-off: the more accuracy that is retained, the larger the converted HTML file becomes (sometimes unrealistically so).
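To see why file size grows with accuracy, consider a hypothetical sketch that keeps every character at its exact PDF position by giving each one its own absolutely positioned `<span>`, compared with a single span for the whole line. The glyph positions are made-up values in CSS pixels, and `per_glyph_html` and `per_line_html` are illustrative names only.

```python
import html

# Hypothetical glyph positions (CSS px) for one line, as a PDF might place
# them after kerning: (character, x position).
GLYPHS = [("K", 10.0), ("e", 17.2), ("r", 23.9), ("n", 28.1),
          ("i", 35.0), ("n", 38.3), ("g", 45.4)]


def per_glyph_html(glyphs, y_px: float, size_px: float) -> str:
    """Exact positioning: one absolutely positioned span per character."""
    return "".join(
        f'<span style="position:absolute;left:{x:.2f}px;top:{y_px:.2f}px;'
        f'font-size:{size_px:g}px">{html.escape(ch)}</span>'
        for ch, x in glyphs
    )


def per_line_html(glyphs, y_px: float, size_px: float) -> str:
    """Compact positioning: one span for the whole line, placed at the first
    glyph and relying on the font's own advance widths for the rest."""
    text = "".join(ch for ch, _ in glyphs)
    x = glyphs[0][1]
    return (f'<span style="position:absolute;left:{x:.2f}px;top:{y_px:.2f}px;'
            f'font-size:{size_px:g}px">{html.escape(text)}</span>')


if __name__ == "__main__":
    exact = per_glyph_html(GLYPHS, 90, 12)
    compact = per_line_html(GLYPHS, 90, 12)
    # Markup size for the same seven characters, exact vs compact.
    print(len(exact), len(compact))
```

For this seven-character line the per-glyph version is several times larger, and the gap grows with every character on the page.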

The solution is to compromise on accuracy, averaging spacing over an entire line where possible rather than applying kerning between individual characters. An example of this type of conversion can be seen below.
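The line-averaging compromise can be sketched as follows. This hypothetical helper takes the x position of each glyph as placed by the PDF and each glyph's natural advance width in the converted font (both assumed to be in CSS pixels already), works out one CSS `letter-spacing` value that makes the line start and end in the right places, and reports the worst position error this introduces for the characters in between. The function names and sample numbers are illustrative only.

```python
import html

# Illustrative sketch only: function names and inputs are hypothetical.


def average_letter_spacing(xs, advances):
    """Return (spacing_px, worst_error_px).

    xs: x position of each glyph on the line, as placed by the PDF (CSS px).
    advances: each glyph's natural advance width in the converted font (CSS px).
    One letter-spacing value is chosen so the first and last glyphs land
    where the PDF put them; glyphs in between may drift slightly."""
    if len(xs) != len(advances):
        raise ValueError("need one advance per glyph")
    if len(xs) < 2:
        return 0.0, 0.0
    # Distance the font alone would cover from the first glyph to the last.
    natural = sum(advances[:-1])
    # Spread the remaining difference evenly over the gaps between glyphs.
    spacing = ((xs[-1] - xs[0]) - natural) / (len(xs) - 1)
    # Walk the line as a browser would and measure how far each glyph drifts.
    worst = 0.0
    pen = xs[0]
    for x, adv in zip(xs, advances):
        worst = max(worst, abs(x - pen))
        pen += adv + spacing
    return spacing, worst


def averaged_line_html(text, xs, advances, y_px, size_px):
    """Emit the whole line as one span with a single letter-spacing value."""
    spacing, _ = average_letter_spacing(xs, advances)
    return (f'<span style="position:absolute;left:{xs[0]:g}px;top:{y_px:g}px;'
            f'font-size:{size_px:g}px;letter-spacing:{spacing:.2f}px;'
            f'white-space:pre">{html.escape(text)}</span>')


if __name__ == "__main__":
    xs = [0.0, 6.0, 12.5, 18.0]
    advances = [5.5, 5.5, 5.5, 5.5]
    print(average_letter_spacing(xs, advances))
    print(averaged_line_html("abcd", xs, advances, 90, 12))
```

With these made-up numbers the spacing comes out at 0.5px and the third glyph ends up 0.5px from its PDF position, in exchange for one span per line instead of one per character.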

2. Convert PDF fonts to shapes and draw text as shapes:

If your only requirement is a perfect visual match, the best option is to convert the fonts in PDF files into shapes and output the page either as an image or as SVG. The benefit is a perfect visual match; however, the output does not actually contain any text, which is bad for SEO and also means that text cannot be selected or copied and pasted.
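Drawing text as shapes amounts to turning each glyph outline into a vector path. The Python sketch below assumes glyph outlines have already been read from the embedded font as lists of points in font units, and, to stay short, joins every contour with straight line segments only (real glyphs also use quadratic or cubic curves, which would map to SVG `Q` or `C` commands). The function names and the sample square outline are hypothetical.

```python
# Illustrative sketch only: function names and the sample outline are hypothetical.


def glyph_to_svg_path(contours, units_per_em, size_px, x_px, baseline_px):
    """Convert one glyph outline into an SVG <path>.

    contours: list of contours, each a list of (x, y) points in font units.
    Simplification: points are joined with straight lines only."""
    scale = size_px / units_per_em  # font units -> CSS px
    commands = []
    for contour in contours:
        for i, (gx, gy) in enumerate(contour):
            x = x_px + gx * scale
            y = baseline_px - gy * scale  # font y points up, SVG y points down
            commands.append(f"{'M' if i == 0 else 'L'}{x:g},{y:g}")
        commands.append("Z")  # close the contour
    return f'<path d="{" ".join(commands)}"/>'


def page_svg(width_px, height_px, paths):
    """Wrap glyph paths in a standalone SVG document. There is no <text>
    element, so nothing can be selected or copied."""
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width_px:g}" '
            f'height="{height_px:g}" viewBox="0 0 {width_px:g} {height_px:g}">'
            f'{"".join(paths)}</svg>')


if __name__ == "__main__":
    # A made-up square "glyph": 500 x 700 font units, 1000 units per em.
    square = [[(0, 0), (500, 0), (500, 700), (0, 700)]]
    print(page_svg(100, 80, [glyph_to_svg_path(square, 1000, 10, 20, 50)]))
```

The output looks exactly like the glyph outlines it was built from, but a browser or search engine sees only path data, which is the trade-off described above.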

Here is an example of a PDF with text converted to shapes in this way:

3. Convert PDF fonts to shapes and draw text as shapes, but also draw invisible real text on top to allow text selection:

If you require both a perfect visual match and text selection, you can draw the text as shapes and place an invisible layer of real text on top to be used for selection. Visually the file will look perfect, because any slight inaccuracies in the fonts or positioning of the real text are never seen.
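A minimal sketch of the layering idea, assuming the page has already been rendered to SVG markup with text as shapes, and that each run of text has a known position and size in CSS pixels. Here the invisible layer uses `color: transparent`, which keeps the text selectable while the shapes underneath stay visible; this is one possible choice, not necessarily what any particular converter does. `layered_page` and the run tuple format are hypothetical.

```python
import html

# Illustrative sketch only: layered_page and the run format are hypothetical.


def layered_page(svg_markup, runs, width_px, height_px):
    """Stack an invisible, selectable text layer over text drawn as shapes.

    svg_markup: the page already rendered to SVG, with text as paths.
    runs: list of (text, x_px, y_px, size_px) for each run of real text."""
    spans = "".join(
        f'<span style="position:absolute;left:{x:g}px;top:{y:g}px;'
        f'font-size:{size:g}px;white-space:pre;color:transparent">'
        f"{html.escape(text)}</span>"
        for text, x, y, size in runs
    )
    return (
        f'<div style="position:relative;width:{width_px:g}px;height:{height_px:g}px">'
        # Bottom layer: what the reader sees (text drawn as shapes).
        f'<div style="position:absolute;left:0;top:0">{svg_markup}</div>'
        # Top layer: later positioned siblings paint on top; the text is
        # transparent but still selectable.
        f"{spans}"
        "</div>"
    )


if __name__ == "__main__":
    print(layered_page("<svg></svg>", [("Hello & welcome", 20, 40, 12)], 600, 800))
```

Since the visible output comes only from the SVG layer, the invisible text can use a generic font and approximate positions without affecting how the page looks.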

There are multiple ways to implement this. For example, some tools have built their own JavaScript selection engine because it is easier than putting real text on the page, while others use real text that is transformed to the correct size, without converting the fonts.
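For the second variant (real text in a standard font, stretched to the correct size), the key calculation is a horizontal scale factor: the width the text occupies in the PDF divided by the width it naturally takes up in the fallback font. The sketch below assumes that natural width has already been measured (in a browser this could be done with the canvas `measureText()` method); `fitted_run` and its parameters are hypothetical.

```python
import html

# Illustrative sketch only: fitted_run and its parameters are hypothetical.


def fitted_run(text, x_px, y_px, size_px, target_width_px, natural_width_px):
    """Real text in a standard (unconverted) font, stretched horizontally so
    it covers the same width as the original text in the PDF.

    natural_width_px: width of `text` in the fallback font at size_px,
    assumed to have been measured beforehand."""
    if natural_width_px <= 0:
        raise ValueError("natural width must be positive")
    scale_x = target_width_px / natural_width_px
    style = (
        f"position:absolute;left:{x_px:g}px;top:{y_px:g}px;"
        f"font-family:sans-serif;font-size:{size_px:g}px;white-space:pre;"
        "color:transparent;"  # invisible layer over the shapes
        # Scale from the left edge so the run still starts at x_px.
        f"transform-origin:0 0;transform:scaleX({scale_x:.4f})"
    )
    return f'<span style="{style}">{html.escape(text)}</span>'


if __name__ == "__main__":
    # Text that spans 120px in the PDF but measures 100px in the fallback font.
    print(fitted_run("Hello", 10, 20, 12, 120, 100))
```

Because `scaleX` only stretches horizontally, the height still comes from `font-size`; the aim is that the selection box roughly lines up with the shapes drawn underneath, even though the glyphs themselves differ.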

Here’s an example where real text is drawn using the converted fonts, but made invisible:

So, which is best?

In our opinion option 1 is best, though it is certainly the most difficult, which is why it is so rare to see. This is the mode that we like to show off when demoing our PDF to HTML5 Converter. If you want to find out more, you can try our PDF to HTML5 converter online for free, or read more about it and download the trial edition.

If you’re a first-time reader, or simply want to be notified when we post new articles and updates, you can keep up to date by social media (Twitter, Facebook and Google+) or the Blog RSS.

Leon is a developer at IDRsolutions who focuses primarily on JPDF2HTML5 core and IDRViewer development, as well as making sure that the monthly releases go as planned. He can also be found working on the marketing side, interpreting analytics and helping make sure that IDRsolutions continues to grow.
Leon Atherton
