Leon Atherton

Leon is a developer at IDRsolutions and product manager for BuildVu. He is responsible for managing the BuildVu product strategy and roadmap, and also spends a lot of his time writing code to build new features, improve functionality, fix bugs, and improve the testing for BuildVu.

Three ways to convert PDF to HTML5: Text and Fonts

2 min read

There are several ways to handle text and fonts in PDF files when converting to HTML5. Here are the top 3 ways and how they stack up against each other:

  1. Convert PDF fonts to web fonts and draw real, selectable text
  2. Convert PDF fonts to shapes and draw text as shapes (with no text selection)
  3. Convert PDF fonts to shapes and draw text as shapes, and also draw invisible, real text on top to allow text selection.


1. Convert PDF fonts to web fonts and draw real, selectable text:

If you require text to be selectable, there are 2 ways to achieve this. The first is to convert PDF fonts into web-browser-compatible fonts and draw HTML text with those fonts applied. However, this is not a trivial process. The PDF file format was not designed with browser-compatible fonts in mind, and there are many caveats that make accurately converting fonts a nightmare. This is why it is very rare to see a PDF to HTML conversion tool that can retain fonts.
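As a rough illustration of what option 1 produces on the HTML side, the sketch below assumes the hard part (extracting each embedded font from the PDF and converting it to a browser format such as WOFF) has already been done, and only shows how the result might be wired up: an `@font-face` rule plus absolutely positioned text that uses it. The function names, the font family `PdfFont1` and the path `fonts/PdfFont1.woff` are hypothetical; this is not how BuildVu itself is implemented.

```python
from html import escape

def font_face_rule(family: str, woff_url: str) -> str:
    """CSS @font-face rule pointing at a font file that is assumed to have
    been extracted from the PDF and converted to WOFF beforehand."""
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('{woff_url}') format('woff');\n"
        "}"
    )

def text_span(text: str, family: str, size_px: float, x: float, y: float) -> str:
    """Absolutely positioned, selectable HTML text drawn with the converted font.

    x and y are assumed to already be CSS pixels measured from the
    top-left corner of the page.
    """
    style = (
        f"position:absolute;left:{x}px;top:{y}px;"
        f"font-family:'{family}';font-size:{size_px}px;white-space:pre"
    )
    return f'<span style="{style}">{escape(text)}</span>'

# Hypothetical font name and file path, for illustration only.
css = font_face_rule("PdfFont1", "fonts/PdfFont1.woff")
span_html = text_span("Terms & Conditions", "PdfFont1", 12, 72, 90)
```

Because the output is real text in the DOM, it can be selected, copied and indexed by search engines. A fuller version would also need to offset `top` by the font's ascent, since CSS positions the top of the line box rather than the baseline that PDF text is placed on.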

Additionally, the PDF file format allows very fine control over text sizing, positioning and kerning in a very concise way. HTML was not designed for this level of control, which makes converting to real text difficult: the more accuracy that is retained, the larger the converted HTML file becomes (sometimes unrealistically so).

The solution is to compromise on the accuracy retained, averaging spacing over an entire line where possible rather than using kerning between individual characters. An example of this type of conversion can be seen below.
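The line-averaging compromise can be sketched like this, assuming each glyph's PDF position and the web font's natural advance width are already known in CSS pixels (the `Glyph` type and both functions are hypothetical). The per-glyph version reproduces every position exactly but emits one element per character; the averaged version emits a single span per line and spreads the difference between the PDF's line width and the font's natural width evenly across the gaps using CSS `letter-spacing`.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Glyph:
    char: str
    x: float        # left edge of the glyph as placed by the PDF, in CSS px
    advance: float  # natural advance width of the glyph in the web font, in CSS px

def per_glyph_html(glyphs: list, y: float) -> str:
    """Maximum accuracy: one absolutely positioned span per character."""
    return "".join(
        f'<span style="position:absolute;left:{g.x:.2f}px;top:{y:.2f}px">'
        f"{escape(g.char)}</span>"
        for g in glyphs
    )

def averaged_line_html(glyphs: list, y: float) -> str:
    """Compromise: one span per line with a single averaged letter-spacing."""
    if not glyphs:
        return ""
    spacing = 0.0
    if len(glyphs) > 1:
        # Distance the PDF puts between the first and last glyph...
        pdf_span = glyphs[-1].x - glyphs[0].x
        # ...versus the distance the web font covers on its own.
        natural_span = sum(g.advance for g in glyphs[:-1])
        # Spread the difference evenly over the n - 1 gaps.
        spacing = (pdf_span - natural_span) / (len(glyphs) - 1)
    text = escape("".join(g.char for g in glyphs))
    return (
        f'<span style="position:absolute;left:{glyphs[0].x:.2f}px;top:{y:.2f}px;'
        f'letter-spacing:{spacing:.2f}px;white-space:pre">{text}</span>'
    )

line = [Glyph("H", 10.0, 7.0), Glyph("i", 17.5, 3.0), Glyph("!", 21.0, 3.0)]
per_glyph = per_glyph_html(line, 50.0)
averaged = averaged_line_html(line, 50.0)
```

With this approach the first and last characters land where the PDF places them, while characters in between may drift slightly; that drift is the accuracy being traded for much smaller output.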

2. Convert PDF fonts to shapes and draw text as shapes:

If your only requirement is a perfect visual match, the best option is to convert the fonts in PDF files into shapes and output them either as an image or as SVG. The benefit is a perfect visual match; however, the file produced does not actually contain any text, which is bad for SEO and also means that it’s not possible to select text or copy and paste it out.
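To illustrate why option 2 gives a perfect visual match but no text, the sketch below draws each glyph's outline as an SVG `<path>`. It assumes the outlines have already been pulled out of the PDF's embedded fonts as SVG path data in font units (that extraction step is not shown), and `PlacedGlyph` and `glyphs_to_svg` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class PlacedGlyph:
    outline: str  # glyph outline as SVG path data, in font units (y axis pointing up)
    x: float      # baseline origin on the page, in SVG user units
    y: float      # baseline origin on the page, in SVG user units (y axis pointing down)

def glyphs_to_svg(glyphs, font_size, units_per_em, width, height, fill="#000"):
    """Draw every glyph as a filled path; the output contains no text at all."""
    scale = font_size / units_per_em
    paths = []
    for g in glyphs:
        # Move to the glyph origin, scale font units to the font size,
        # and flip y because font outlines are y-up while SVG is y-down.
        transform = f"translate({g.x:g} {g.y:g}) scale({scale:g} {-scale:g})"
        paths.append(f'<path d="{g.outline}" transform="{transform}" fill="{fill}"/>')
    body = "\n  ".join(paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"  {body}\n"
        "</svg>"
    )

# A hypothetical box-shaped "glyph" in a 1000-unit em square, drawn twice.
box = "M0 0 L500 0 L500 700 L0 700 Z"
svg = glyphs_to_svg([PlacedGlyph(box, 10, 50), PlacedGlyph(box, 18, 50)],
                    font_size=12, units_per_em=1000, width=200, height=100)
```

Since every character is just a filled shape, the browser renders exactly what the PDF describes, but there is nothing for a search engine to index or a user to select. A real converter would typically define each distinct glyph once (for example in `<defs>`) and reference it with `<use>` to keep the file size down; that is omitted here.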

Here is an example of a PDF with text converted to shapes in this way:

3. Convert PDF fonts to shapes and draw text as shapes, but also draw invisible real text on top to allow text selection:

If you require both a perfect visual match and text selection, you can draw text as shapes and place an invisible layer of real text on top to be used for selection. Visually, the file will look perfect, and any slight inaccuracies in the fonts or in the positioning of the real text will not be seen.
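A minimal sketch of this layering, assuming the page has already been rendered to a shapes-only SVG file (the file name `page1.svg`, the page size and the `layered_page` helper are all hypothetical). The visible image sits underneath; a second, absolutely positioned layer holds real text made transparent with CSS so it can still be selected and copied.

```python
from html import escape

def layered_page(svg_url, width, height, text_runs):
    """Stack an invisible, selectable text layer over a shapes-only SVG page.

    text_runs is an iterable of (text, x, y, font_size) tuples in CSS px.
    """
    spans = "\n".join(
        f'    <span style="position:absolute;left:{x}px;top:{y}px;'
        f'font-size:{size}px;white-space:pre">{escape(text)}</span>'
        for text, x, y, size in text_runs
    )
    return (
        f'<div class="page" style="position:relative;width:{width}px;height:{height}px">\n'
        # Bottom layer: the page rendered with text as shapes (the visible output).
        f'  <img src="{escape(svg_url)}" alt="" '
        'style="position:absolute;left:0;top:0;width:100%;height:100%">\n'
        # Top layer: real text, painted later so it sits above the image,
        # and coloured transparent so it does not cover the rendered shapes.
        '  <div class="text-layer" style="position:absolute;left:0;top:0;'
        'width:100%;height:100%;color:transparent">\n'
        f"{spans}\n"
        "  </div>\n"
        "</div>"
    )

page = layered_page("page1.svg", 595, 842, [("Invoice total", 72, 120, 12)])
```

Both layers are positioned with `z-index` left at `auto`, so the text layer, which comes later in the document, is painted on top of the image. That ordering is what lets a user's selection land on the text rather than on the picture.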

There are multiple ways to implement this. For example, some tools have built their own JavaScript selection engine because it’s easier than placing real text on the page, while other tools use real text that is transformed to the correct size, though the fonts themselves are not converted.
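For the second approach mentioned above (real text transformed to the correct size, without converting the fonts), one way to compute the transform is to compare the width the PDF gives a run of text with the width that run actually occupies in a generic fallback font. The sketch assumes that measured width is supplied by the caller; in a browser it could come from `CanvasRenderingContext2D.measureText()`. `fitted_invisible_span` and its parameters are hypothetical.

```python
from html import escape

def fitted_invisible_span(text, x, y, font_size, pdf_width, measured_width):
    """Invisible text in an unconverted fallback font, stretched to the PDF's width.

    pdf_width: width of this text run in the PDF, in CSS px.
    measured_width: width of the same text in the fallback font, in CSS px
    (assumed to be measured elsewhere, e.g. in the browser).
    """
    # Horizontal stretch that makes the fallback text span the PDF text exactly.
    scale_x = pdf_width / measured_width if measured_width > 0 else 1.0
    style = (
        f"position:absolute;left:{x}px;top:{y}px;font-size:{font_size}px;"
        "font-family:sans-serif;color:transparent;white-space:pre;"
        f"transform:scaleX({scale_x:.4f});transform-origin:0 0"
    )
    return f'<span style="{style}">{escape(text)}</span>'

fitted = fitted_invisible_span("Invoice total", 72, 120, 12,
                               pdf_width=80.0, measured_width=64.0)
```

Only the horizontal axis is stretched here because the font size already fixes the height. Since the text is invisible, the difference in glyph shapes between the fallback font and the PDF's own font does not matter; what matters is that the selectable area lines up with the shapes underneath.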

Here’s an example where real text is drawn using converted fonts, but rendered invisible:

So, which is best?

In our opinion, option 1 is best, though it is certainly the most difficult, which is why it is so rare to see. This is the mode we like to show off when demoing our PDF to HTML5 Converter. If you want to find out more, you can try our PDF to HTML5 converter online for free, or find out more information and download the trial edition.

If you’re a first-time reader, or simply want to be notified when we post new articles and updates, you can keep up to date via social media (Twitter, Facebook and Google+) or the Blog RSS.
