There are several ways to deal with text and fonts in PDF files when converting to HTML5. Here are the top 3 ways and how they stack up against each other:
- Convert PDF fonts to web fonts and draw real, selectable text
- Convert PDF fonts to shapes and draw text as shapes (with no text selection)
- Convert PDF fonts to shapes and draw text as shapes, and also draw invisible, real text on top to allow text selection.
1. Convert PDF fonts to web fonts and draw real, selectable text:
If you require text to be selectable, there are 2 ways to achieve this (the other is option 3 below). The first is to convert PDF fonts into web-browser-compatible fonts and draw HTML text with those fonts applied. However, this is not a trivial process: font handling in the PDF file format was not designed with web-browser compatibility in mind, and there are many caveats that make converting fonts accurately a nightmare. This is why it is very rare to see a PDF to HTML conversion tool that can retain fonts.
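Once a PDF font has been converted into a browser-compatible format, applying it to HTML text is the easy part. The minimal Python sketch below assumes the hard conversion step has already produced WOFF bytes; the family name `F1` and the helper names `font_face_css` and `text_span` are hypothetical. It embeds the font as a base64 data URL in an `@font-face` rule and applies it to a span of real, selectable text.

```python
import base64
import html


def font_face_css(family: str, woff_bytes: bytes) -> str:
    """Build an @font-face rule that embeds an already converted WOFF font."""
    data = base64.b64encode(woff_bytes).decode("ascii")
    return (
        f'@font-face {{ font-family: "{family}"; '
        f'src: url(data:font/woff;base64,{data}) format("woff"); }}'
    )


def text_span(family: str, size_pt: float, text: str) -> str:
    """Draw real, selectable HTML text using the embedded font."""
    style = f"font-family:'{family}';font-size:{size_pt:g}pt"
    return f'<span style="{style}">{html.escape(text)}</span>'


# Hypothetical input: in practice these bytes come from the font conversion step.
css = font_face_css("F1", b"wOFF")
span = text_span("F1", 12, "Hello & welcome")
```

Embedding the font inline keeps each page self-contained; a converter could equally write the font to a separate file and reference it by URL.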
Additionally, the PDF file format allows very fine control over text sizing, positioning and kerning in a very concise way. HTML was not designed for such control, which can make converting to real text quite hazardous: the more accuracy that is retained, the larger the converted HTML file (sometimes unrealistically so).
The solution is to compromise on the accuracy retained, averaging spacing over an entire line where possible rather than using kerning between individual characters. An example of this type of conversion can be seen below.
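To illustrate the trade-off, here is a minimal Python sketch comparing one absolutely positioned span per character with a single span per line whose spacing is averaged into one `letter-spacing` value. The `Glyph` type, the helper names and the glyph positions are hypothetical; it assumes each glyph's advance width in the web font has already been measured.

```python
from __future__ import annotations

import html
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float        # left edge of the glyph on the PDF page, in points
    advance: float  # width of the glyph in the web font at this size, in points


def per_glyph_html(glyphs: list[Glyph], top: float) -> str:
    """Maximum accuracy: one absolutely positioned span per character."""
    return "".join(
        f'<span style="position:absolute;left:{g.x:g}pt;top:{top:g}pt">'
        f"{html.escape(g.char)}</span>"
        for g in glyphs
    )


def averaged_line_html(glyphs: list[Glyph], top: float) -> str:
    """Compromise: one span per line, with the PDF spacing averaged into a
    single letter-spacing value instead of per-character kerning."""
    start = glyphs[0].x
    end = glyphs[-1].x + glyphs[-1].advance
    natural_width = sum(g.advance for g in glyphs)
    # Spread the leftover width evenly over the gaps between characters.
    gaps = max(len(glyphs) - 1, 1)
    spacing = (end - start - natural_width) / gaps
    text = html.escape("".join(g.char for g in glyphs))
    return (
        f'<span style="position:absolute;left:{start:g}pt;top:{top:g}pt;'
        f'white-space:pre;letter-spacing:{spacing:.2f}pt">{text}</span>'
    )


# Hypothetical glyph positions for the word "Hello" on one PDF line.
line = [
    Glyph("H", 10.0, 6.5),
    Glyph("e", 17.0, 5.5),
    Glyph("l", 23.0, 3.0),
    Glyph("l", 27.0, 3.0),
    Glyph("o", 31.0, 6.0),
]
exact = per_glyph_html(line, 50)
compact = averaged_line_html(line, 50)
```

With these numbers the averaged line gets a spacing of 0.75pt: the first and last characters land exactly where the PDF puts them, while the ones in between drift by at most 0.5pt, in exchange for a fraction of the markup.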
2. Convert PDF fonts to shapes and draw text as shapes:
If your only requirement is a perfect visual match, the best option is to convert the fonts in PDF files into shapes and output either an image or SVG. The benefit is a perfect visual match; however, the file produced does not actually contain any text, which is bad for SEO and also means that text cannot be selected or copied and pasted.
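As a rough sketch of the SVG route, the Python below places a glyph outline on a page as a plain `<path>`, so the output contains shapes but no `<text>` element. It assumes the outline has already been extracted from the PDF font as SVG path data in font units; the helper names and the stroke outline are hypothetical. `units_per_em` defaults to 1000, which is typical of Type 1 and CFF fonts (TrueType fonts often use 2048).

```python
from __future__ import annotations


def glyph_path(outline: str, x: float, y: float, font_size: float,
               page_height: float, units_per_em: int = 1000) -> str:
    """Place one glyph outline (SVG path data in font units) on the page.

    PDF puts the origin at the bottom-left with y pointing up, while SVG uses
    the top-left with y pointing down, so the baseline is flipped and the
    outline is scaled by (s, -s).
    """
    s = font_size / units_per_em
    return (f'<path d="{outline}" '
            f'transform="translate({x:g},{page_height - y:g}) scale({s:g},{-s:g})"/>')


def page_svg(width: float, height: float, paths: list[str]) -> str:
    """Wrap the glyph paths in a standalone SVG page; no <text> is emitted."""
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width:g}pt" '
            f'height="{height:g}pt" viewBox="0 0 {width:g} {height:g}">'
            + "".join(paths) + "</svg>")


# Hypothetical outline for a simple vertical stroke glyph, in 1000-unit em space.
STROKE = "M 100 0 L 200 0 L 200 700 L 100 700 Z"
svg = page_svg(612, 792, [glyph_path(STROKE, 72, 720, 12, 792)])
```

Because the page is just paths, it renders identically everywhere, which is also exactly why there is nothing for a search engine to index or for a user to select.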
Here is an example of a PDF with text converted to shapes in this way:
3. Convert PDF fonts to shapes and draw text as shapes, but also draw invisible real text on top to allow text selection:
If you require both a perfect visual match and text selection, this can be achieved by writing out text as shapes and putting an invisible layer of real text on top that can be used for selection. Visually the file will look perfect, and any slight inaccuracies in the fonts or positioning of the real text will not be seen.
There are multiple ways to implement this functionality. For example, some tools have built their own JavaScript selection engine because it is easier than placing real text on the page, while other tools use real text that is transformed to the correct size, though the fonts themselves are not converted.
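The "real text transformed to the correct size" approach can be sketched as follows. This minimal Python example draws each text run with a generic font (the PDF font is not converted), stretches it horizontally with `scaleX` to cover the width it occupies in the PDF, and makes it transparent so only the shapes underneath are visible. It assumes the run's width in the browser has been measured elsewhere (for example with the canvas `measureText` method in JavaScript) and passed in as a number; `page1.svg` and the helper names are hypothetical.

```python
from __future__ import annotations

import html


def invisible_text_span(text: str, left: float, top: float, font_size: float,
                        pdf_width: float, measured_width: float) -> str:
    """Real, selectable text drawn transparently over the shapes.

    measured_width is the width of the text in the browser's fallback font;
    scaleX stretches it to cover the width the text occupies in the PDF.
    """
    scale_x = pdf_width / measured_width
    style = (f"position:absolute;left:{left:g}pt;top:{top:g}pt;"
             f"font-family:sans-serif;font-size:{font_size:g}pt;"
             f"color:transparent;white-space:pre;"
             f"transform-origin:0 0;transform:scaleX({scale_x:.4f})")
    return f'<span style="{style}">{html.escape(text)}</span>'


def overlay_page(svg_url: str, width: float, height: float, spans: list[str]) -> str:
    """Stack the invisible text layer on top of the page drawn as shapes."""
    img = (f'<img src="{html.escape(svg_url)}" alt="" '
           f'style="position:absolute;left:0;top:0;width:100%;height:100%">')
    # Spans come after the image, so they are painted on top of it.
    return (f'<div style="position:relative;width:{width:g}pt;height:{height:g}pt">'
            + img + "".join(spans) + "</div>")


# Hypothetical values: "page1.svg" holds the text-as-shapes rendering.
span = invisible_text_span("Hello world", 72, 60, 12, 120, 100)
page = overlay_page("page1.svg", 612, 792, [span])
```

Transparent text is still selectable and copyable, so the selection highlight appears over the shapes while the visible result is entirely the shape layer.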
Here’s an example where the real text is drawn along with converted fonts, but drawn invisible:
So, which is best?
In our opinion, option 1 is best, though it is certainly the most difficult, which is why it is so rare to see. This is the mode that we like to show off when demoing BuildVu. To see it for yourself, you can try our free PDF to HTML5 converter online, or find out more and download the trial edition.