There are several ways to deal with text and fonts in PDF files when converting to HTML5. Here are the top 3 ways and how they stack up against each other:
- Convert PDF fonts to web fonts and draw real, selectable text
- Convert PDF fonts to shapes and draw text as shapes (with no text selection)
- Convert PDF fonts to shapes and draw text as shapes, and also draw invisible, real text on top to allow text selection.
1. Convert PDF fonts to web fonts and draw real, selectable text:
If you require text to be selectable, there are 2 ways to achieve this. The first is to convert PDF fonts into web browser compatible fonts and draw HTML text with the font applied. However, this is not a trivial process: the PDF file format was not designed to make its embedded fonts compatible with web browsers, and there are many caveats that make accurately converting fonts a nightmare. This is why it is very rare to see a PDF to HTML conversion tool that can retain fonts.
Additionally, the PDF file format allows very fine control over text sizing, positioning and kerning in a very concise way. HTML was not designed for such control, which can make converting to real text quite hazardous: the more accuracy that is retained, the larger the converted HTML file (sometimes unrealistically so).
The solution is to compromise on the accuracy retained, averaging spacing over an entire line where possible rather than using kerning between individual characters. An example of this type of conversion can be seen below.
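Separately from the converter's own output, the line-averaging idea can be sketched in a few lines of Python. This is a minimal illustration, not our converter's implementation: the function names, the glyph positions and the advance widths are all hypothetical, and it assumes the font has already been converted so that the browser's natural advance widths are known.

```python
from html import escape

def per_glyph_spans(glyphs):
    """Exact mode: one absolutely positioned <span> per glyph."""
    return "".join(
        f'<span style="position:absolute;left:{x:.2f}px">{escape(ch)}</span>'
        for ch, x in glyphs
    )

def averaged_line_span(glyphs, natural_advances):
    """Compromise mode: one <span> per line with a single letter-spacing.

    Assumes a non-empty line and one advance width per glyph.
    """
    text = "".join(ch for ch, _ in glyphs)
    start_x = glyphs[0][1]
    gaps = len(glyphs) - 1
    # Distance the PDF places between the first and last glyph origins.
    pdf_span = glyphs[-1][1] - start_x
    # Distance the browser would use with the font's own advance widths.
    natural_span = sum(natural_advances[:-1])
    # Spread the difference evenly instead of keeping per-pair kerning.
    spacing = (pdf_span - natural_span) / gaps if gaps else 0.0
    return (
        f'<span style="position:absolute;left:{start_x:.2f}px;'
        f'letter-spacing:{spacing:.3f}px">{escape(text)}</span>'
    )

# Hypothetical line "AVA": (character, x position in px) read from the PDF.
glyphs = [("A", 10.0), ("V", 16.0), ("A", 22.0)]
# Hypothetical advance widths in px from the converted web font.
natural_advances = [6.5, 6.5, 6.5]

exact = per_glyph_spans(glyphs)
averaged = averaged_line_span(glyphs, natural_advances)
print(len(exact), len(averaged))
print(averaged)
```

The exact mode emits one element per character, so its size grows with every glyph. The averaged mode emits one element per line and still lands the first and last glyphs where the PDF put them, at the cost of small errors in between.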
2. Convert PDF fonts to shapes and draw text as shapes:
If your only requirement is a perfect visual match, the best option is to convert fonts in PDF files into shapes, and output either as an image or as SVG. The benefit here is that you get a perfect visual match. However, the file produced does not actually contain any text, which is bad for SEO and also means that it’s not possible to select text and copy/paste it out.
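To make the shapes approach concrete, here is a rough Python sketch that turns a glyph outline into an SVG path. It is an illustration under stated assumptions rather than the converter's code: the outline format (a list of path commands in font units with the y axis pointing up), the function names and the triangle "glyph" are all hypothetical.

```python
def outline_to_path_data(outline, origin_x, baseline_y, scale):
    """Convert a glyph outline in font units into SVG path data.

    Font outlines use a y-up coordinate system, while SVG is y-down,
    so the y coordinate is flipped around the baseline.
    """
    parts = []
    for command, *coords in outline:
        points = []
        for i in range(0, len(coords), 2):
            px = origin_x + coords[i] * scale
            py = baseline_y - coords[i + 1] * scale  # flip the y axis
            points.append(f"{px:g} {py:g}")
        parts.append(command + " ".join(points))
    return " ".join(parts)

def shapes_svg(path_data_list, width, height):
    """Wrap path data in an SVG document that contains no text elements."""
    body = "".join(f'<path d="{d}"/>' for d in path_data_list)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{body}</svg>'
    )

# Hypothetical triangular glyph on a 1000 units-per-em grid.
outline = [("M", 0, 0), ("L", 500, 1000), ("L", 1000, 0), ("Z",)]
scale = 20 / 1000  # 20px font size divided by units per em

d = outline_to_path_data(outline, origin_x=10, baseline_y=50, scale=scale)
svg = shapes_svg([d], width=100, height=100)
print(d)
```

The output contains only path elements, which is exactly why the result looks right but cannot be selected or indexed as text.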
Here is an example of a PDF with text converted to shapes in this way:
3. Convert PDF fonts to shapes and draw text as shapes, but also draw invisible real text on top to allow text selection:
If you require a perfect match and text selection, this can be achieved by writing out text as shapes and putting an invisible layer of real text on top that can be used for selection. This means that visually the file will look perfect, and any slight inaccuracies in fonts or real text positioning will not be seen.
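One way to picture the invisible layer is the Python sketch below, which stacks transparent, absolutely positioned text over the shapes. It is only an assumption about how such a page could be assembled: the function name, the layout values and the empty placeholder SVG are hypothetical. It relies on the fact that text styled with `color:transparent` remains selectable in browsers, whereas `display:none` would remove it from selection.

```python
from html import escape

def selectable_page(shapes_markup, text_lines, width, height):
    """Place transparent real text over pre-rendered shapes.

    text_lines holds (text, left_px, top_px, font_size_px) tuples.
    """
    spans = "".join(
        f'<span style="position:absolute;left:{x}px;top:{y}px;'
        f'font-size:{size}px;white-space:pre;color:transparent">'
        f'{escape(text)}</span>'
        for text, x, y, size in text_lines
    )
    # The shapes come first; the positioned spans are painted above them.
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">'
        f'{shapes_markup}{spans}</div>'
    )

# Placeholder for the shapes layer (an empty SVG in this sketch).
shapes = '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100"></svg>'
page = selectable_page(shapes, [("Hello & goodbye", 10, 20, 12)], 200, 100)
print(page)
```

Because the real text is never painted, it only needs to be positioned closely enough for the selection highlight to line up with the shapes underneath.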
Here’s an example where real text is drawn along with converted fonts, but drawn invisible:
So, which is best?
In our opinion, option 1 is best, though it is certainly the most difficult, which is why it is so rare to see. This is the mode that we like to show off when demoing our PDF to HTML5 Converter. If you want to find out more, you can try our PDF to HTML5 converter online for free, or find more information and download the trial edition.