PDF to HTML5 conversion – duplicate font names

When you convert PDF to HTML5, duplicate font names can be a problem. A PDF file can embed lots of fonts and subset each one to include just the glyphs you are using (keeping the size of the font data down). So a page could contain several different fonts, all called Arial. This is not an issue in a PDF file because the font name is just a piece of information, not the key used to identify the font.

In HTML5, however, the font name is the unique key: it identifies the font in the CSS font-family property and in the @font-face rule used to embed the font. So we need to ensure that every font name is unique in the HTML5. How would you handle this?

This is how we deal with it. The first time Arial is used, we call it Arial. If a different version of Arial appears, we append the FontID (which is how the PDF identifies the font) and the size of the font data to give a unique name (Arial_C2_0_5400). Luckily, because the PDF does not use the font name as a key, we can alter it for our own use without breaking anything else, and that lets us handle all these fonts. Does this seem sensible?
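
To make the naming scheme concrete, here is a minimal sketch in Python. It is not our converter's actual code: the EmbeddedFont class, its field names and the two helper functions are all hypothetical, and it assumes that the FontID in the example above is "C2_0", that the data size is 5400 bytes, and that each FontID is unique within the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedFont:
    """One font embedded in the PDF (hypothetical structure for illustration)."""
    font_id: str    # how the PDF identifies the font, e.g. "C2_0" (assumed format)
    name: str       # the informational font name, e.g. "Arial"
    data_size: int  # size of the embedded font data in bytes


def assign_unique_names(fonts):
    """Return a dict mapping each font_id to a name that is unique in the HTML5.

    The first font seen with a given name keeps that name. Any later font
    with the same name gets "<name>_<font_id>_<data_size>" instead.
    """
    used = set()
    names = {}
    for font in fonts:
        if font.font_id in names:
            continue  # this PDF font already has a name, so reuse it
        candidate = font.name
        if candidate in used:
            candidate = f"{font.name}_{font.font_id}_{font.data_size}"
        used.add(candidate)
        names[font.font_id] = candidate
    return names


def font_face_rule(unique_name, url):
    """Build the @font-face rule that embeds one font under its unique name."""
    return f'@font-face {{ font-family: "{unique_name}"; src: url("{url}"); }}'


if __name__ == "__main__":
    fonts = [
        EmbeddedFont("C2_1", "Arial", 7200),
        EmbeddedFont("C2_0", "Arial", 5400),
    ]
    for font_id, unique_name in assign_unique_names(fonts).items():
        # The file path is made up for the example.
        print(font_face_rule(unique_name, f"fonts/{font_id}.woff"))
```

The first Arial keeps its plain name and the second one becomes Arial_C2_0_5400, so the two @font-face rules no longer clash. The same unique name would then be used wherever the text refers to that font in font-family.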

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.
