Mapping Glyphs from PDF in HTML5 and SVG

Each font in a PDF file has an Encoding value which defines the exact glyph used for each character code. There are some standard encodings (MacRomanEncoding, WinAnsiEncoding, StandardEncoding) or you can build your own Encoding table. There is a standard set of glyph names (A, B, fl, fi, quote) but you can call your glyphs anything you like. The name is essentially just a unique ID used to map the values internally. If you do not use standard names, you might get garbage when the text is extracted, but the page will still display perfectly, which is all most users look at.
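To make the lookup chain concrete, here is a minimal Python sketch of how a custom Encoding table sits on top of a base encoding. The two tables are made up and heavily truncated for illustration, and the function names are hypothetical, not taken from any PDF library. The Differences list follows the PDF convention of a starting code followed by one or more glyph names, each taking the next consecutive code.

```python
# Hypothetical, heavily truncated tables for illustration only.
BASE_ENCODING = {39: "quotesingle", 65: "A", 66: "B"}  # code -> glyph name
GLYPH_TO_UNICODE = {                                    # glyph name -> text
    "A": "A",
    "B": "B",
    "fi": "\ufb01",
    "fl": "\ufb02",
    "quotesingle": "'",
}


def apply_differences(base, differences):
    """Overlay a PDF-style Differences list on a copy of a base encoding.

    The list alternates between a starting code (int) and one or more
    glyph names (str); each name is assigned the next consecutive code.
    """
    table = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("glyph name before any starting code")
            table[code] = item
            code += 1
    return table


def extract_text(codes, table):
    """Turn character codes into text; unknown glyph names become U+FFFD."""
    return "".join(
        GLYPH_TO_UNICODE.get(table.get(code, ""), "\ufffd") for code in codes
    )


# Codes 33, 34 and 35 are redefined; "a.sc" is a bespoke, non-standard name.
encoding = apply_differences(BASE_ENCODING, [33, "fi", "fl", "a.sc"])
```

Calling `extract_text([65, 33, 35], encoding)` resolves the first two codes to real text but yields a replacement character for `a.sc`, which is the "garbage on extraction" case: the glyph still draws correctly, but the name carries no standard text meaning.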

If you convert a PDF file to HTML5 or SVG, things become more complex. If you are mapping the glyphs onto actual text values, you need to take more care. Firstly, some browsers will reject certain ranges of characters, so these need to be remapped onto sensible values. It also starts to matter if you have used arbitrary values.
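One possible way to do the remapping is sketched below in Python. It rests on assumptions: the unsafe ranges are not listed here, so the sketch treats anything outside the XML 1.0 character set (most control characters, surrogates, U+FFFE and U+FFFF) as unsafe, and it moves those values into the Unicode Private Use Area starting at U+E000. The function names are hypothetical, and the embedded font would need its character map rewritten to match.

```python
PUA_START = 0xE000  # assumption: park remapped glyphs in the Private Use Area


def is_xml_safe(cp):
    """True if the code point is a legal character in XML 1.0 (and so in SVG)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def build_remap(codes):
    """Return {old code: new code} for every code that is not XML-safe.

    New values are taken from the Private Use Area, skipping any value
    the font already uses so that two glyphs never share a code point.
    """
    used = set(codes)
    remap = {}
    next_free = PUA_START
    for cp in sorted(used):
        if is_xml_safe(cp):
            continue
        while next_free in used:
            next_free += 1
        remap[cp] = next_free
        used.add(next_free)
        next_free += 1
    return remap
```

For example, `build_remap([1, 2, 65])` leaves 65 alone and moves codes 1 and 2 to U+E000 and U+E001. The same table then has to be applied twice: once to the text written into the HTML5 or SVG, and once to the font.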

Here is the data from a PDF I have been looking at. It includes some custom small caps characters, so it has created some bespoke glyph names (a.sc for SMALL CAPS A, and so on).

So to fix this we would either write out text values 33-50 to map onto the embedded font, or move them. Because it is only a limited set of values, we could actually map each one onto a or A and restructure the fonts. It would probably need a larger sample size to decide the best approach. Or we could convert the text to shapes.
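The "map it onto a or A and restructure the fonts" option could look something like the Python sketch below. It assumes the bespoke names follow the base.suffix pattern seen above (a.sc), strips everything from the first full stop to recover the base letter, and groups the glyphs by suffix so that each group can become its own font. The function name and the sample names are hypothetical, and only single-letter base names are handled.

```python
def plan_fonts(glyph_names):
    """Group glyphs by name suffix so each group can become a separate font.

    Returns {suffix: {character: glyph name}}. The empty suffix holds the
    regular glyphs. Names whose base part is not a single letter are
    skipped, because this sketch has no glyph list to look them up in.
    """
    fonts = {}
    for name in glyph_names:
        base, _, suffix = name.partition(".")
        if len(base) != 1 or not base.isalpha():
            continue  # e.g. ".notdef" or multi-letter names
        fonts.setdefault(suffix, {})[base] = name
    return fonts


plan = plan_fonts(["a", "a.sc", "b.sc", ".notdef"])
# plan[""] holds the regular font, plan["sc"] the small caps font.
```

With this split, the HTML5 or SVG output carries the plain text value a, and a switch of font selects the small caps shape, so extracted or copied text stays readable. Whether the small caps should extract as a or A is the open question above.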

But it is a good example of how PDF to HTML5 and SVG conversion is not always a straightforward process…

This post is part of our “Fonts Articles Index”. In these articles we explore fonts.


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.