Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX.

He also enjoys speaking at conferences and has been a speaker at user groups and at the Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Mapping Glyphs from PDF to HTML5 and SVG


When you have a PDF file, each font has an Encoding value which maps every character code onto the exact glyph used. There are some standard settings (MacRomanEncoding, WinAnsiEncoding and StandardEncoding, or MAC, WIN and STD for short), or you can build your own Encoding table by listing differences from a base encoding. There is a standard set of glyph names (A, B, fl, fi, quotedbl and so on), but you can call your glyphs anything you like: the name is essentially just a unique ID used to map the values internally. If you do not use standard names, you might get garbage when the text is extracted, but the page will still be perfect for viewing, which is all most users look at.
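To make the idea of a custom Encoding table concrete, here is a minimal sketch of how one might be resolved. It assumes the PDF convention of a Differences array, where a number sets the current character code and each glyph name that follows is assigned to that code before the code moves on by one. The class name, the sample codes and the deliberately tiny `STANDARD_NAMES` set are hypothetical; a real converter would check against the full Adobe Glyph List.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class EncodingSketch {

    // Hypothetical, deliberately tiny subset of standard glyph names;
    // a real converter would use the full Adobe Glyph List.
    static final Set<String> STANDARD_NAMES =
            Set.of("space", "quotedbl", "A", "B", "a", "b", "fi", "fl");

    // Applies a Differences-style array on top of a base encoding:
    // an Integer sets the current character code, and each glyph name
    // that follows is assigned to that code before the code is incremented.
    static Map<Integer, String> applyDifferences(Map<Integer, String> base,
                                                 List<Object> differences) {
        Map<Integer, String> result = new TreeMap<>(base);
        int code = 0;
        for (Object entry : differences) {
            if (entry instanceof Integer) {
                code = (Integer) entry;
            } else {
                result.put(code, (String) entry);
                code++;
            }
        }
        return result;
    }

    // Custom names are only internal IDs, so text extraction cannot rely on them.
    static boolean isStandardName(String glyphName) {
        return STANDARD_NAMES.contains(glyphName);
    }

    public static void main(String[] args) {
        // Hypothetical base encoding fragment (codes 65, 66 and 97).
        Map<Integer, String> base = Map.of(65, "A", 66, "B", 97, "a");
        Map<Integer, String> encoding =
                applyDifferences(base, List.of(33, "a.sc", "b.sc", 97, "fi"));

        // Print each code with its glyph name and whether the name is standard.
        for (Map.Entry<Integer, String> e : encoding.entrySet()) {
            String kind = isStandardName(e.getValue()) ? "standard" : "custom";
            System.out.println(e.getKey() + " -> " + e.getValue() + " (" + kind + ")");
        }
    }
}
```

Running it prints codes 33 and 34 as a.sc and b.sc (custom), 65 and 66 as A and B (standard), and 97 as fi (standard), because the Differences entry overrides the base value at 97. The viewer only needs the code to glyph mapping, which is why custom names render fine but extract badly.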

If you convert a PDF file to HTML5 or SVG, things become more complex. If you are mapping the glyphs onto actual text values, you need to take more care. Firstly, some browsers will reject certain ranges of characters (control characters, for example), so these need to be remapped onto sensible values. It also starts to matter if you have used arbitrary glyph names, because they give no clue as to which real character should be written out.
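The remapping step could look something like the sketch below. It assumes, for illustration only, that the "unsafe" codes are the control characters and that they are moved into the Unicode Private Use Area (U+E000 to U+F8FF) by adding a fixed offset; the class name, the offset scheme and the choice of ranges are hypothetical, not a description of any particular converter.

```java
public class RemapSketch {

    // Start of the Unicode Private Use Area (U+E000 to U+F8FF).
    static final int PUA_START = 0xE000;

    // Assumed "unsafe" codes: C0 controls (0x00-0x1F), DEL (0x7F) and
    // C1 controls (0x80-0x9F). XML 1.0, and so SVG, forbids most C0 controls,
    // and HTML treats control characters as parse errors or whitespace.
    static boolean isUnsafe(int code) {
        return code < 0x20 || (code >= 0x7F && code <= 0x9F);
    }

    // Returns the code point to write into the HTML5/SVG text. A remapped value
    // only displays correctly if the embedded web font's cmap is rewritten to
    // point that code point at the original glyph (not shown here).
    static int toOutputCodePoint(int code) {
        return isUnsafe(code) ? PUA_START + code : code;
    }

    public static void main(String[] args) {
        int[] codes = {0x01, 0x0A, 0x41, 0x85};
        for (int code : codes) {
            System.out.printf("0x%02X -> U+%04X%n", code, toOutputCodePoint(code));
        }
    }
}
```

This prints 0x01 -> U+E001, 0x0A -> U+E00A, 0x41 -> U+0041 and 0x85 -> U+E085. Private Use Area code points have no standard meaning, so they will not clash with real text, but the trade-off is that anyone copying those characters out of the page gets meaningless values.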

Here is the data from a PDF I have been looking at. It includes some custom small caps characters, so it uses some bespoke glyph names (a.sc for SMALL CAPS A, and so on).

So to fix this, we would either write out text values 33-50 so that they map onto the embedded font, or move them. Because only a limited set of values is involved, we could actually map them onto a or A and restructure the fonts. It would probably need a larger sample size to decide the best approach. Alternatively, we could convert the text to shapes.
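One way the "map them onto a or A and restructure the fonts" option might work is sketched below. It assumes the glyph names follow the Adobe naming convention, where everything from the first period onwards (such as .sc) is a variant suffix on a base glyph name. The font IDs regular and smallcaps, the sample codes and the helper names are hypothetical, and rebuilding the actual font files is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

public class SmallCapsSketch {

    // Following the Adobe glyph naming convention, everything from the first
    // period onwards is a variant suffix, so "a.sc" has the base name "a".
    // Names starting with a period (such as ".notdef") are left alone.
    static String baseName(String glyphName) {
        int dot = glyphName.indexOf('.');
        return dot > 0 ? glyphName.substring(0, dot) : glyphName;
    }

    // Hypothetical font IDs: small caps glyphs are moved into their own web font.
    static String fontFor(String glyphName) {
        return glyphName.endsWith(".sc") ? "smallcaps" : "regular";
    }

    // Single-letter base names are written out as that letter (a or A), so the
    // restructured font must map the letter onto the right glyph. Anything else
    // keeps its original code so it can still be drawn from the embedded font.
    static String textFor(int code, String glyphName) {
        String base = baseName(glyphName);
        if (base.length() == 1 && Character.isLetter(base.charAt(0))) {
            return base;
        }
        return new String(Character.toChars(code));
    }

    public static void main(String[] args) {
        // Hypothetical slice of a custom encoding in the 33-50 range.
        Map<Integer, String> encoding =
                new TreeMap<>(Map.of(33, "a.sc", 34, "b.sc", 35, "A"));
        for (Map.Entry<Integer, String> e : encoding.entrySet()) {
            String glyph = e.getValue();
            System.out.println(e.getKey() + " " + glyph + " -> text \""
                    + textFor(e.getKey(), glyph) + "\" in font " + fontFor(glyph));
        }
    }
}
```

Here codes 33 and 34 come out as the text "a" and "b" drawn with the small caps font, and code 35 as "A" in the regular font. The extracted text is now meaningful, at the cost of splitting one PDF font into two web fonts.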

But it is a good example of how PDF to HTML5 and SVG conversion is not always a straightforward process…

This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.
