PDF to HTML5 conversion – non-standard glyphs

I have been looking at a PDF to HTML5 conversion issue where there was some odd text appearing on the HTML5 page but not in the PDF file. It turned out to be rather interesting…

Every glyph inside a PDF file has a name (A, B, space, ellipsis, etc). There is a whole set of standard names defined, but you can also use any arbitrary value. The names are listed in the charset and inside the fonts. So long as the names match wherever they are used, you can call the glyphs what you want.

However, if you create your own glyph names, the software may not be able to resolve the actual character you want to associate with them for display or text extraction. So what should we do when generating HTML5 from these files? The only value we have is the glyph name, so this is the odd text we were seeing on the screen (in this case angbracketleft and angbracketright).
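To make the failure concrete, here is a minimal sketch of a name lookup that falls back to the raw glyph name. The class `GlyphNameLookup`, its `resolve` method and the four-entry table are hypothetical stand-ins for illustration, not the converter's real code, and the real standard list is far longer.

```java
import java.util.Map;

/** Hypothetical illustration only; not the converter's actual lookup code. */
final class GlyphNameLookup {

    // A tiny stand-in for the table of standard glyph names.
    private static final Map<String, String> STANDARD_NAMES = Map.of(
            "A", "A",
            "B", "B",
            "space", " ",
            "ellipsis", "\u2026");

    /** Returns the text for a glyph name, or the name itself when it is unknown. */
    static String resolve(String glyphName) {
        // Unknown names fall through unchanged, which is how odd text reaches the page.
        return STANDARD_NAMES.getOrDefault(glyphName, glyphName);
    }
}
```

With this sketch, `resolve("ellipsis")` gives the ellipsis character, while `resolve("angbracketleft")` hands back the name itself, which is the kind of text that ended up in the HTML5 output.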

So we have added some mapping code to the static helper class HTMLHelper so you can replace these with an appropriate value:

/**
 * Replace any non-standard glyph names with displayable text.
 */
public String mapNonstandardGlyfName(String glyf, PdfFont currentFontData) {

    // replace() matches literal text; replaceAll() would treat the name as a regex
    glyf = glyf.replace("angbracketright", ")");
    glyf = glyf.replace("angbracketleft", "(");

    return glyf;
}
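If you end up with more than a couple of custom names, a lookup table may be easier to maintain than a chain of replace calls. The sketch below is one possible way to do it, assuming the method receives exactly one glyph name at a time. The class `GlyphOverrides` and its `map` method are hypothetical names, not part of HTMLHelper.

```java
import java.util.Map;

/** Hypothetical table-driven variant of the mapping method above. */
final class GlyphOverrides {

    // Custom names from the file discussed here, with the replacements chosen above.
    private static final Map<String, String> OVERRIDES = Map.of(
            "angbracketleft", "(",
            "angbracketright", ")");

    /** Returns the replacement for a known custom name, otherwise the name unchanged. */
    static String map(String glyphName) {
        return OVERRIDES.getOrDefault(glyphName, glyphName);
    }
}
```

Because the table matches the whole name, a longer name such as angbracketleftbig is left alone, whereas a substring replacement would turn it into "(big".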

That looks rather better!

[Image: fixed text]

 

 


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.