I have been looking at a PDF to HTML5 conversion issue where there was some odd text appearing on the HTML5 page but not in the PDF file. It turned out to be rather interesting…
Every glyph inside a PDF file has a name (A, B, Space, ellipsis, etc.). There is a whole set of standard names defined, but you can also use any arbitrary value. The names are listed in the charset and inside the fonts. So long as the names match wherever they are used, you can call the glyphs what you want.
However, if you create your own glyph names, the software may not be able to resolve the actual character you want to associate with each one, either to display it or to extract it as text. So what should we do when generating HTML5 from these files? The only value we have is the glyph name, so this is the odd text we were seeing on the screen (in this case angbracketleft and angbracketright).
So we have added some mapping code to the static helper class HTMLHelper, so you can replace these names with an appropriate value:
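To make the failure mode concrete, here is a minimal sketch of a name lookup with a fallback. The class GlyphNameLookup, its resolve method and the four-entry table are hypothetical and only for illustration; they are not the converter's actual code, and a real table of standard names is far larger.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphNameLookup {

    // Tiny illustrative subset of standard glyph names (assumption: a real table is much larger).
    private static final Map<String, String> STANDARD = new HashMap<>();
    static {
        STANDARD.put("A", "A");
        STANDARD.put("B", "B");
        STANDARD.put("space", " ");
        STANDARD.put("ellipsis", "\u2026");
    }

    /** Returns the character for a known name, otherwise the raw glyph name itself. */
    public static String resolve(String glyphName) {
        String value = STANDARD.get(glyphName);
        return value != null ? value : glyphName;
    }

    public static void main(String[] args) {
        System.out.println(resolve("ellipsis"));       // a known name resolves to its character
        System.out.println(resolve("angbracketleft")); // an unknown name falls through unchanged
    }
}
```

With a lookup like this, a custom name such as angbracketleft has no entry, so the name itself ends up in the output text.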
/**
 * replace any non-standard glyphs
 */
public String mapNonstandardGlyfName(String glyf, PdfFont currentFontData) {
    glyf = glyf.replaceAll("angbracketright", ")");
    glyf = glyf.replaceAll("angbracketleft", "(");
    return glyf;
}
That looks rather better!
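If you need to map more than a couple of custom names, a lookup table may be easier to maintain than a chain of replaceAll calls. The following is only a sketch under that assumption: the class GlyphNameMapper and its method are hypothetical, not part of HTMLHelper, and it uses String.replace so the names are treated as literal text and not as regular expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GlyphNameMapper {

    // Custom glyph names and the text to show in their place (same values as the mapping above).
    private static final Map<String, String> CUSTOM = new LinkedHashMap<>();
    static {
        CUSTOM.put("angbracketright", ")");
        CUSTOM.put("angbracketleft", "(");
    }

    /** Replaces every known custom glyph name found in the text with its mapped value. */
    public static String mapNonstandardGlyphNames(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : CUSTOM.entrySet()) {
            // String.replace does a literal (non-regex) replacement of all occurrences.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mapNonstandardGlyphNames("angbracketleft"));
    }
}
```

Adding another custom name then means adding one line to the table.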
This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.