Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

PDF to HTML5 conversion – non-standard glyphs


I have been looking at a PDF to HTML5 conversion issue where there was some odd text appearing on the HTML5 page but not in the PDF file. It turned out to be rather interesting…

Every glyph inside a PDF file has a name (A, B, Space, ellipsis, etc.). There is a whole set of standard names defined, but you can also use any arbitrary value. The names are listed in the charset and inside the fonts. So long as the names match wherever they are used, you can call them what you want.

However, if you create your own glyph names, the software may not be able to resolve the actual character you want to associate with them for display or text extraction. So what should we do when generating HTML5 from these files? The only value we have is the glyph name, so this is the odd text we were seeing on the screen (in this case angbracketleft and angbracketright).
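To see why the raw name ends up on the page, here is a minimal sketch of a name lookup that falls back to the name itself. The class GlyphNameResolver, its resolve method and the four-entry table are hypothetical and only for illustration; they are not the converter's actual code, and a real table of standard names would be far larger.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of any real library.
final class GlyphNameResolver {

    // A tiny sample of standard glyph names mapped to the text they represent.
    private static final Map<String, String> STANDARD_NAMES = new HashMap<>();

    static {
        STANDARD_NAMES.put("A", "A");
        STANDARD_NAMES.put("B", "B");
        STANDARD_NAMES.put("space", " ");
        STANDARD_NAMES.put("ellipsis", "\u2026");
    }

    // Returns the mapped text, or the glyph name itself when the name is unknown.
    static String resolve(String glyphName) {
        String text = STANDARD_NAMES.get(glyphName);
        return text != null ? text : glyphName;
    }

    public static void main(String[] args) {
        System.out.println(resolve("A"));              // known name: prints A
        System.out.println(resolve("angbracketleft")); // unknown name: prints angbracketleft
    }
}
```

With a fallback like this, any name missing from the table is written out as-is, which matches the odd text described above.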

So we have added some mapping code to the static helper class HTMLHelper so you can replace these names with an appropriate value:

/**
 * Replace any non-standard glyph names with the text to display.
 */
public String mapNonstandardGlyfName(String glyf, PdfFont currentFontData) {

    // replace() does a plain-text substitution; no regular expression is needed here
    glyf = glyf.replace("angbracketright", ")");
    glyf = glyf.replace("angbracketleft", "(");

    return glyf;
}
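If more non-standard names turn up, a chain of replace calls can get unwieldy. One possible alternative, sketched here as an assumption and not as how HTMLHelper is actually written, is to drive the same substitution from a lookup table. The class name GlyphNameMapper and the simplified one-argument signature are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical table-driven variant of the mapping shown above.
final class GlyphNameMapper {

    // Non-standard glyph name -> replacement text, applied in insertion order.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();

    static {
        REPLACEMENTS.put("angbracketleft", "(");
        REPLACEMENTS.put("angbracketright", ")");
    }

    // Replaces every known non-standard name found in the input; other text is left untouched.
    static String mapNonstandardGlyphName(String glyph) {
        String result = glyph;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mapNonstandardGlyphName("angbracketleft"));  // prints (
        System.out.println(mapNonstandardGlyphName("angbracketright")); // prints )
        System.out.println(mapNonstandardGlyphName("A"));               // prints A
    }
}
```

Adding another name then only needs one more entry in the table.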

With the glyph names mapped, the converted page looks rather better!

[Image: fixed text]

This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.




IDRsolutions Ltd 2021. All rights reserved.