One of the challenges when using a font converted from a PDF file for HTML5 display is ensuring that the correct glyph is used. Every font contains a character map (CMAP) that tells the font which glyph to display for a given character code. Here is an example from a font inside a PDF file. In this screenshot you can see that the glyph named lparenori is glyph number 4 and is assigned to character 52.
So when we generate the HTML, we need to write this character out with the value 52, and it will then appear correctly in the font. So where does this value come from?
We cannot just use the value in the PDF file. In the actual PDF, the character has the value 03, on the line [(<0F,03>)85(\n)]TJ
We know that 03 is the glyph called lparenori from the Differences look-up table in the PDF font object.
So all we have to do is:
1. read the value from the PDF,
2. use the font encoding to find the glyph name,
3. then use the data inside the actual embedded font to work out that lparenori is character 52.
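
The three steps above can be sketched in a few lines of Python. This is only an illustration, not the converter's actual code: the function name html_char_code and the three dictionaries are hypothetical stand-ins for the PDF Differences table, the embedded font's glyph names and its CMAP, filled in with the values from this example.

```python
def html_char_code(pdf_code, differences, glyph_ids_by_name, cmap):
    """Map a character value from the PDF to the character code the font expects.

    All three tables are plain dicts standing in for data that would
    really be parsed from the PDF font object and the embedded font.
    """
    # Step 2: the font encoding (Differences) gives the glyph name.
    glyph_name = differences[pdf_code]

    # Step 3a: the embedded font tells us which glyph number has that name.
    glyph_id = glyph_ids_by_name[glyph_name]

    # Step 3b: search the CMAP (character code -> glyph number) for that glyph.
    for char_code, mapped_glyph_id in cmap.items():
        if mapped_glyph_id == glyph_id:
            return char_code
    raise KeyError(f"no CMAP entry points at glyph {glyph_name!r}")


# Values taken from the example in this post (assumed table contents).
differences = {0x03: "lparenori"}       # PDF value -> glyph name
glyph_ids_by_name = {"lparenori": 4}    # glyph name -> glyph number
cmap = {52: 4}                          # character code -> glyph number

# Step 1: the value read from the PDF is 03.
code = html_char_code(0x03, differences, glyph_ids_by_name, cmap)
entity = f"&#{code};"
print(code, entity)  # 52 &#52;
```

A real CMAP can map several character codes to the same glyph, in which case this sketch simply returns the first match it finds.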
Note also that you can have glyph names which are not standard Adobe names, but it makes life simpler if you can stick to the standard list defined by Adobe.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team post about anything interesting they learn.