Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Mapping Glyphs from PDF in HTML5 and SVG


A PDF file has an Encoding value which defines exactly which glyph is used for each character code. There are some standard settings (MacRomanEncoding, WinAnsiEncoding, StandardEncoding), or you can build your own Encoding table. There is a standard set of glyph names (A, B, fl, fi, quote), but you can call your glyphs anything you like: the name is essentially just a unique ID used to map the values internally. If you do not use standard names, you might get garbage when the text is extracted, but the file will still display perfectly, which is what most users look at.
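As a rough illustration of how a custom Encoding table works, here is a minimal Python sketch. It assumes the custom entries arrive as a PDF-style Differences array (a starting code followed by one or more glyph names). The tiny lookup tables, the function names and the bespoke names g001 and g002 are hypothetical and only there to make the example runnable.

```python
# Tiny illustrative subset of a base encoding (code -> glyph name).
# A real base encoding such as WinAnsiEncoding defines far more entries.
BASE_ENCODING = {65: "A", 66: "B", 39: "quotesingle"}

# Tiny illustrative subset of a glyph-name -> Unicode lookup.
GLYPH_TO_UNICODE = {
    "A": "A",
    "B": "B",
    "quotesingle": "'",
    "fi": "\ufb01",
    "fl": "\ufb02",
}


def apply_differences(base, differences):
    """Return a new code -> glyph-name table with a Differences array applied.

    The array mixes starting codes (int) and glyph names (str); each name
    is assigned to the next consecutive code after the last starting code.
    """
    table = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("glyph name before any starting code")
            table[code] = item
            code += 1
    return table


def extract_text(codes, table):
    """Map character codes to text; unknown glyph names become U+FFFD."""
    return "".join(
        GLYPH_TO_UNICODE.get(table.get(code, ""), "\ufffd") for code in codes
    )


# Hypothetical custom encoding: codes 33 and 34 use made-up glyph names.
custom = apply_differences(BASE_ENCODING, [33, "g001", "g002", 174, "fi", "fl"])
```

With this table, codes that use standard names extract cleanly, while codes 33 and 34 come out as replacement characters even though a viewer would draw them correctly from the embedded font.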

If you convert a PDF file to HTML5 or SVG, things become more complex. If you are mapping the glyphs onto actual text values, you need to take more care. Firstly, some browsers will reject certain ranges of characters, so those need to be remapped onto sensible values. Secondly, it starts to matter if you have used arbitrary values.
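One way to picture the remapping step is the sketch below. Which ranges a given browser rejects is not specified here, so the is_risky check (control characters and DEL) is purely an assumption for illustration, as is the choice of the Unicode Private Use Area (U+E000 to U+F8FF) as the target. All function names are hypothetical.

```python
PUA_START = 0xE000  # first code point of the Unicode Private Use Area
PUA_END = 0xF8FF    # last code point of that area


def is_risky(code_point):
    """Assumed problem ranges: C0/C1 control characters and DEL."""
    return code_point < 0x20 or 0x7F <= code_point <= 0x9F


def build_remap(code_points):
    """Assign each risky code point a unique Private Use Area code point."""
    remap = {}
    next_free = PUA_START
    for cp in sorted(set(code_points)):
        if is_risky(cp):
            if next_free > PUA_END:
                raise ValueError("ran out of Private Use Area code points")
            remap[cp] = next_free
            next_free += 1
    return remap


def remap_text(text, remap):
    """Rewrite text so risky characters use their remapped code points."""
    return "".join(chr(remap.get(ord(ch), ord(ch))) for ch in text)


remap = build_remap([0x01, 0x41, 0x80])
safe_text = remap_text("\x01A\x80", remap)
```

The same remap table would also have to be applied to the character map of the converted font, otherwise the text and the font would no longer agree on which glyph each value refers to.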

Here is the data from a PDF I have been looking at. It includes some custom small caps characters, so it has created some bespoke glyph names (a.sc for SMALL CAPS A, and so on).

So to fix this we would either write out text values 33-50 so that they map onto the embedded font, or move them. Because it is only a limited set of values, we could map them onto a or A and restructure the fonts. It would probably need a larger sample size to decide the best approach. Alternatively, we could convert the text to shapes.
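The option of mapping the small caps onto a or A could be sketched as follows. This assumes the bespoke names follow the base-name-plus-suffix pattern seen in a.sc, so that dropping everything from the first period gives a usable letter. The sample table is hypothetical (it is not the data from the PDF), and the function names are made up for the example.

```python
import string


def base_glyph_name(name):
    """Drop everything from the first period: 'a.sc' -> 'a'."""
    return name.split(".", 1)[0]


def map_custom_glyphs(code_to_name, known_names):
    """Split codes into those with a usable base letter and those without."""
    mapped, unmapped = {}, []
    for code, name in sorted(code_to_name.items()):
        base = base_glyph_name(name)
        if base in known_names:
            mapped[code] = base
        else:
            unmapped.append(code)
    return mapped, unmapped


# Hypothetical sample: two small caps names and one arbitrary name.
sample = {33: "a.sc", 34: "b.sc", 35: "g123"}
mapped, unmapped = map_custom_glyphs(sample, set(string.ascii_letters))
```

Codes that end up in unmapped have no obvious text value, so they would need one of the other approaches, such as keeping their values tied to the embedded font or converting the text to shapes.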

But it is a good example of how PDF to HTML5 and SVG conversion is not always a straightforward process.

This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.



IDRsolutions Ltd 2019. All rights reserved.