Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Glyph names- what is in a name?


I came across a rather intriguing problem while debugging a font issue in a PDF created by a tool called A-PDF.
The PDF specification allows a great deal of flexibility, and one of the many tricks you can do is to redefine how character values map onto glyphs. You can create a custom encoding to map values onto glyphs. This is really useful if you want to embed a font with just a few characters to save space. This is done with a Differences object and would look something like this if viewed in the raw file:
427 0 obj
/Differences [ 2 /A/B/euro ]
So value 2 maps onto glyph ‘A’, value 3 maps onto ‘B’ and value 4 maps onto the euro character.
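The numbering rule above (a number sets the starting value, and each glyph name that follows takes the next value in turn) can be sketched in a few lines of Python. This is only an illustration: the function name `parse_differences` is hypothetical, and the tokenising is deliberately simple, assuming the array arrives as plain text with no comments or nested objects.

```python
import re


def parse_differences(array_text):
    """Map character values to glyph names from the text of a /Differences array."""
    mapping = {}
    code = None
    # Tokens are either glyph names (/Name) or integers that reset the counter.
    for token in re.findall(r"/[^\s/\[\]]+|\d+", array_text):
        if token.startswith("/"):
            if code is None:
                raise ValueError("glyph name found before any starting value")
            mapping[code] = token[1:]  # drop the leading slash
            code += 1                  # the next name takes the next value
        else:
            code = int(token)
    return mapping


print(parse_differences("[ 2 /A/B/euro ]"))
# prints {2: 'A', 3: 'B', 4: 'euro'}
```

A second number later in the array simply restarts the count from that value, so `[ 2 /A 10 /B /C ]` would give values 2, 10 and 11.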
There is a list of all the standard glyph values, and the standard mappings used if you do not create your own. Appendix D of the PDF Reference lists the standard encodings and all the glyph names. Most of the time, you do not need to define your own values and can just use the already prepared tables – StandardEncoding, MacRomanEncoding, WinAnsiEncoding and so forth.
Where it gets slightly messy is that not only can you define your own Encoding, but you can also create your own glyphs. The glyph name is just a key value used to look up font data in other tables. So long as you are consistent, any value should be possible. So you could have a Differences object along the lines of
427 0 obj
/Differences [ 2 /AnyName/SillyName/MyNewGlyph ]
Most of the time, this works fine, but what about this value, taken from the problem file?
427 0 obj
/Differences [ 2 /#23#234CH2eb0c8ba15de4cce8fa3c169622f8e93 /#23#2346H7a539460a8268e5915c0973dbb05dce1
Usually, the # character indicates that the next 2 characters are a numeric value. In theory, so long as we are consistent, it should not matter. But this is what it looks like in Acrobat.
So for the values 1 and 2, we need to strip out the number values after the # so that #23#234CH2eb0c8ba15de4cce8fa3c169622f8e93 becomes ##4CH2eb0c8ba15de4cce8fa3c169622f8e93
But for value 3, we strip the first 2. Why do we need to strip some of the values?
I can only guess that the presence of non-numeric values makes the numbers invalid or that we strip the first 2 values, but I can’t find any clear rules. I am just guessing. So if you see the value # in a PDF string, be careful…
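One reading that fits what Acrobat shows: in the PDF specification, a # inside a name followed by two hexadecimal digits is an escape for the character with that code, and 0x23 is the # character itself. The sketch below applies that rule to the names from the problem file. The function name `decode_pdf_name` is hypothetical, and it assumes the name is given as text without its leading slash.

```python
import re


def decode_pdf_name(raw_name):
    """Replace each #xx escape (two hex digits) with the character it encodes."""
    def replace_escape(match):
        return chr(int(match.group(1), 16))

    # re.sub does not rescan replaced text, so a decoded '#' is never
    # treated as the start of another escape.
    return re.sub(r"#([0-9A-Fa-f]{2})", replace_escape, raw_name)


print(decode_pdf_name("#23#234CH2eb0c8ba15de4cce8fa3c169622f8e93"))
# prints ##4CH2eb0c8ba15de4cce8fa3c169622f8e93
```

Under this reading nothing is really being stripped: each #23 simply decodes to a single #, which gives the same result as the stripped value shown earlier.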
And if you know the exact rules which should be applied here, why not post and explain what’s really going on…
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years worth of PDF knowledge and tips, so click here to visit our series index!


One Reply to “Glyph names- what is in a name?”

  1. Mark,

    I think that #23 is supposed to represent the Ascii character 0x23, which is “#”. It’s just confusing because the escape character is also #.

    If you preprocess the “#23#23CH2eb0” replacing every #nn with its ASCII equivalent, you’d get “##CH2eb0”. But somebody stuck the un-preprocessed string in the Differences dictionary.


IDRsolutions Ltd 2020. All rights reserved.